Related papers: LLM-Guided Runtime Parameter Optimization for Ener…

Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models

Large language models (LLMs) are increasingly recognized for their exceptional generative capabilities and versatility across various tasks. However, the high inference costs associated with these models have not received adequate…

Computation and Language · Computer Science 2025-03-18 Soham Poddar, Paramita Koley, Janardan Misra, Sanjay Podder, Niloy Ganguly, Saptarshi Ghosh

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and…

Performance · Computer Science 2023-12-04 Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Green Prompting: Characterizing Prompt-driven Energy Costs of LLM Inference

Large Language Models (LLMs) have become widely used across various domains spanning search engines, code generation, and text creation. However, a major concern associated with their adoption is the high cost of inference, impacting both…

Computation and Language · Computer Science 2026-04-28 Marta Adamska, Daria Smirnova, Hamid Nasiri, Zhengxin Yu, Peter Garraghan

Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation

Large language models (LLMs) can perform recommendation tasks by taking prompts written in natural language as input. Compared to traditional methods such as collaborative filtering, LLM-based recommendation offers advantages in handling…

Information Retrieval · Computer Science 2025-07-21 Genki Kusano, Kosuke Akimoto, Kunihiro Takeoka

Brevity is the soul of sustainability: Characterizing LLM response lengths

A significant portion of the energy consumed by Large Language Models (LLMs) arises from their inference processes; hence developing energy-efficient methods for inference is crucial. While several techniques exist for inference…

Computation and Language · Computer Science 2025-07-01 Soham Poddar, Paramita Koley, Janardan Misra, Sanjay Podder, Navveen Balani, Niloy Ganguly, Saptarshi Ghosh

Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems

In large language models (LLM)-based recommendation systems (LLM-RSs), accurately predicting user preferences by leveraging the general knowledge of LLMs is possible without requiring extensive training data. By converting recommendation…

Information Retrieval · Computer Science 2024-12-20 Genki Kusano, Kosuke Akimoto, Kunihiro Takeoka

Adaptively Robust LLM Inference Optimization under Prediction Uncertainty

We study the problem of optimizing Large Language Model (LLM) inference scheduling to minimize total latency. LLM inference is an online, multi-task, and heavily energy-consuming service process by which a pre-trained LLM processes…

Machine Learning · Computer Science 2025-09-03 Zixi Chen, Yinyu Ye, Zijie Zhou

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an…

Computation and Language · Computer Science 2023-05-30 Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

Energy Considerations of Large Language Model Inference and Efficiency Optimizations

As large language models (LLMs) scale in size and adoption, their computational and environmental costs continue to rise. Prior benchmarking efforts have primarily focused on latency reduction in idealized settings, often overlooking the…

Computation and Language · Computer Science 2025-04-25 Jared Fernandez, Clara Na, Vashisth Tiwari, Yonatan Bisk, Sasha Luccioni, Emma Strubell

Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization

This work focuses on the task of query-based meeting summarization in which the summary of a context (meeting transcript) is generated in response to a specific query. When using Large Language Models (LLMs) for this task, usually a new…

Computation and Language · Computer Science 2024-10-22 Md Tahmid Rahman Laskar, Elena Khasanova, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN

Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey

Large Language Models (LLMs) have shown prominent performance in various downstream tasks and prompt engineering plays a pivotal role in optimizing LLMs' performance. This paper, not only as an overview of current prompt engineering…

Computation and Language · Computer Science 2024-09-18 Haochen Li, Jonathan Leung, Zhiqi Shen

SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization

Large language models (LLMs) have been a disruptive innovation in recent years, and they play a crucial role in our daily lives due to their ability to understand and generate human-like text. Their capabilities include natural language…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-17 Akrit Mudvari, Yuang Jiang, Leandros Tassiulas

S2LPP: Small-to-Large Prompt Prediction across LLMs

The performance of pre-trained Large Language Models (LLMs) is often sensitive to nuances in prompt templates, requiring careful prompt engineering, adding costs in terms of computing and human effort. In this study, we present experiments…

Computation and Language · Computer Science 2025-05-27 Liang Cheng, Tianyi LI, Zhaowei Wang, Mark Steedman

Frugal Prompting for Dialog Models

The use of large language models (LLMs) in natural language processing (NLP) tasks is rapidly increasing, leading to changes in how researchers approach problems in the field. To fully utilize these models' abilities, a better understanding…

Computation and Language · Computer Science 2023-11-07 Bishal Santra, Sakya Basak, Abhinandan De, Manish Gupta, Pawan Goyal

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key…

Computation and Language · Computer Science 2024-08-13 Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Batch Prompting: Efficient Inference with Large Language Model APIs

Performing inference on large volumes of samples with large language models (LLMs) can be computationally and financially costly in industry and real-world use. We propose batch prompting, a simple yet effective prompting approach that…

Computation and Language · Computer Science 2023-10-25 Zhoujun Cheng, Jungo Kasai, Tao Yu

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices…

Computation and Language · Computer Science 2024-08-01 Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science 2023-12-07 Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems

The rapid adoption of large language models (LLMs) has led to significant advances in natural language processing and text generation. However, the energy consumed through LLM model inference remains a major challenge for sustainable AI…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-08 Grant Wilkins, Srinivasan Keshav, Richard Mortier

GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation

Large Language Models (LLMs) are becoming integral to daily life, showcasing their vast potential across various Natural Language Processing (NLP) tasks. Beyond NLP, LLMs are increasingly used in software development tasks, such as code…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-24 Shashikant Ilager, Lukas Florian Briem, Ivona Brandic