Related papers: Token-Efficient Leverage Learning in Large Languag…

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with environments, a critical step toward generalized AI. However, the standard supervised fine-tuning (SFT) approach, which relies on large-scale…

Computation and Language · Computer Science 2025-08-27 Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du

Unlocking Full Efficiency of Token Filtering in Large Language Model Training

Token filtering has been proposed to enhance the utility of large language models (LLMs) by eliminating inconsequential tokens during training. While using fewer tokens is expected to reduce computational workloads, existing methods have not…

Machine Learning · Computer Science 2026-03-20 Di Chai, Pengbo Li, Feiyuan Zhang, Yilun Jin, Han Tian, Kaiqiang Xu, Binhang Yuan, Dian Shen, Junxue Zhang, Kai Chen

TALL -- A Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages

Large Language Models (LLMs) excel in high-resource languages but struggle with low-resource languages due to limited training data. This paper presents TALL (Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages),…

Computation and Language · Computer Science 2025-06-06 Moshe Ofer, Orel Zamler, Amos Azaria

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they are still struggling to efficiently model the interactions among multi-modal inputs and the generation in non-textual modalities. In this work, we…

Computation and Language · Computer Science 2024-01-05 Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that…

Machine Learning · Computer Science 2026-05-12 Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher

ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs

The computational cost of training multimodal large language models (MLLMs) grows rapidly with the number of processed tokens. Existing efficiency methods mainly target inference via token reduction or merging, offering limited benefits…

Computer Vision and Pattern Recognition · Computer Science 2026-04-30 Chaoyu Li, Yogesh Kulkarni, Pooyan Fazli

Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One

Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents. Despite wide success of RL, training effective agents remains difficult due to the multitude of factors requiring careful tuning, such as…

Machine Learning · Computer Science 2025-05-22 Yiwen Song, Qianyue Hao, Qingmin Liao, Jian Yuan, Yong Li

Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science 2025-11-04 Chun-Hao Yang, Bo-Han Feng, Tzu-Yuan Lai, Yan Yu Chen, Yin-Kai Dean Huang, Shou-De Lin

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the…

Computation and Language · Computer Science 2024-04-22 Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging…

Computation and Language · Computer Science 2026-03-20 Yibin Lei, Tao Shen, Yu Cao, Andrew Yates

LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

Reinforcement learning (RL) is a promising approach for robotic manipulation, but it can suffer from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge…

Robotics · Computer Science 2026-04-15 Jelle Luijkx, Runyu Ma, Zlatan Ajanović, Jens Kober

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining

Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training…

Machine Learning · Computer Science 2025-05-27 Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

Reinforcement Learning with Token-level Feedback for Controllable Text Generation

To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most…

Computation and Language · Computer Science 2024-03-19 Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

Routing-Based Continual Learning for Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) struggle with continual learning, often suffering from catastrophic forgetting when adapting to sequential tasks. We introduce a routing-based architecture that integrates new capabilities while…

Machine Learning · Computer Science 2026-04-08 Jay Mohta, Kenan Emir Ak, Gwang Lee, Dimitrios Dimitriadis, Yan Xu, Mingwei Shen

SELF: Self-Extend the Context Length With Logistic Growth Function

Large language models suffer issues when operated on long contexts that are larger than their training context length due to the standard position encoding for tokens in the attention layer. Tokens a long distance apart will rarely have an…

Computation and Language · Computer Science 2025-05-26 Phat Thanh Dang, Saahil Thoppay, Wang Yang, Qifan Wang, Vipin Chaudhary, Xiaotian Han

Rethinking Model Efficiency: Multi-Agent Inference with Large Models

Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian

Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance

Large language models (LLMs) are known for their exceptional performance across a range of natural language processing tasks, but their deployment comes at a high computational and financial cost. On the other hand, smaller language models…

Computation and Language · Computer Science 2024-09-24 Adarsh MS, Jithin VG, Ditto PS

Task-Aware Reduction for Scalable LLM-Database Systems

Large Language Models (LLMs) are increasingly applied to data-intensive workflows, from database querying to developer observability. Yet the effectiveness of these systems is constrained by the volume, verbosity, and noise of real-world…

Software Engineering · Computer Science 2025-10-15 Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

LFTR: Learning-Free Token Reduction for Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have demonstrated exceptional success in various multimodal tasks, yet their deployment is frequently limited by substantial computational demands and prolonged inference times. Given that the vision…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Zihui Zhao, Yingxin Li, Yang Li

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling…

Artificial Intelligence · Computer Science 2024-05-28 Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu