Related papers: Token-Efficient Leverage Learning in Large Languag…

200 papers

Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with environments, a critical step toward generalized AI. However, the standard supervised fine-tuning (SFT) approach, which relies on large-scale…

Computation and Language · Computer Science · 2025-08-27 · Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du

Token filtering has been proposed to enhance the utility of large language models (LLMs) by eliminating inconsequential tokens during training. While using fewer tokens is expected to reduce computational workloads, existing methods have not…

Machine Learning · Computer Science · 2026-03-20 · Di Chai, Pengbo Li, Feiyuan Zhang, Yilun Jin, Han Tian, Kaiqiang Xu, Binhang Yuan, Dian Shen, Junxue Zhang, Kai Chen

Large Language Models (LLMs) excel in high-resource languages but struggle with low-resource languages due to limited training data. This paper presents TALL (Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages),…

Computation and Language · Computer Science · 2025-06-06 · Moshe Ofer, Orel Zamler, Amos Azaria

Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal inputs and the generation in non-textual modalities. In this work, we…

Computation and Language · Computer Science · 2024-01-05 · Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou

Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that…

Machine Learning · Computer Science · 2026-05-12 · Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher

The computational cost of training multimodal large language models (MLLMs) grows rapidly with the number of processed tokens. Existing efficiency methods mainly target inference via token reduction or merging, offering limited benefits…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-30 · Chaoyu Li, Yogesh Kulkarni, Pooyan Fazli

Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents. Despite the wide success of RL, training effective agents remains difficult due to the multitude of factors requiring careful tuning, such as…

Machine Learning · Computer Science · 2025-05-22 · Yiwen Song, Qianyue Hao, Qingmin Liao, Jian Yuan, Yong Li

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science · 2025-11-04 · Chun-Hao Yang, Bo-Han Feng, Tzu-Yuan Lai, Yan Yu Chen, Yin-Kai Dean Huang, Shou-De Lin

Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the…

Computation and Language · Computer Science · 2024-04-22 · Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging…

Computation and Language · Computer Science · 2026-03-20 · Yibin Lei, Tao Shen, Yu Cao, Andrew Yates

Reinforcement learning (RL) is a promising approach for robotic manipulation, but it can suffer from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge…

Robotics · Computer Science · 2026-04-15 · Jelle Luijkx, Runyu Ma, Zlatan Ajanović, Jens Kober

Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training…

Machine Learning · Computer Science · 2025-05-27 · Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most…

Computation and Language · Computer Science · 2024-03-19 · Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

Multimodal Large Language Models (MLLMs) struggle with continual learning, often suffering from catastrophic forgetting when adapting to sequential tasks. We introduce a routing-based architecture that integrates new capabilities while…

Machine Learning · Computer Science · 2026-04-08 · Jay Mohta, Kenan Emir Ak, Gwang Lee, Dimitrios Dimitriadis, Yan Xu, Mingwei Shen

Large language models run into problems when operating on contexts longer than their training context length, due to the standard position encoding for tokens in the attention layer. Tokens a long distance apart will rarely have an…

Computation and Language · Computer Science · 2025-05-26 · Phat Thanh Dang, Saahil Thoppay, Wang Yang, Qifan Wang, Vipin Chaudhary, Xiaotian Han

Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian

Large language models (LLMs) are known for their exceptional performance across a range of natural language processing tasks, but their deployment comes at a high computational and financial cost. On the other hand, smaller language models…

Computation and Language · Computer Science · 2024-09-24 · Adarsh MS, Jithin VG, Ditto PS

Large Language Models (LLMs) are increasingly applied to data-intensive workflows, from database querying to developer observability. Yet the effectiveness of these systems is constrained by the volume, verbosity, and noise of real-world…

Software Engineering · Computer Science · 2025-10-15 · Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

Multimodal Large Language Models (MLLMs) have demonstrated exceptional success in various multimodal tasks, yet their deployment is frequently limited by substantial computational demands and prolonged inference times. Given that the vision…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-01 · Zihui Zhao, Yingxin Li, Yang Li

Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling…

Artificial Intelligence · Computer Science · 2024-05-28 · Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu