Related papers: Evolving Subnetwork Training for Large Language Mo…

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining

Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training…

Machine Learning · Computer Science · 2025-05-27 · Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science · 2024-12-19 · Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

Tuning Large language model for End-to-end Speech Translation

With the emergence of large language models (LLMs), multimodal models based on LLMs have demonstrated significant potential. Models such as LLaSM, X-LLM, and SpeechGPT exhibit an impressive ability to comprehend and generate human…

Computation and Language · Computer Science · 2023-10-04 · Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao

An Emulator for Fine-Tuning Large Language Models using Small Language Models

Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted…

Computation and Language · Computer Science · 2023-10-20 · Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

Fine-tuning large language models (LLMs) for downstream tasks is an essential stage of modern AI deployment. Reinforcement learning (RL) has emerged as the dominant fine-tuning paradigm, underpinning many state-of-the-art LLMs. In contrast,…

Machine Learning · Computer Science · 2026-02-10 · Xin Qiu, Yulu Gan, Conor F. Hayes, Qiyao Liang, Yinggan Xu, Roberto Dailey, Elliot Meyerson, Babak Hodjat, Risto Miikkulainen

Efficient Stitchable Task Adaptation

The paradigm of pre-training and fine-tuning has laid the foundation for deploying deep learning models. However, most fine-tuning methods are designed to meet a specific resource budget. Recently, considering diverse deployment scenarios…

Machine Learning · Computer Science · 2024-07-10 · Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training

Large language models (LLMs) are predominantly trained on English-centric data, resulting in uneven performance for smaller languages. We study whether continued pretraining (CPT) can substantially improve Estonian capabilities in a…

Computation and Language · Computer Science · 2026-03-03 · Aleksei Dorkin, Taido Purason, Emil Kalbaliyev, Hele-Andra Kuulmets, Marii Ojastu, Mark Fišel, Tanel Alumäe, Eleri Aedmaa, Krister Kruusmaa, Kairit Sirts

Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks

With the burgeoning advancements in the field of natural language processing (NLP), the demand for training data has increased significantly. To save costs, it has become common for users and businesses to outsource the labor-intensive task…

Computation and Language · Computer Science · 2024-08-22 · Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li

EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms

In recent years, large language models (LLMs) have made remarkable progress, with model optimization primarily relying on gradient-based optimizers such as Adam. However, these gradient-based methods impose stringent hardware requirements,…

Artificial Intelligence · Computer Science · 2025-10-24 · WenTao Liu, Siyu Song, Hao Hao, Aimin Zhou

Large Language Models for Tuning Evolution Strategies

Large Language Models (LLMs) exhibit world knowledge and inference capabilities, making them powerful tools for various applications. This paper proposes a feedback loop mechanism that leverages these capabilities to tune Evolution…

Machine Learning · Computer Science · 2024-05-21 · Oliver Kramer

Efficient Pre-Training with Token Superposition

Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training…

Computation and Language · Computer Science · 2026-05-20 · Bowen Peng, Théo Gigant, Jeffrey Quesnelle

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models

Large Language Models (LLMs), with their increasing depth and number of parameters, have demonstrated outstanding performance across a variety of natural language processing tasks. However, this growth in scale leads to increased…

Computation and Language · Computer Science · 2025-10-28 · Hossein Rajabzadeh, Aref Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent…

Computation and Language · Computer Science · 2024-02-12 · Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes

As world knowledge advances and new task schemas emerge, Continual Learning (CL) becomes essential for keeping Large Language Models (LLMs) current and addressing their shortcomings. This process typically involves continual instruction…

Machine Learning · Computer Science · 2024-12-17 · Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and…

Machine Learning · Computer Science · 2024-02-02 · Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou

STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques…

Computation and Language · Computer Science · 2025-04-08 · Kazuki Yano, Takumi Ito, Jun Suzuki

eP-ALM: Efficient Perceptual Augmentation of Language Models

Large Language Models (LLMs) have so far impressed the world, with unprecedented capabilities that emerge in models at large scales. On the vision side, transformer models (i.e., ViT) are following the same trend, achieving the best…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-30 · Mustafa Shukor, Corentin Dancette, Matthieu Cord

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support…

Computation and Language · Computer Science · 2024-02-23 · Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

EOE: Evolutionary Optimization of Experts for Training Language Models

This paper presents an evolutionary framework for the training of large language models (LLM). The models are divided into several experts (sub-networks), which have the same structure but different parameter values. Only one expert is…

Machine Learning · Computer Science · 2025-09-30 · Yingshi Chen

A Novel Paradigm Boosting Translation Capabilities of Large Language Models

This paper presents a study on strategies to enhance the translation capabilities of large language models (LLMs) in the context of machine translation (MT) tasks. The paper proposes a novel paradigm consisting of three stages: Secondary…

Computation and Language · Computer Science · 2024-04-16 · Jiaxin Guo, Hao Yang, Zongyao Li, Daimeng Wei, Hengchao Shang, Xiaoyu Chen