English
Related papers

Related papers: Efficient Sequence Packing without Cross-contamina…

200 papers

Large pre-trained sentence encoders like BERT start a new chapter in natural language processing. A common practice to apply pre-trained BERT to sequence classification tasks (e.g., classification of sentences or sentence pairs) is by…

Computation and Language · Computer Science 2020-02-26 Wenxuan Zhou , Junyi Du , Xiang Ren

In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In…

Computation and Language · Computer Science 2024-10-03 Wenzhen Zheng , Wenbo Pan , Xu Xu , Libo Qin , Li Yue , Ming Zhou

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training…

Computation and Language · Computer Science 2025-11-04 Chun-Hao Yang , Bo-Han Feng , Tzu-Yuan Lai , Yan Yu Chen , Yin-Kai Dean Huang , Shou-De Lin

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an…

Computation and Language · Computer Science 2023-05-30 Zangwei Zheng , Xiaozhe Ren , Fuzhao Xue , Yang Luo , Xin Jiang , Yang You

Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-07 Ao Sun , Weilin Zhao , Xu Han , Cheng Yang , Zhiyuan Liu , Chuan Shi , Maosong Sun

This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former…

Machine Learning · Computer Science 2025-06-17 Dingyang Chen , Qi Zhang , Yinglun Zhu

Continual pre-training has demonstrated significant potential in enhancing model performance, particularly in domain-specific scenarios. The most common approach for packing data before continual pre-training involves concatenating input…

Computation and Language · Computer Science 2025-05-30 Ruicheng Yin , Xuan Gao , Changze Lv , Xiaohua Wang , Xiaoqing Zheng , Xuanjing Huang

Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As…

Computation and Language · Computer Science 2023-11-01 Dong-Ho Lee , Jay Pujara , Mohit Sewak , Ryen W. White , Sujay Kumar Jauhar

Word embeddings are a powerful approach for analyzing language and have been widely popular in numerous tasks in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is…

Machine Learning · Computer Science 2018-12-11 Avishek Anand , Megha Khosla , Jaspreet Singh , Jan-Hendrik Zab , Zijian Zhang

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over…

Computation and Language · Computer Science 2021-09-16 Rongzhou Bao , Zhuosheng Zhang , Hai Zhao

The pre-training of masked language models (MLMs) consumes massive computation to achieve good results on downstream NLP tasks, resulting in a large carbon footprint. In the vanilla MLM, the virtual tokens, [MASK]s, act as placeholders and…

Computation and Language · Computer Science 2022-11-16 Baohao Liao , David Thulke , Sanjika Hewavitharana , Hermann Ney , Christof Monz

Padding is often used in tuning LLM models by adding special tokens to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by…

Machine Learning · Computer Science 2024-09-04 Achintya Kundu , Rhui Dih Lee , Laura Wynter , Raghu Kiran Ganti , Mayank Mishra

Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting…

Computation and Language · Computer Science 2023-11-01 Aman Jaiswal , Evangelos Milios

While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights…

Neurons and Cognition · Quantitative Biology 2026-04-13 Kangcong Li , Peng Ye , Chongjun Tu , Lin Zhang , Chunfeng Song , Jiamin Wu , Tao Yang , Qihao Zheng , Tao Chen

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao , Zhiben Chen , Dan Xu , Yuzhang Shang

While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass…

Computation and Language · Computer Science 2024-08-26 Quandong Wang , Yuxuan Yuan , Xiaoyu Yang , Ruike Zhang , Kang Zhao , Wei Liu , Jian Luan , Daniel Povey , Bin Wang

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining it with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han , Bo Pang , Yingnian Wu

In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range,…

Computation and Language · Computer Science 2024-09-10 Zhyar Rzgar K Rostam , Sándor Szénási , Gábor Kertész

Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A more cost-effective alternative is to fuse existing pre-trained…

Computation and Language · Computer Science 2025-09-23 Runjia Zeng , James Chenhao Liang , Cheng Han , Zhiwen Cao , Jiahao Liu , Xiaojun Quan , Yingjie Victor Chen , Lifu Huang , Tong Geng , Qifan Wang , Dongfang Liu

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science 2023-12-07 Huiqiang Jiang , Qianhui Wu , Chin-Yew Lin , Yuqing Yang , Lili Qiu