Related papers: Efficient Sequence Packing without Cross-contamina…

200 papers

Large Language Models (LLMs) struggle with long-context reasoning, not only due to the quadratic scaling of computational complexity with sequence length but also because of the scarcity and expense of annotating long-context data. There…

Computation and Language · Computer Science 2025-04-18 Linda He , Jue Wang , Maurice Weber , Shang Zhu , Ben Athiwaratkun , Ce Zhang

Extracting sentence embeddings from large language models (LLMs) is a promising direction, as LLMs have demonstrated stronger semantic understanding capabilities. Previous studies typically focus on prompt engineering to elicit sentence…

Computation and Language · Computer Science 2025-07-04 Yuchen Fu , Zifeng Cheng , Zhiwei Jiang , Zhonghui Wang , Yafeng Yin , Zhengliang Li , Qing Gu

Recent work exploring the capabilities of pre-trained large language models (LLMs) has demonstrated their ability to act as general pattern machines by completing complex token sequences representing a wide array of tasks, including…

Computers and Society · Computer Science 2024-03-25 Seyed Parsa Neshaei , Richard Lee Davis , Adam Hazimeh , Bojan Lazarevski , Pierre Dillenbourg , Tanja Käser

Recent advancements in data-to-text generation largely take on the form of neural end-to-end systems. Efforts have been dedicated to improving text generation systems by changing the order of training samples in a process known as…

Computation and Language · Computer Science 2021-02-09 Ernie Chang , Hui-Syuan Yeh , Vera Demberg

Multimodal large language models (MLLMs) have recently demonstrated strong capabilities in understanding and generating responses from diverse visual inputs, including high-resolution images and long video sequences. As these models scale…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Junwan Kim , Hyunkyung Bae

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and…

Computation and Language · Computer Science 2024-05-30 Xindi Wang , Mahsa Salmani , Parsa Omidi , Xiangyu Ren , Mehdi Rezagholizadeh , Armaghan Eshaghi

Large language models have drastically changed the prospects of AI by introducing technologies for more complex natural language processing. However, current methodologies to train such LLMs require extensive resources including but not…

Computation and Language · Computer Science 2026-04-27 Noel Elias , Homa Esfahanizadeh , Kaan Kale , Sriram Vishwanath , Muriel Medard

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and…

Performance · Computer Science 2023-12-04 Longteng Zhang , Xiang Liu , Zeyu Li , Xinglin Pan , Peijie Dong , Ruibo Fan , Rui Guo , Xin Wang , Qiong Luo , Shaohuai Shi , Xiaowen Chu

Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training…

Computation and Language · Computer Science 2026-05-20 Bowen Peng , Théo Gigant , Jeffrey Quesnelle

Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. We live in a world where most of the data around us, e.g., text, audio, and music, has a…

Signal Processing · Electrical Eng. & Systems 2025-02-11 Prateek Verma

Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture of expert (MoE) models,…

Unsupervised Machine Learning techniques have been applied to Natural Language Processing tasks and surpass benchmarks such as GLUE with great success. The approach of building language models achieves good results in one language and it can…

Computation and Language · Computer Science 2022-11-28 Amir Jafari

Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words…

Computation and Language · Computer Science 2025-09-03 DiJia Su , Hanlin Zhu , Yingchen Xu , Jiantao Jiao , Yuandong Tian , Qinqing Zheng

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More…

Computation and Language · Computer Science 2026-03-03 Athul Radhakrishnan , Siddhant Mohan , Mahima Sachdeva

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence further limits reliability. Uncertainty…

Computation and Language · Computer Science 2026-03-23 Qi Cao , Andrew Gambardella , Takeshi Kojima , Yutaka Matsuo , Yusuke Iwasawa

While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token…

Computation and Language · Computer Science 2019-07-26 Chunyang Xiao , Christoph Teichmann , Konstantine Arkoudas

With the widespread application of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), enhancing their performance has become a research hotspot. This paper presents a novel multi-prompt ensemble decoding…

Computation and Language · Computer Science 2024-12-25 Jiaxin Guo , Daimeng Wei , Yuanchang Luo , Shimin Tao , Hengchao Shang , Zongyao Li , Shaojun Li , Jinlong Yang , Zhanglin Wu , Zhiqiang Rao , Hao Yang

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft…

Computation and Language · Computer Science 2026-05-27 Kuan-Wei Lu , Ding-Yong Hong , Pangfeng Liu , Jan-Jan Wu

Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has…

Machine Learning · Computer Science 2024-11-07 Shuhe Wang , Guoyin Wang , Yizhong Wang , Jiwei Li , Eduard Hovy , Chen Guo

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts…

Computation and Language · Computer Science 2026-03-23 Weiyao Luo , Suncong Zheng , Heming Xia , Weikang Wang , Yan Lei , Tianyu Liu , Shuang Chen , Zhifang Sui