Related papers: Efficient Sequence Packing without Cross-contamina…

200 papers

Recent breakthroughs and successful deployment of large language and vision models in a constrained environment predominantly follow a two phase approach. First, large models are trained to achieve peak performance, followed by a model…

Machine Learning · Computer Science 2024-11-22 Hanna Mazzawi, Pranjal Awasthi, Xavi Gonzalvo, Srikumar Ramalingam

Masked Language Modeling (MLM) is widely used to pretrain language models. The standard random masking strategy in MLM causes the pre-trained language models (PLMs) to be biased toward high-frequency tokens. Representation learning of rare…

Computation and Language · Computer Science 2023-05-25 Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Xin Cao, Kongzhang Hao, Yuxin Jiang, Wei Wang

While long-context large language models (LLMs) exhibit remarkable document processing capabilities, their prohibitively high training costs often hinder customized applications. To mitigate this issue, we propose Sequential…

Machine Learning · Computer Science 2025-05-23 Wenhao Li, Yuxin Zhang, Gen Luo, Daohai Yu, Rongrong Ji

Recent state-of-the-art language models utilize a two-phase training procedure comprised of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have been focused…

Computation and Language · Computer Science 2019-11-15 Itzik Malkiel, Lior Wolf

Large language models (LLMs) have recently garnered significant interest. With in-context learning, LLMs achieve impressive results in various natural language tasks. However, the application of LLMs to sentence embeddings remains an area…

Computation and Language · Computer Science 2023-08-01 Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, Fuzhen Zhuang

Very deep CNNs with small 3x3 kernels have recently been shown to achieve very strong performance as acoustic models in hybrid NN-HMM speech recognition systems. In this paper we investigate how to efficiently scale these models to larger…

Computation and Language · Computer Science 2016-06-28 Tom Sercu, Vaibhava Goel

Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments…

Computation and Language · Computer Science 2023-10-25 Piotr Nawrot, Jan Chorowski, Adrian Łańcucki, Edoardo M. Ponti

Recent advancements in Large Language Models (LLMs)-based text embedding models primarily focus on data scaling or synthesis, yet limited exploration of training techniques and data quality, thereby constraining performance. In this work,…

Large language models (LLMs) have been widely employed across various application domains, yet their black-box nature poses significant challenges to understanding how these models process input data internally to make predictions. In this…

Machine Learning · Computer Science 2025-09-03 Hangfeng He, Weijie J. Su

Multimodal large language models (MLLMs) have shown remarkable performance for cross-modal understanding and generation, yet still suffer from severe inference costs. Recently, abundant works have been proposed to solve this problem with…

Computation and Language · Computer Science 2025-05-30 Zichen Wen, Yifeng Gao, Weijia Li, Conghui He, Linfeng Zhang

When solving NLP tasks with limited labelled data, researchers typically either use a general large language model without further update, or use a small number of labelled samples to tune a specialised smaller model. In this work, we…

Computation and Language · Computer Science 2026-01-26 Branislav Pecher, Ivan Srba, Maria Bielikova

Recent advancements in large language models (LLMs) have remarkably enhanced performances on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often overly fragment a text…

Computation and Language · Computer Science 2024-08-07 Jimin Hong, Gibbeum Lee, Jaewoong Cho

We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial…

Artificial Intelligence · Computer Science 2023-10-27 Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

Pretraining large language models (LLMs) with next-token prediction has led to remarkable advances, yet the context-dependent nature of token embeddings in such models results in high intra-class variance and inter-class similarity, thus…

Computation and Language · Computer Science 2026-05-12 Yan Sun, Guoxia Wang, Jinle Zeng, JiaBin Yang, Shuai Li, Li Shen, Dacheng Tao, DianHai Yu, Haifeng Wang

We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning…

Artificial Intelligence · Computer Science 2026-02-20 Sumedh Rasal

Large language models (LLMs) are increasingly used for topic modeling, outperforming classical topic models such as LDA. Commonly, pre-trained LLM encoders such as BERT are used out-of-the-box despite the fact that fine-tuning is known to…

Computation and Language · Computer Science 2026-02-23 Johannes Schneider

Large pre-training language models (PLMs) have shown promising in-context learning abilities. However, due to the backbone transformer architecture, existing PLMs are bottlenecked by the memory and computational cost when scaling up to a…

Computation and Language · Computer Science 2023-02-13 Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

The advent of Large Multimodal Models (LMMs) has significantly enhanced Large Language Models (LLMs) to process and interpret diverse data modalities (e.g., image and video). However, as input complexity increases, particularly with long…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Shilin Yan, Jiaming Han, Joey Tsai, Hongwei Xue, Rongyao Fang, Lingyi Hong, Ziyu Guo, Ray Zhang

We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning…

Computation and Language · Computer Science 2014-12-23 Emma Strubell, Luke Vilnis, Andrew McCallum

We introduce a scaling law for fine-tuning large language models (LLMs) under fixed compute budgets that explicitly accounts for data composition. Conventional approaches measure training data solely by total tokens, yet the number of…

Computation and Language · Computer Science 2025-06-04 Ryan Lagasse, Aidan Kierans, Avijit Ghosh, Shiri Dori-Hacohen