Related papers: Training for Fast Sequential Prediction Using Dyna…

Learning Dynamic Feature Selection for Fast Sequential Prediction

We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning…

Computation and Language · Computer Science · 2015-05-25 · Emma Strubell, Luke Vilnis, Kate Silverstein, Andrew McCallum

Faster Neural Network Training with Approximate Tensor Operations

We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions. We introduce new sampling…

Machine Learning · Computer Science · 2021-10-27 · Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein

Enhancing Deep Neural Network Training Efficiency and Performance through Linear Prediction

Deep neural networks (DNN) have achieved remarkable success in various fields, including computer vision and natural language processing. However, training an effective DNN model still poses challenges. This paper aims to propose a method…

Machine Learning · Computer Science · 2024-07-03 · Hejie Ying, Mengmeng Song, Yaohong Tang, Shungen Xiao, Zimin Xiao

Searching for Discriminative Words in Multidimensional Continuous Feature Space

Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it became possible to train them with much more data, which also resulted in better quality of learned…

Computation and Language · Computer Science · 2022-11-29 · Marius Sajgalik, Michal Barla, Maria Bielikova

Embedding Lexical Features via Low-Rank Tensors

Modern NLP models rely heavily on engineered features, which often combine word and contextual information into complex lexical features. Such combination results in large numbers of features, which can lead to over-fitting. We present a…

Computation and Language · Computer Science · 2016-04-05 · Mo Yu, Mark Dredze, Raman Arora, Matthew Gormley

Efficient Sequence Packing without Cross-contamination: Accelerating Large Language Models without Impacting Performance

Effective training of today's large language models (LLMs) depends on large batches and long sequences for throughput and accuracy. To handle variable-length sequences on hardware accelerators, it is common practice to introduce padding…

Computation and Language · Computer Science · 2022-10-07 · Mario Michael Krell, Matej Kosec, Sergio P. Perez, Andrew Fitzgibbon

A Sequential Model for Multi-Class Classification

Many classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general purpose learning methods and are usually addressed in an ad-hoc fashion. We suggest a general…

Artificial Intelligence · Computer Science · 2007-05-23 · Yair Even-Zohar, Dan Roth

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While…

Computation and Language · Computer Science · 2023-08-24 · Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

Towards Improving Selective Prediction Ability of NLP Systems

It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction…

Computation and Language · Computer Science · 2022-04-08 · Neeraj Varshney, Swaroop Mishra, Chitta Baral

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Large Language Models (LLMs) have exhibited exceptional performance across a spectrum of natural language processing tasks. However, their substantial sizes pose considerable challenges, particularly in computational demands and inference…

Computation and Language · Computer Science · 2025-06-03 · Guoxuan Chen, Han Shi, Jiawei Li, Yihang Gao, Xiaozhe Ren, Yimeng Chen, Xin Jiang, Zhenguo Li, Weiyang Liu, Chao Huang

An efficient framework for learning sentence representations

In this work we propose a simple and efficient framework for learning sentence representations from unlabelled data. Drawing inspiration from the distributional hypothesis and recent work on learning sentence representations, we reformulate…

Computation and Language · Computer Science · 2018-03-09 · Lajanugen Logeswaran, Honglak Lee

Keyphrase Extraction using Sequential Labeling

Keyphrases efficiently summarize a document's content and are used in various document processing and retrieval tasks. Several unsupervised techniques and classifiers exist for extracting keyphrases from text documents. Most of these…

Computation and Language · Computer Science · 2016-08-04 · Sujatha Das Gollapalli, Xiao-li Li

A Topological Improvement of the Overall Performance of Sparse Evolutionary Training: Motif-Based Structural Optimization of Sparse MLPs Project

Deep Neural Networks (DNNs) have been proven to be exceptionally effective and have been applied across diverse domains within deep learning. However, as DNN models increase in complexity, the demand for reduced computational costs and…

Neural and Evolutionary Computing · Computer Science · 2025-06-12 · Xiaotian Chen, Hongyun Liu, Seyed Sahand Mohammadi Ziabari

A Linear Dynamical System Model for Text

Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform…

Machine Learning · Statistics · 2015-06-02 · David Belanger, Sham Kakade

Multirate Training of Neural Networks

We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained on different time scales, where slow parts are updated less frequently. By choosing appropriate…

Machine Learning · Computer Science · 2022-11-02 · Tiffany Vlaar, Benedict Leimkuhler

Deep Networks With Large Output Spaces

Deep neural networks have been extremely successful at various image, speech, video recognition tasks because of their ability to model deep structures within the data. However, they are still prohibitively expensive to train and apply for…

Neural and Evolutionary Computing · Computer Science · 2015-04-13 · Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

Deep Natural Language Feature Learning for Interpretable Prediction

We propose a general method to break down a main complex task into a set of intermediary easier sub-tasks, which are formulated in natural language as binary questions related to the final target task. Our method allows for representing…

Computation and Language · Computer Science · 2024-02-02 · Felipe Urrutia, Cristian Buc, Valentin Barriere

Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning

Natural language processing (NLP) enables the understanding and generation of meaningful human language, typically using a pre-trained complex architecture on a large dataset to learn the language and next fine-tune its weights to implement…

Computation and Language · Computer Science · 2025-09-04 · Yarden Tzach, Ronit D. Gross, Ella Koresh, Shalom Rosner, Or Shpringer, Tal Halevi, Ido Kanter

PETapter: Leveraging PET-style classification heads for modular few-shot parameter-efficient fine-tuning

Few-shot learning and parameter-efficient fine-tuning (PEFT) are crucial to overcome the challenges of data scarcity and ever-growing language model sizes. This applies in particular to specialized scientific domains, where researchers…

Computation and Language · Computer Science · 2025-09-18 · Jonas Rieger, Mattes Ruckdeschel, Gregor Wiedemann

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent…

Computation and Language · Computer Science · 2024-02-12 · Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh