Related papers: An Efficiency Study for SPLADE Models

An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc

Learned Sparse Retrieval (LSR) models encode text as weighted term vectors, which need to be sparse to leverage inverted index structures during retrieval. SPLADE, the most popular LSR model, uses FLOPS regularization to encourage vector…

Information Retrieval · Computer Science 2025-05-22 Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preoţiuc-Pietro, Sean MacAvaney, Pengxiang Cheng

SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

In neural Information Retrieval (IR), ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven…

Information Retrieval · Computer Science 2021-09-22 Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to…

Information Retrieval · Computer Science 2021-07-14 Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

Two-Step SPLADE: Simple, Efficient and Effective Approximation of SPLADE

Learned sparse models such as SPLADE have successfully shown how to incorporate the benefits of state-of-the-art neural information retrieval models into the classical inverted index data structure. Despite their improvements in…

Information Retrieval · Computer Science 2024-04-23 Carlos Lassance, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto

COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models

Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite their advanced capacity. For resource-constrained devices, there is an urgent need for a spatially and temporally efficient model which…

Computation and Language · Computer Science 2022-10-28 Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang

On the Challenges and Opportunities of Learned Sparse Retrieval for Code

Retrieval over large codebases is a key component of modern LLM-based software engineering systems. Existing approaches predominantly rely on dense embedding models, while learned sparse retrieval (LSR) remains largely unexplored for code.…

Information Retrieval · Computer Science 2026-03-24 Simon Lupart, Maxime Louis, Thibault Formal, Hervé Déjean, Stéphane Clinchant

MAPLE-Edge: A Runtime Latency Predictor for Edge Devices

Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating…

Machine Learning · Computer Science 2022-04-28 Saeejith Nair, Saad Abbasi, Alexander Wong, Mohammad Javad Shafiee

The Role of Vocabularies in Learning Sparse Representations for Ranking

Learned Sparse Retrieval (LSR) such as SPLADE has attracted growing interest for effective semantic first-stage matching while enjoying the efficiency of inverted indices. A recent work on learning SPLADE models with expanded vocabularies (ESPLADE) was…

Information Retrieval · Computer Science 2026-04-21 Hiun Kim, Tae Kwan Lee, Taeryun Won

Efficiency and Effectiveness of SPLADE Models on Billion-Scale Web Document Title

This paper presents a comprehensive comparison of BM25, SPLADE, and Expanded-SPLADE models in the context of large-scale web document retrieval. We evaluate the effectiveness and efficiency of these models on datasets spanning from tens of…

Information Retrieval · Computer Science 2025-12-01 Taeryun Won, Tae Kwan Lee, Hiun Kim, Hyemin Lee

LLM Optimization Unlocks Real-Time Pairwise Reranking

Efficiently reranking documents retrieved from information retrieval (IR) pipelines to enhance the overall quality of Retrieval-Augmented Generation (RAG) systems remains an important yet challenging problem. Recent studies have highlighted the…

Computation and Language · Computer Science 2025-11-12 Jingyu Wu, Aditya Shrivastava, Jing Zhu, Alfy Samuel, Anoop Kumar, Daben Liu

SPaCe: Unlocking Sample-Efficient Large Language Models Training With Self-Pace Curriculum Learning

Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). However, such methods require extensive data and compute, making them impractical under many realistic training budgets.…

Machine Learning · Computer Science 2026-04-17 Dai Do, Manh Nguyen, Svetha Venkatesh, Hung Le

Effective Inference-Free Retrieval for Learned Sparse Representations

Learned Sparse Retrieval (LSR) is an effective IR approach that exploits pre-trained language models for encoding text into a learned bag of words. Several efforts in the literature have shown that sparsity is key to enabling a good…

Information Retrieval · Computer Science 2025-05-06 Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates

CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge

Deploying large language models (LLMs) on edge devices is crucial for delivering fast responses and ensuring data privacy. However, the limited storage, weight, and power of edge devices make it difficult to deploy LLM-powered applications.…

Hardware Architecture · Computer Science 2025-06-04 Chunlin Tian, Xinpeng Qin, Kahou Tam, Li Li, Zijian Wang, Yuanzhe Zhao, Minglei Zhang, Chengzhong Xu

Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference

With the wide adoption of language models for IR -- and specifically RAG systems -- the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead to large prompts and therefore, compute…

Information Retrieval · Computer Science 2026-04-06 Cornelius Kummer, Lena Jurkschat, Michael Färber, Sahar Vahdati

SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs

In large-scale LLM pre-training systems with 100k+ GPUs, failures become the norm rather than the exception, and restart costs can dominate wall-clock training time. However, existing fault-tolerance mechanisms are largely unprepared for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jin Lee, Zhonghao Chen, Xuhang He, Robert Underwood, Bogdan Nicolae, Franck Cappello, Xiaoyi Lu, Sheng Di, Zheng Zhang

SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices

Large Language Models (LLMs), as the foundational architecture for next-generation interactive AI applications, not only power intelligent dialogue systems but also drive the evolution of embodied intelligence on edge devices, including…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-19 Will Chow

Exploring the Representation Power of SPLADE Models

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models. During training, SPLADE applies…

Information Retrieval · Computer Science 2023-06-30 Joel Mackenzie, Shengyao Zhuang, Guido Zuccon

Achieving Peak Performance for Large Language Models: A Systematic Review

In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range,…

Computation and Language · Computer Science 2024-09-10 Zhyar Rzgar K Rostam, Sándor Szénási, Gábor Kertész

What Objective Does Self-paced Learning Indeed Optimize?

Self-paced learning (SPL) is a recently raised methodology designed through simulating the learning principle of humans/animals. A variety of SPL realization schemes have been designed for different computer vision and pattern recognition…

Machine Learning · Computer Science 2016-11-02 Deyu Meng, Qian Zhao, Lu Jiang

Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding

Packet-level discrete-event simulation (PLDES) is a prevalent tool for evaluating detailed performance of large model training. Although PLDES offers high fidelity and generality, its slow performance has plagued networking practitioners.…

Networking and Internet Architecture · Computer Science 2026-02-12 Fei Long, Kaihui Gao, Li Chen, Dan Li, Yiwei Zhang, Fei Gui, Yitao Xing, Wenjia Wei, Bingyang Liu