Related papers: Approximate Cluster-Based Sparse Document Retrieva…

Faster Learned Sparse Retrieval with Block-Max Pruning

Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit…

Information Retrieval · Computer Science 2024-05-03 Antonio Mallia , Torsten Suel , Nicola Tonellotto

Efficiency Optimizations for Superblock-based Sparse Retrieval

Learned sparse retrieval (LSR) is a popular method for first-stage retrieval because it combines the semantic matching of language models with efficient CPU-friendly algorithms. Previous work aggregates blocks into "superblocks" to quickly…

Information Retrieval · Computer Science 2026-02-04 Parker Carlson , Wentai Xie , Rohil Shah , Tao Yang

Dynamic Superblock Pruning for Fast Learned Sparse Retrieval

This paper proposes superblock pruning (SP) during top-k online document retrieval for learned sparse representations. SP structures the sparse index as a set of superblocks on a sequence of document blocks and conducts a superblock-level…

Information Retrieval · Computer Science 2026-02-04 Parker Carlson , Wentai Xie , Shanxiu He , Tao Yang

LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval

This paper studies fast fusion of dense retrieval and sparse lexical retrieval, and proposes a cluster-based selective dense retrieval method called CluSD guided by sparse lexical retrieval. CluSD takes a lightweight cluster-based approach…

Information Retrieval · Computer Science 2025-02-18 Yingrui Yang , Parker Carlson , Yifan Qiao , Wentai Xie , Shanxiu He , Tao Yang

A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries

We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary…

Machine Learning · Statistics 2014-07-08 Alekh Agarwal , Animashree Anandkumar , Praneeth Netrapalli

Dual Skipping Guidance for Document Retrieval with Learned Sparse Representations

This paper proposes a dual skipping guidance scheme with hybrid scoring to accelerate document retrieval that uses learned sparse representations while still delivering a good relevance. This scheme uses both lexical BM25 and learned neural…

Information Retrieval · Computer Science 2022-04-26 Yifan Qiao , Yingrui Yang , Haixin Lin , Tianbo Xiong , Xiyue Wang , Tao Yang

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…

Computational Geometry · Computer Science 2018-01-26 Luis-Evaristo Caraballo , José-Miguel Díaz-Báñez , Nadine Kroher

Density-based Clustering with Best-scored Random Forest

Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called "best-scored clustering forest" that can obtain the…

Machine Learning · Statistics 2019-06-25 Hanyuan Hang , Yuchao Cai , Hanfang Yang

Tree Index: A New Cluster Evaluation Technique

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…

Machine Learning · Computer Science 2020-03-25 A. H. Beg , Md Zahidul Islam , Vladimir Estivill-Castro

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…

Information Retrieval · Computer Science 2023-10-18 Peitian Zhang , Zheng Liu , Shitao Xiao , Zhicheng Dou , Jing Yao

Faster Exact Search using Document Clustering

We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses…

Information Retrieval · Computer Science 2014-11-06 Jonathan Dimond , Peter Sanders

Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries

We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic…

Computer Vision and Pattern Recognition · Computer Science 2017-09-06 João Carvalho , Manuel Marques , João P. Costeira

A probabilistic constrained clustering for transfer learning and image category discovery

Neural network-based clustering has recently gained popularity, and in particular a constrained clustering formulation has been proposed to perform transfer learning and image category discovery using deep learning. The core idea is to…

Computer Vision and Pattern Recognition · Computer Science 2018-06-29 Yen-Chang Hsu , Zhaoyang Lv , Joel Schlosser , Phillip Odom , Zsolt Kira

A Static Pruning Study on Sparse Neural Retrievers

Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document…

Information Retrieval · Computer Science 2023-04-26 Carlos Lassance , Simon Lupart , Hervé Dejean , Stéphane Clinchant , Nicola Tonellotto

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with…

Information Retrieval · Computer Science 2024-07-15 Sebastian Bruch , Franco Maria Nardini , Cosimo Rulli , Rossano Venturini

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to…

Information Retrieval · Computer Science 2021-07-14 Thibault Formal , Benjamin Piwowarski , Stéphane Clinchant

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan , Harish Verma , Eatesh Kandpal , Joydip Dhar

Clustering-based Low Rank Approximation Method

We propose a clustering-based generalized low rank approximation method, which takes advantage of appealing features from both the generalized low rank approximation of matrices (GLRAM) and cluster analysis. It exploits a more general form…

Optimization and Control · Mathematics 2025-02-21 Yujun Zhu , Jie Zhu , Hizba Arshad , Zhongming Wang , Ju Ming

Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning

Efficiently allocating incoming jobs to nodes in large-scale clusters can lead to substantial improvements in both cluster utilization and job performance. In order to allocate incoming jobs, cluster schedulers usually rely on a set of…

Machine Learning · Computer Science 2026-03-12 Martin Asenov , Qiwen Deng , Gingfung Yeung , Adam Barker

Improving Image Clustering through Sample Ranking and Its Application to remote-sensing images

Image clustering is a very useful technique that is widely applied to various areas, including remote sensing. Recently, visual representations by self-supervised learning have greatly improved the performance of image clustering. To…

Computer Vision and Pattern Recognition · Computer Science 2022-09-27 Qinglin Li , Guoping Qiu