Related papers: Boosting Search Performance Using Query Variations
Large-scale e-commerce search must surface a broad set of items from a vast catalog, ranging from bestselling products to new, trending, or seasonal items. Modern systems therefore rely on multiple specialized retrieval channels to surface…
The goal of rank fusion in information retrieval (IR) is to deliver a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central point is the fact…
The vast increase in amount and complexity of digital content led to a wide interest in ad-hoc retrieval systems in recent years. Complementary, the existence of heterogeneous data sources and retrieval models stimulated the proliferation…
Retrieval-Augmented Generation (RAG) systems commonly adopt retrieval fusion techniques such as multi-query retrieval and reciprocal rank fusion (RRF) to increase document recall, under the assumption that higher recall leads to better…
Social networks have ensured the expanding disproportion between the face of WWW stored traditionally in search engine repositories and the actual ever changing face of Web. Exponential growth of web users and the ease with which they can…
Common document ranking pipelines in search systems are cascade systems that involve multiple ranking layers to integrate different information step-by-step. In this paper, we propose a novel re-ranker Fusion-in-T5 (FiT5), which integrates…
Recommender systems (RS) are pivotal in managing information overload in modern digital services. A key challenge in RS is efficiently processing vast item pools to deliver highly personalized recommendations under strict latency…
This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated…
Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and has great promise in dealing with the vast…
The 2025 TREC Interactive Knowledge Assistance Track (iKAT) featured both interactive and offline submission tasks. The former requires systems to operate under real-time constraints, making robustness and efficiency as important as…
We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of…
Leveraging both labeled (input-output associations) and unlabeled data (wider contextual grounding) may provide complementary benefits in retrieval augmented generation (RAG). However, effectively combining evidence from these heterogeneous…
Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either descriptors (SIFT, HSV) for an entire image or the same…
PageRank is a well-known centrality measure for the web used in search engines, representing the importance of each web page. In this paper, we follow the line of recent research on the development of distributed algorithms for computation…
In the rapidly evolving field of e-commerce, the effectiveness of search re-ranking models is crucial for enhancing user experience and driving conversion rates. Despite significant advancements in feature representation and model…
Retrieval-based augmentations (RA) incorporating knowledge from an external database into language models have greatly succeeded in various knowledge-intensive (KI) tasks. However, integrating retrievals in non-knowledge-intensive (NKI)…
We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses…
There are several ideas being used today for Web information retrieval, and specifically in Web search engines. The PageRank algorithm is one of those that introduce a content-neutral ranking function over Web pages. This ranking is applied…
Many AI customer service systems use standard NLP pipelines or finetuned language models, which often fall short on ambiguous, multi-intent, or detail-specific queries. This case study evaluates recent techniques: query rewriting, RAG…
In this paper, we try to answer the question of how to improve the state-of-the-art methods for relevance ranking in web search by query segmentation. Here, by query segmentation it is meant to segment the input query into segments,…