Related papers: Recursive Abstractive Processing for Retrieval in …
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic…
Information retrieval is a core component of many intelligent systems as it enables conditioning of outputs on new and large-scale datasets. While effective, the standard practice of encoding data into high-dimensional representations for…
Existing summarization systems mostly generate summaries purely relying on the content of the source document. However, even for humans, we usually need some references or exemplars to help us fully understand the source document and write…
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by…
Text summarization condenses a text to a shorter version while retaining the important informations. Abstractive summarization is a recent development that generates new phrases, rather than simply copying or rephrasing sentences within the…
Retrieval-Augmented Generation (RAG) mitigates the hallucination problem of Large Language Models (LLMs) by incorporating external knowledge. Recursive summarization constructs a hierarchical summary tree by clustering text chunks,…
Addressing the complexity of comprehensive information retrieval, this study introduces an innovative, iterative retrieval-augmented generation system. Our approach uniquely integrates a vector-space driven re-ranking mechanism with…
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) finds meaningful patterns in spatial data by considering density and spatial proximity. As the clustering algorithm is inherently designed for static…
We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic…
The task of automatic text summarization produces a concise and fluent text summary while preserving key information and overall meaning. Recent approaches to document-level summarization have seen significant improvements in recent years…
Navigating the vast scientific literature often starts with browsing a paper's abstract. However, when a reader seeks additional information, not present in the abstract, they face a costly cognitive chasm during their dive into the full…
We propose an abstraction-based multi-document summarization framework that can construct new sentences by exploring more fine-grained syntactic units than sentences, namely, noun/verb phrases. Different from existing abstraction-based…
We propose a new outline for adaptive dictionary learning methods for sparse encoding based on a hierarchical clustering of the training data. Through recursive application of a clustering method, the data is organized into a binary…
An approximate textual retrieval algorithm for searching sources with high levels of defects is presented. It considers splitting the words in a query into two overlapping segments and subsequently building composite regular expressions…
This paper presents a novel approach to binary classification using dynamic logistic ensemble models. The proposed method addresses the challenges posed by datasets containing inherent internal clusters that lack explicit feature-based…
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search…
Recently, deep architectures, such as recurrent and recursive neural networks have been successfully applied to various natural language processing tasks. Inspired by bidirectional recurrent neural networks which use representations that…
In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning while providing for better…
Retrieving procedure-oriented evidence from materials science papers is difficult because key synthesis details are often scattered across long, context-heavy documents and are not well captured by paragraph-only dense retrieval. We present…
Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…