Related papers: Efficient Genomic Interval Queries Using Augmented…
Many applications require efficient management of large sets of intervals because many objects are associated with intervals (e.g., time and price intervals). In such interval management systems, range search is a primitive operator for…
Regression trees are a popular machine learning algorithm that fit piecewise constant models by recursively partitioning the predictor space. This paper focuses on statistical inference for a data-dependent model obtained from a fitted…
Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo…
Tree data structures, such as red-black trees, quad trees, treaps, or tries, are fundamental tools in computer science. A classical problem in concurrency is to obtain expressive, efficient, and scalable versions of practical tree data…
Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models to answer financial questions using external knowledge bases of U.S. SEC filings, earnings reports, and regulatory documents. However, existing…
Although Large Language Models (LLMs) demonstrate significant capabilities, their reliance on parametric knowledge often leads to inaccuracies. Retrieval Augmented Generation (RAG) mitigates this by incorporating external knowledge, but…
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation Download PDF Neal Gregory Lawton, Alfy Samuel, Anoop Kumar, Daben Liu Published: 20 Aug 2025, Retrieval augmented generation (RAG) is a popular…
Natural language text corpora are often available as sets of syntactically parsed trees. A wide range of expressive tree queries are possible over such parsed trees that open a new avenue in searching over natural language text. They not…
We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle,…
Quantum computing is a popular topic in computer science, which has recently attracted many studies in various areas such as machine learning and network. However, the topic of quantum data structures seems neglected. There is an open…
In concurrent data structures, the efficiency of set operations can vary significantly depending on the workload characteristics. Numerous concurrent set implementations are optimized and fine-tuned to excel in scenarios characterized by…
Indexing intervals is a fundamental problem, finding a wide range of applications. Recent work on managing large collections of intervals in main memory focused on overlap joins and temporal aggregation problems. In this paper, we propose…
The log-det distance between two aligned DNA sequences was introduced as a tool for statistically consistent inference of a gene tree under simple non-mixture models of sequence evolution. Here we prove that the log-det distance, coupled…
Geosocial reachability queries (\textsc{RangeReach}) determine whether a given vertex in a geosocial network can reach any spatial vertex within a query region. The state-of-the-art 3DReach method answers such queries by encoding graph…
The need for scalable concurrent ordered set data structures with linearizable range query support is increasing due to the rise of multicore computers, data processing platforms and in-memory databases. This paper presents a new concurrent…
This paper proposes an efficient data structure, ikd-Tree, for dynamic space partition. The ikd-Tree incrementally updates a k-d tree with new coming points only, leading to much lower computation time than existing static k-d trees.…
Processing graphs with temporal information (the temporal graphs) has become increasingly important in the real world. In this paper, we study efficient solutions to temporal graph applications using new algorithms for Incremental Minimum…
We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment…
The range, segment and rectangle query problems are fundamental problems in computational geometry, and have extensive applications in many domains. Despite the significant theoretical work on these problems, efficient implementations can…
Introduction: Epigenomic datasets from high-throughput sequencing experiments are commonly summarized as genomic intervals. As the volume of this data grows, so does interest in analyzing it through deep learning. However, the heterogeneity…