Related papers: Chunk List: Concurrent Data Structures

Concurrent Hash Tables: Fast and General?(!)

Concurrent hash tables are one of the most important concurrent data structures with numerous applications. Since hash table accesses can dominate the execution time of the overall application, we need implementations that achieve good…

Data Structures and Algorithms · Computer Science 2016-09-07 Tobias Maier, Peter Sanders, Roman Dementiev

Vectorized Sequence-Based Chunking for Data Deduplication

Data deduplication has gained wide acclaim as a mechanism to improve storage efficiency and conserve network bandwidth. Its most critical phase, data chunking, is responsible for the overall space savings achieved via the deduplication…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Sreeharsha Udayashankar, Samer Al-Kiswany

List Sort: A New Approach for Sorting List to Reduce Execution Time

In this paper we are proposing a new sorting algorithm, List Sort algorithm, is based on the dynamic memory allocation. In this research study we have also shown the comparison of various efficient sorting techniques with List sort. Due the…

Data Structures and Algorithms · Computer Science 2013-10-30 Adarsh Kumar Verma, Prashant Kumar

Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis

Chunking is a crucial preprocessing step in retrieval-augmented generation (RAG) systems, significantly impacting retrieval effectiveness across diverse datasets. In this study, we systematically evaluate fixed-size chunking strategies and…

Information Retrieval · Computer Science 2025-05-30 Sinchana Ramakanth Bhat, Max Rudat, Jannis Spiekermann, Nicolas Flores-Herr

Building Efficient Concurrent Graph Object through Composition of List-based Set

In this paper, we propose a generic concurrent directed graph (for shared memory architecture) that is concurrently being updated by threads adding/deleting vertices and edges. The graph is constructed by the composition of the well known…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-01 Sathya Peri, Muktikanta Sa, Nandini Singhal

Faster Concurrent Range Queries with Contention Adapting Search Trees Using Immutable Data

The need for scalable concurrent ordered set data structures with linearizable range query support is increasing due to the rise of multicore computers, data processing platforms and in-memory databases. This paper presents a new concurrent…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-05 Kjell Winblad

Continual General Chunking Problem and SyncMap

Humans possess an inherent ability to chunk sequences into their constituent parts. In fact, this ability is thought to bootstrap language skills and learning of image patterns which might be a key to a more animal-like type of…

Artificial Intelligence · Computer Science 2021-04-06 Danilo Vasconcellos Vargas, Toshitake Asabuki

BITS-Tree - An Efficient Data Structure for Segment Storage and Query Processing

In this paper, a new and novel data structure is proposed to dynamically insert and delete segments. Unlike the standard segment trees[3], the proposed data structure permits insertion of a segment with interval range beyond the interval…

Computational Geometry · Computer Science 2015-01-15 K. S. Easwarakumar, T. Hema

Chunking: Continual Learning is not just about Distribution Shift

Work on continual learning (CL) has thus far largely focused on the problems arising from shifts in the data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the…

Machine Learning · Computer Science 2024-07-12 Thomas L. Lee, Amos Storkey

Chunks and Tasks: a programming model for parallelization of dynamic algorithms

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Emanuel H. Rubensson, Elias Rudberg

Practical Concurrent Priority Queues

Priority queues are abstract data structures which store a set of key/value pairs and allow efficient access to the item with the minimal (maximal) key. Such queues are an important element in various areas of computer science such as…

Data Structures and Algorithms · Computer Science 2015-09-24 Jakob Gruber

A Systematic Investigation of Document Chunking Strategies and Embedding Sensitivity

We present the first large-scale, cross-domain evaluation of document chunking strategies for dense retrieval, addressing a critical but underexplored aspect of retrieval-augmented systems. In our study, 36 segmentation methods spanning…

Computation and Language · Computer Science 2026-03-10 Muhammad Arslan Shaukat, Muntasir Adnan, Carlos C. N. Kuhn

Intent-Driven Dynamic Chunking: Segmenting Documents to Reflect Predicted Information Needs

Breaking long documents into smaller segments is a fundamental challenge in information retrieval. Whether for search engines, question-answering systems, or retrieval-augmented generation (RAG), effective segmentation determines how well…

Information Retrieval · Computer Science 2026-02-17 Christos Koutsiaris

Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering algorithms are infeasible due to their high memory requirements and/or unfavorable runtime…

Data Structures and Algorithms · Computer Science 2026-01-27 Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, Mats Jirstrand

StruClus: Structural Clustering of Large-Scale Graph Databases

We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…

Databases · Computer Science 2016-10-03 Till Schäfer, Petra Mutzel

A New Approach to Speed up Combinatorial Search Strategies Using Stack and Hash Table

Owing to the significance of combinatorial search strategies both for academia and industry, the introduction of new techniques is a fast growing research field these days. These strategies have really taken different forms ranging from…

Software Engineering · Computer Science 2019-04-08 Bestoun S. Ahmed, Luca M. Gambardella, Kamal Z. Zamli

Highly-Concurrent Doubly-Linked Lists

As file systems are increasingly being deployed on ever larger systems with many cores and multi-gigabytes of memory, scaling the internal data structures of file systems has taken greater importance and urgency. A doubly-linked list is a…

Data Structures and Algorithms · Computer Science 2011-12-07 Nitin Garg, Ed Zhu, Fabiano C. Botelho

Clustering by Constructing Hyper-Planes

As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Luhong Diao, Jinying Gao, Manman Deng

A Hybrid Adjacency and Time-Based Data Structure for Analysis of Temporal Networks

Dynamic or temporal networks enable representation of time-varying edges between nodes. Conventional adjacency-based data structures used for storing networks such as adjacency lists were designed without incorporating time and can thus…

Social and Information Networks · Computer Science 2022-06-24 Tanner Hilsabeck, Makan Arastuie, Kevin S. Xu

S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis

Document chunking is a critical task in natural language processing (NLP) that involves dividing a document into meaningful segments. Traditional methods often rely solely on semantic analysis, ignoring the spatial layout of elements, which…

Computation and Language · Computer Science 2025-01-13 Prashant Verma