Computer Science

On Language Generation in the Limit with Bounded Memory

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the…

Data Structures and Algorithms · Computer Science 2026-05-29 Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

A Radius-Sensitive Approximation Algorithm for Connected Submodular Maximization

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone…

Data Structures and Algorithms · Computer Science 2026-05-29 Philip Cervenjak, Junhao Gan, Naonori Kakimura, Seeun William Umboh, Anthony Wirth

Sampling Directed Eulerian Tours in $\widetilde O(m^{3/2})$ Time

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and…

Data Structures and Algorithms · Computer Science 2026-05-29 Nima Anari

Explaining Rankings with Hidden Group Bonuses

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang…

Data Structures and Algorithms · Computer Science 2026-05-29 Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore, Priyanka Golia, Diptarka Chakraborty

Distributed Gaussian Mean Testing under Communication Constraints: messages, samples, and coins

We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution…

Data Structures and Algorithms · Computer Science 2026-05-29 Clément L. Canonne, Nimitt

An Improved Greedy Approximation for (Metric) $k$-Means

Clustering is a basic task in data analysis and machine learning, and the optimization problems defined by clustering objectives are well studied; among these, the $k$-Means objective is arguably the best known. Given a…

Data Structures and Algorithms · Computer Science 2026-05-29 Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window,…

Data Structures and Algorithms · Computer Science 2026-05-29 Faruk Alpay, Levent Sarioglu

The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure

AI is transforming life sciences research at unprecedented speed, accelerating discovery across protein structure prediction, genome modeling, and drug development (Jumper et al., 2021; Mak et al., 2024). Yet this rapid advancement, coupled…

Digital Libraries · Computer Science 2026-05-29 Vasudha Sharma, Chakresh Kumar Singh, Jayesh Choudhari, Dharmit Nakrani

Co-creation of AI technology, empowering curators of cultural heritage information and guarding research commons

This paper describes the use of Retrieval-Augmented Generation (RAG) for specific digital collections of cultural assets. The collections are provided by institutions operating in the cultural sector. The…

Digital Libraries · Computer Science 2026-05-29 Andrea Scharnhorst, Han Yang, Jetze Touber, Kim Ferguson, Philipp Mayr, Vyacheslav Tykhonov

Algorithms with Polynomially-Improved Approximation Factors for the $2 \rightarrow q$ Norm, and Applications

The $2 \rightarrow q$ norm of a matrix $X \in \mathbb{R}^{n \times d}$ is defined as $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$. We give polynomial-time multiplicative approximation algorithms for…

Data Structures and Algorithms · Computer Science 2026-05-29 Samuel B. Hopkins, Stefan Tiegel

Parse indexing for discarding short pseudo-MEMs safely

Brown et al. (2025) described a pre-processing step, called $k$-mer based breaking (KeBaB), that speeds up searching for long maximal exact matches (MEMs) between a pattern $P$ and an indexed repetitive text $T$. KeBaB produces a set of…

Data Structures and Algorithms · Computer Science 2026-05-29 Travis Gagie

Min-Sum Set Cover on Parallel Machines

Consider the classical Min-Sum Set Cover problem: We are given a universe $\mathcal{U}$ of $n$ elements and a collection $\mathcal{S}$ of $k$ subsets of $\mathcal{U}$. Moreover, a cost function is associated with each set. The goal is to…

Data Structures and Algorithms · Computer Science 2026-05-29 Michał Szyfelbein

On the sensitivity of CDAWG-grammars

The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs can be used for various string tasks including text pattern searching,…

Data Structures and Algorithms · Computer Science 2026-05-29 Hiroto Fujimaru, Shunsuke Inenaga

Verified Misguidance: Measuring Structural Citation Failures in Search-Augmented LLMs

Users of search-augmented LLMs rely on citations as evidence that responses are grounded in real sources, and rarely verify the cited pages themselves. Millions of queries per day now pass through these systems, making citation quality a…

Digital Libraries · Computer Science 2026-05-28 Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang, Dongha Lee

High-Quality Multi-Constraint Hypergraph Partitioning via Greedy Rebalancing

Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with…

Data Structures and Algorithms · Computer Science 2026-05-28 Nikolai Maas

A Deterministic Separation Lemma

The Separation Lemma is a simple yet powerful tool, akin to the well-known Isolation Lemma, that guarantees the uniqueness of certain set sums. Bandopadhyay et al. introduced this lemma to establish lower bounds for the \ALP…

Data Structures and Algorithms · Computer Science 2026-05-28 Abhishek Sahu

Efficient Algorithms for Interdicting Facilities in Trees and Bounded Treewidth Graphs

Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which…

Data Structures and Algorithms · Computer Science 2026-05-28 Ali Abbasi, Eli Friedman, Leana Golubchik, Samir Khuller, Marco Paolieri

Smoothed Score Queries and the Complexity of Sampling

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to…

Data Structures and Algorithms · Computer Science 2026-05-28 Jingbo Liu

CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text

Large language models (LLMs) are increasingly used to generate scientific reports, but they can produce references that appear plausible while containing corrupted metadata or pointing to papers that do not exist. We introduce CiteCheck, a…

Digital Libraries · Computer Science 2026-05-28 Khashayar Khajavi, Shaghayegh Sadeghi, Rise Adhikari, Alexander Tessier

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d$…

Data Structures and Algorithms · Computer Science 2026-05-28 Sergei Tikhonov, Arsen Vasilyan