Computer Science
We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the…
Deciding periodicity of infinite words generated by morphisms is a classical result in combinatorics on words from 80's by Harju, Linna and Pansiot. In this paper, we are interested in this question in the abelian setting. Two words are…
Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone…
We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and…
Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the minimum Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. A \emph{split…
Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang…
We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution…
Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a…
We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window,…
The $2 \rightarrow q$ norm of a matrix $X \in \mathbb{R}^{n \times d}$ is defined as $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$. We give polynomial-time multiplicative approximation algorithms for…
Brown et al.\ (2025) described a pre-processing step, called $k$-mer based breaking (KeBaB), that speeds up searching for long maximal exact matches (MEMs) between a pattern $P$ and an indexed repetitive text $T$. KeBaB produces a set of…
Consider the classical Min-Sum Set Cover problem: We are given a universe $\mathcal{U}$ of $n$ elements and a collection $\mathcal{S}$ of $k$ subsets of $\mathcal{U}$. Moreover, a cost function is associated with each set. The goal is to…
Mathematical programming is widely employed across various sectors - such as logistics, energy, and workforce planning - to model and solve industrial optimisation problems, but its use requires substantial domain expertise. Large language…
The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion…
The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs can be used for various string tasks including text pattern searching,…
Effective code optimization in compilers is crucial for computer and software engineering. The success of these optimizations primarily depends on the selection and ordering of the optimization passes applied to the code. While most…
Modern equality saturation systems excel at expression-level rewrites by exploring large spaces of equivalent programs without suffering from the phase-ordering problem. How- ever, these systems struggle to represent equivalence directly…
Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with…
The \emph{Separation Lemma} is a simple yet powerful tool, akin to the well-known \emph{Isolation Lemma}, that guarantees the uniqueness of certain set sums. Bandopadhyay et al.\ introduced this lemma to establish lower bounds for the \ALP…
Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which…