Related papers: Relationship between clustering and algorithmic ph…
This thesis is divided into two parts. The first presents an overview of known results in statistical mechanics of disordered systems and its approach to random combinatorial optimization problems. The second part is a discussion of two…
The XOR-satisfiability (XORSAT) problem requires finding an assignment of $n$ Boolean variables that satisfies $m$ exclusive OR (XOR) clauses, whereby each clause constrains a subset of the variables. We consider random XORSAT instances,…
The random $k$-XORSAT problem is a random constraint satisfaction problem of $n$ Boolean variables and $m=rn$ clauses, a random instance of which can be expressed as a $\mathrm{GF}(2)$ linear system of the form $Ax=b$, where $A$ is a random $m…
We study a random system of $cn$ linear equations over $n$ variables in GF(2), where each equation contains exactly $r$ variables; this is equivalent to $r$-XORSAT. \cite{ikkm,amxor} determined the clustering threshold, $c^*_r$: if…
We study the performance of stochastic local search algorithms for random instances of the $K$-satisfiability ($K$-SAT) problem. We introduce a new stochastic local search algorithm, ChainSAT, which moves in the energy landscape of a…
We study a random system of $cn$ linear equations over $n$ variables in GF(2), where each equation contains exactly $r$ variables; this is equivalent to $r$-XORSAT. Previous work has established a clustering threshold, $c^*_r$, for this model: if…
Partly on the basis of heuristic arguments from physics it has been suggested that the performance of certain types of algorithms on random $k$-SAT formulas is linked to phase transitions that affect the geometry of the set of satisfying…
We study a variant of classical clustering formulations in the context of algorithmic fairness, known as diversity-aware clustering. In this variant we are given a collection of facility subsets, and a solution must contain at least a…
Clustering is an NP-hard problem. Thus, no efficient exact algorithm is known, and heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets computational efficiencies…
We introduce and study the random "locked" constraint satisfaction problems. When increasing the density of constraints, they display a broad "clustered" phase in which the space of solutions is divided into many isolated points. While the…
The XOR-satisfiability (XORSAT) problem deals with a system of $n$ Boolean variables and $m$ clauses. Each clause is a linear Boolean equation (XOR) of a subset of the variables. A $K$-clause is a clause involving $K$ distinct variables. In…
Random instances of constraint satisfaction problems such as k-SAT provide challenging benchmarks. If there are m constraints over n variables there is typically a large range of densities r=m/n where solutions are known to exist with…
We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of $m$ points in $n$ dimensions, $n,m \rightarrow \infty$ and $\alpha = m/n$ stays finite. Using exact but non-rigorous methods…
We study the phase diagram and the algorithmic hardness of the random 'locked' constraint satisfaction problems, and compare them to the commonly studied 'non-locked' problems like satisfiability of Boolean formulas or graph coloring. The…
We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting.…
We determine the complexity of several constraint satisfaction problems using the heuristic algorithm, WalkSAT. At large sizes N, the complexity increases exponentially with N in all cases. Perhaps surprisingly, out of all the models…
We study the behavior of ASAT, a heuristic for solving satisfiability problems by stochastic local search near the SAT/UNSAT transition. The heuristic is focused, i.e. only variables in unsatisfied clauses are updated in each step, and is…
K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets. To address this limitation, we propose a novel heuristic algorithm that leverages the Variable Neighborhood Search…
Clustering is a long-standing research problem and a fundamental tool in AI and data analysis. The traditional k-center problem, a fundamental theoretical challenge in clustering, has a best possible approximation ratio of 2, and any…
The typical complexity of Constraint Satisfaction Problems (CSPs) can be investigated by means of random ensembles of instances. The latter exhibit many threshold phenomena besides their satisfiability phase transition, in particular a…