Li-Yang Tan — Scifaro

Samplability makes learning easier

The standard definition of PAC learning (Valiant 1984) requires learners to succeed under all distributions -- even ones that are intractable to sample from. This stands in contrast to samplable PAC learning (Blum, Furst, Kearns, and Lipton…

Computational Complexity · Computer Science 2025-12-02 Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

The power of quantum circuits in sampling

We give new evidence that quantum circuits are substantially more powerful than classical circuits. We show, relative to a random oracle, that polynomial-size quantum circuits can sample distributions that subexponential-size classical…

Quantum Physics · Physics 2025-10-07 Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

Computational-Statistical Tradeoffs from NP-hardness

A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case…

Computational Complexity · Computer Science 2025-07-18 Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

A Distributional-Lifting Theorem for PAC Learning

The apparent difficulty of efficient distribution-free PAC learning has led to a large body of work on distribution-specific learning. Distributional assumptions facilitate the design of efficient algorithms but also limit their reach and…

Machine Learning · Computer Science 2025-06-23 Guy Blanc, Jane Lange, Carmen Strassle, Li-Yang Tan

Fast decision tree learning solves hard coding-theoretic problems

We connect the problem of properly PAC learning decision trees to the parameterized Nearest Codeword Problem ($k$-NCP). Despite significant effort by the respective communities, algorithmic progress on both problems has been stuck: the…

Computational Complexity · Computer Science 2024-09-27 Caleb Koch, Carmen Strassle, Li-Yang Tan

The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem

Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility,…

Computational Complexity · Computer Science 2024-09-19 Guy Blanc, Alexandre Hayderi, Caleb Koch, Li-Yang Tan

Superconstant Inapproximability of Decision Tree Learning

We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard.…

Computational Complexity · Computer Science 2024-07-02 Caleb Koch, Carmen Strassle, Li-Yang Tan

A Strong Direct Sum Theorem for Distributional Query Complexity

Consider the expected query complexity of computing the $k$-fold direct product $f^{\otimes k}$ of a function $f$ to error $\varepsilon$ with respect to a distribution $\mu^k$. One strategy is to sequentially compute each of the $k$ copies…

Computational Complexity · Computer Science 2024-05-28 Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

Harnessing the Power of Choices in Decision Tree Learning

We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they…

Machine Learning · Computer Science 2023-10-27 Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari

Properly Learning Decision Trees with Queries Is NP-Hard

We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a…

Computational Complexity · Computer Science 2023-07-11 Caleb Koch, Carmen Strassle, Li-Yang Tan

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers

We prove a strong composition theorem for junta complexity and show how such theorems can be used to generically boost the performance of property testers. The $\varepsilon$-approximate junta complexity of a function $f$ is the smallest…

Computational Complexity · Computer Science 2023-07-11 Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

Lifting uniform learners via distributional decomposition

We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation…

Machine Learning · Statistics 2023-03-31 Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

Certification with an NP Oracle

In the certification problem, the algorithm is given a function $f$ with certificate complexity $k$ and an input $x^\star$, and the goal is to find a certificate of size $\le \text{poly}(k)$ for $f$'s value at $x^\star$. This problem is in…

Computational Complexity · Computer Science 2022-11-07 Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

We establish new hardness results for decision tree optimization problems, adding to a line of work that dates back to Hyafil and Rivest in 1976. We prove, under randomized ETH, superpolynomial lower bounds for two basic problems: given an…

Computational Complexity · Computer Science 2022-10-13 Caleb Koch, Carmen Strassle, Li-Yang Tan

Multitask Learning via Shared Features: Algorithms and Hardness

We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube, that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a…

Machine Learning · Computer Science 2022-09-08 Konstantina Bairaktari, Guy Blanc, Li-Yang Tan, Jonathan Ullman, Lydia Zakynthinou

A Query-Optimal Algorithm for Finding Counterfactuals

We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes \[ {S(f)^{O(\Delta_f(x^\star))}\cdot \log…

Data Structures and Algorithms · Computer Science 2022-07-15 Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan

Open Problem: Properly learning decision trees in polynomial time?

The authors recently gave an $n^{O(\log\log n)}$ time membership query algorithm for properly learning decision trees under the uniform distribution (Blanc et al., 2021). The previous fastest algorithm for this problem ran in $n^{O(\log…

Data Structures and Algorithms · Computer Science 2022-06-30 Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

On the power of adaptivity in statistical adversaries

We study a fundamental question concerning adversarial noise models in statistical problems where the algorithm receives i.i.d. draws from a distribution $\mathcal{D}$. The definitions of these adversaries specify the type of allowable…

Machine Learning · Computer Science 2022-06-30 Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

Popular decision tree algorithms are provably noise tolerant

Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise,…

Machine Learning · Computer Science 2022-06-20 Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

Reconstructing decision trees

We give the first reconstruction algorithm for decision trees: given queries to a function $f$ that is $\mathrm{opt}$-close to a size-$s$ decision tree, our algorithm provides query access to a decision tree $T$ where: $\circ$ $T$ has…

Data Structures and Algorithms · Computer Science 2022-05-24 Guy Blanc, Jane Lange, Li-Yang Tan