Related papers: Adaptive Sampling for Discovery
A framework is introduced for actively and adaptively solving a sequence of machine learning problems, which are changing in bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the…
We consider an active learning setting where the algorithm has access to a large pool of unlabeled data and a small pool of labeled data. In each iteration, the algorithm chooses few unlabeled data points and obtains their labels from an…
Distributed multi-party learning provides an effective approach for training a joint model with scattered data under legal and practical constraints. However, due to the quagmire of a skewed distribution of data labels across participants…
Active domain adaptation (ADA) studies have mainly addressed query selection while following existing domain adaptation strategies. However, we argue that it is critical to consider not only query selection criteria but also domain…
This paper presents adaptive conformal selection (ACS), an interactive framework for model-free selection with guaranteed error control. Building on conformal selection (Jin and Cand\`es, 2023b), ACS generalizes the approach to support…
In machine learning larger databases are usually associated with higher classification accuracy due to better generalization. This generalization may lead to non-optimal classifiers in some medical applications with highly variable…
We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses…
Many problems in computational science and engineering can be described in terms of approximating a smooth function of $d$ variables, defined over an unknown domain of interest $\Omega\subset \mathbb{R}^d$, from sample data. Here both the…
Modern recommendation systems involve massive catalogs of multimodal items, where scalable item identification must balance compactness, semantic fidelity, and downstream effectiveness. Semantic IDs (SIDs) address this need by representing…
The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings,…
The detection of anomalous behaviours is an emerging need in many applications, particularly in contexts where security and reliability are critical aspects. While the definition of anomaly strictly depends on the domain framework, it is…
Scene-aware Adaptive Compressive Sensing (ACS) has attracted significant interest due to its promising capability for efficient and high-fidelity acquisition of scene images. ACS typically prescribes adaptive sampling allocation (ASA) based…
Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a…
In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller…
The problem of estimating the number $n$ of distinct keys of a large collection of $N$ data is well known in computer science. A classical algorithm is the adaptive sampling (AS). $n$ can be estimated by $R.2^D$, where $R$ is the final…
Sampling is a fundamental problem in computer science and statistics. However, for a given task and stream, it is often not possible to choose good sampling probabilities in advance. We derive a general framework for adaptively changing the…
The real challenge in pattern recognition task and machine learning process is to train a discriminator using labeled data and use it to distinguish between future data as accurate as possible. However, most of the problems in the real…
Stochastic gradient decent~(SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in all existing SGD and its variants, the sample size in each iteration~(epoch) of…
In recent years, knowledge distillation (KD) has been widely used to derive efficient models. Through imitating a large teacher model, a lightweight student model can achieve comparable performance with more efficiency. However, most…
Subsampling is commonly used to mitigate costs associated with data acquisition, such as time or energy requirements, motivating the development of algorithms for estimating the fully-sampled signal of interest $x$ from partially observed…