Related papers: Revisiting Active Sequential Prediction-Powered Me…
We study prediction-powered conditional inference in the setting where labeled data are scarce, unlabeled covariates are abundant, and a black-box machine-learning predictor is available. The goal is to perform statistical inference on…
We study inference with a small labeled sample, a large unlabeled sample, and high-quality predictions from an external model. We link prediction-powered inference with empirical likelihood by stacking supervised estimating equations based…
We study the value of information in sequential compressed sensing by characterizing the performance of sequential information guided sensing in practical scenarios when information is inaccurate. In particular, we assume the signal…
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test…
Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected,…
State-of-the-art machine learning models require access to significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, annotation process can…
Due to the privacy protection or the difficulty of data collection, we cannot observe individual outputs for each instance, but we can observe aggregated outputs that are summed over multiple instances in a set in some real-world…
Non-conservative uncertainty bounds are key for both assessing an estimation algorithm's accuracy and in view of downstream tasks, such as its deployment in safety-critical contexts. In this paper, we derive a tight, non-asymptotic…
A framework is introduced for actively and adaptively solving a sequence of machine learning problems, which are changing in bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the…
The topic of deep learning has seen a surge of interest in recent years both within and outside of the field of Statistics. Deep models leverage both nonlinearity and interaction effects to provide superior predictions in many cases when…
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many…
Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods…
This note explores probabilistic sampling weighted by uncertainty in active learning. This method has been previously used and authors have tangentially remarked on its efficacy. The scheme has several benefits: (1) it is computationally…
This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the…
The available data in semi-supervised learning usually consists of relatively small sized labeled data and much larger sized unlabeled data. How to effectively exploit unlabeled data is the key issue. In this paper, we write the regression…
We consider a setting where an agent's uncertainty is represented by a set of probability measures, rather than a single measure. Measure-by-measure updating of such a set of measures upon acquiring new information is well-known to suffer…
Typically, a supervised learning model is trained using passive learning by randomly selecting unlabelled instances to annotate. This approach is effective for learning a model, but can be costly in cases where acquiring labelled instances…
We consider a setting where an agent's uncertainty is represented by a set of probability measures, rather than a single measure. Measure-bymeasure updating of such a set of measures upon acquiring new information is well-known to suffer…
We study the multiplicative hazards model with intermittently observed longitudinal covariates and time-varying coefficients. For such models, the existing ad hoc approach, such as the last value carried forward, is biased. We propose a…
Motivated by the need to analyze continuously updated data sets in the context of time-to-event modeling, we propose a novel nonparametric approach to estimate the conditional hazard function given a set of continuous and discrete…