Related papers: A Statistical Approach Towards Robust Progress Est…
The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of…
The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to…
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to…
Statistical and structural modeling represent two distinct approaches to data analysis. In this paper, we propose a set of novel methods for combining statistical and structural models for improved prediction and causal inference. Our first…
An important challenge in statistical analysis lies in controlling the bias of estimators due to the ever-increasing data size and model complexity. Approximate numerical methods and data features like censoring and misclassification often…
The aim of this paper is to present a new estimation procedure that can be applied in many statistical frameworks including density and regression and which leads to both robust and optimal (or nearly optimal) estimators. In density…
As the use of machine learning in high impact domains becomes widespread, the importance of evaluating safety has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which…
There has been increasing interest in recent years in the development of approaches to estimate causal effects when the number of potential confounders is prohibitively large. This growth in interest has led to a number of potential…
Fully robust versions of the elastic net estimator are introduced for linear and logistic regression. The algorithms to compute the estimators are based on the idea of repeatedly applying the non-robust classical estimators to data subsets…
With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to…
Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time completely random selection of the data. However,…
Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase…
Local decision rules are commonly understood to be more explainable, due to the local nature of the patterns involved. With numerical optimization methods such as gradient boosting, ensembles of local decision rules can gain good predictive…
Active statistical inference is a new method for inference with AI-assisted data collection. Given a budget on the number of labeled data points that can be collected and assuming access to an AI predictive model, the basic idea is to…
This paper considers an empirical likelihood inference for parameters defined by general estimating equations, when data are missing at random. The efficiency of existing estimators depends critically on correctly specifying the conditional…
In many situations, data are recorded over a period of time and may be regarded as realizations of a stochastic process. In this paper, robust estimators for the principal components are considered by adapting the projection pursuit…
State estimation or filtering serves as a fundamental task to enable intelligent decision-making in applications such as autonomous vehicles, robotics, healthcare monitoring, smart grids, intelligent transportation, and predictive…
This paper addresses the problem of providing robust estimators under a functional logistic regression model. Logistic regression is a popular tool in classification problems with two populations. As in functional linear regression,…
In many instances, the application of approximate Bayesian methods is hampered by two practical features: 1) the requirement to project the data down to low-dimensional summary, including the choice of this projection, which ultimately…
Ordinal user-provided ratings across multiple items are frequently encountered in both scientific and commercial applications. Whilst recommender systems are known to do well on these type of data from a predictive point of view, their…