Related papers: An efficient algorithm for T-estimation
The aim of this paper is to present a new estimation procedure that can be applied in many statistical frameworks including density and regression and which leads to both robust and optimal (or nearly optimal) estimators. In density…
We propose the holdout randomization test (HRT), an approach to feature selection using black box predictive models. The HRT is a specialized version of the conditional randomization test (CRT; Candes et al., 2018) that uses data splitting…
Randomized numerical linear algebra is proved to bridge theoretical advancements to offer scalable solutions for approximating tensor decomposition. This paper introduces fast randomized algorithms for solving the fixed Tucker-rank problem…
This paper proposes an efficient algorithm (HOLRR) to handle regression tasks where the outputs have a tensor structure. We formulate the regression problem as the minimization of a least squares criterion under a multilinear rank…
Estimation of probability density function from samples is one of the central problems in statistics and machine learning. Modern neural network-based models can learn high dimensional distributions but have problems with hyperparameter…
Good robust estimators can be tuned to combine a high breakdown point and a specified asymptotic efficiency at a central model. This happens in regression with MM- and tau-estimators among others. However, the finite-sample efficiency of…
Density level sets can be estimated using plug-in methods, excess mass algorithms or a hybrid of the two previous methodologies. The plug-in algorithms are based on replacing the unknown density by some nonparametric estimator, usually the…
Outliers widely occur in big-data applications and may severely affect statistical estimation and inference. In this paper, a framework of outlier-resistant estimation is introduced to robustify an arbitrarily given loss function. It has a…
Robust estimation under Huber's $\epsilon$-contamination model has become an important topic in statistics and theoretical computer science. Statistically optimal procedures such as Tukey's median and other estimators based on depth…
Robust estimation and variable selection procedures are developed for the extended t-process regression model with functional data. Statistical properties such as consistency of estimators and predictions are obtained. Numerical studies show…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
Density ratio estimation is a vital tool in both the machine learning and statistics communities. However, due to the unbounded nature of the density ratio, the estimation procedure can be vulnerable to corrupted data points, which often pushes the…
Nowadays, the bulk of Internet traffic uses the TCP protocol for reliable transmission. But standard TCP's performance is very poor in High Speed Networks (HSN), and hence the core gigabyte links are usually underutilized. This problem…
This paper develops fast and efficient algorithms for computing Tucker decomposition with a given multilinear rank. By combining random projection and the power scheme, we propose two efficient randomized versions for the truncated…
This paper deals with robust regression and subspace estimation and more precisely with the problem of minimizing a saturated loss function. In particular, we focus on computational complexity issues and show that an exact algorithm with…
The estimation of a density profile from experimental data points is a challenging problem, usually tackled by plotting a histogram. Prior assumptions on the nature of the density, from its smoothness to the specification of its form, allow…
This article studies a robust version of persistent homology, based on trimming methodology, to capture geometric features through the support of the data in the presence of outliers. Precisely speaking, the proposed methodology works when the…
Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…
Efficient coordination for collective spatial distribution is a fundamental challenge in multi-agent systems. Prior research on Density-Driven Optimal Control (D2OC) established a framework to match agent trajectories to a desired spatial…
We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information…