Related papers: Variable Selection in Maximum Mean Discrepancy for…

200 papers

We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to determine whether two collections of samples follow the same distribution. To address this, we propose a novel framework…

Machine Learning · Statistics 2024-12-23 Jie Wang, Santanu S. Dey, Yao Xie

Given a pair of multivariate time-series data of the same length and dimensions, an approach is proposed to select variables and time intervals where the two series are significantly different. In applications where one time series is an…

Methodology · Statistics 2024-12-11 Kensuke Mitsuzawa, Margherita Grossi, Stefano Bortoli, Motonobu Kanagawa

Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity…

Methodology · Statistics 2010-10-08 Thomas Brendan Murphy, Nema Dean, Adrian E. Raftery

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trade off fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics 2020-09-04 Young Woong Park, Diego Klabjan

We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that allows the variable selection problem to be reduced to a problem of estimating…

Statistics Theory · Mathematics 2017-03-14 Alban Mbina Mbina, Guy Martial Nkiet, Fulgence Eyi Obiang

Two-sample hypothesis testing, that is, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing,…

Machine Learning · Statistics 2026-04-21 Antoine Chatalic, Marco Letizia, Nicolas Schreuder, Lorenzo Rosasco

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, interpretability…

Artificial Intelligence · Computer Science 2020-06-18 Guangyi Zhang, Aristides Gionis

The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two…

Machine Learning · Computer Science 2025-09-24 Varun Babbar, Zhicheng Guo, Cynthia Rudin

Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often avoided due to selection instability, lack of regularization, and difficulties with…

Machine Learning · Statistics 2022-02-17 Daniel R. Kowal

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola

We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic; informally speaking, a good generator network…

Machine Learning · Statistics 2015-05-18 Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require…

Machine Learning · Statistics 2025-12-17 Aaron Wei, Milad Jalali, Danica J. Sutherland

Markov networks are frequently used in sciences to represent conditional independence relationships underlying observed variables arising from a complex system. It is often of interest to understand how an underlying network differs between…

Methodology · Statistics 2021-04-26 Byol Kim, Song Liu, Mladen Kolar

The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…

Applications · Statistics 2020-12-16 Andrea Cappozzo, Francesca Greselin, Thomas Brendan Murphy

In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large-scale selection tasks, where the number of data samples could…

Machine Learning · Computer Science 2023-07-06 Sandor Szedmak, Riikka Huusari, Tat Hong Duong Le, Juho Rousu

We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results…

Machine Learning · Statistics 2013-06-28 Mladen Kolar, Han Liu

In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for nonparametric components and…

Statistics Theory · Mathematics 2008-12-18 Runze Li, Hua Liang

Measuring divergence between two distributions is essential in machine learning and statistics and has various applications including binary classification, change point detection, and two-sample test. Furthermore, in the era of big data,…

For many important problems the quantity of interest is an unknown function of the parameters, which is a random vector with known statistics. Since the dependence of the output on this random vector is unknown, the challenge is to identify…

Machine Learning · Statistics 2021-04-28 Themistoklis P. Sapsis