Related papers: Study Features via Exploring Distribution Structur…

Feature Selection with Redundancy-complementariness Dispersion

Feature selection has attracted significant attention in data mining and machine learning in the past decades. Many existing feature selection methods eliminate redundancy by measuring pairwise inter-correlation of features, whereas the…

Machine Learning · Computer Science · 2015-02-03 · Zhijun Chen, Chaozhong Wu, Yishi Zhang, Zhen Huang, Bin Ran, Ming Zhong, Nengchao Lyu

A Novel Approach for Stable Selection of Informative Redundant Features from High Dimensional fMRI Data

Feature selection is among the most important components because it not only helps enhance classification accuracy but also, and perhaps more importantly, supports potential biomarker discovery. However, traditional multivariate methods are…

Computer Vision and Pattern Recognition · Computer Science · 2016-05-26 · Yilun Wang, Zhiqiang Li, Yifeng Wang, Xiaona Wang, Junjie Zheng, Xujuan Duan, Huafu Chen

Robust Novelty Detection through Style-Conscious Feature Ranking

Novelty detection seeks to identify samples deviating from a known distribution, yet data shifts in a multitude of ways, and only a few consist of relevant changes. Aligned with out-of-distribution generalization literature, we advocate for…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-03 · Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu

Improved probabilistic regression using diffusion models

Probabilistic regression models the entire predictive distribution of a response variable, offering richer insights than classical point estimates and directly allowing for uncertainty quantification. While diffusion-based generative models…

Machine Learning · Computer Science · 2025-10-07 · Carlo Kneissl, Christopher Bülte, Philipp Scholl, Gitta Kutyniok

A Probabilistic Model for Data Redundancy in the Feature Domain

In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features…

Machine Learning · Computer Science · 2023-09-26 · Ghurumuruhan Ganesan

A Novel Metric for Measuring Data Quality in Classification Applications (extended version)

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science · 2023-12-14 · Jouseau Roxane, Salva Sébastien, Samir Chafik

Distributionally Robust Feature Selection

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…

Machine Learning · Computer Science · 2025-10-27 · Maitreyi Swaroop, Tamar Krishnamurti, Bryan Wilder

Relevant based structure learning for feature selection

Feature selection is an important task in many problems occurring in pattern recognition, bioinformatics, machine learning and data mining applications. The feature selection approach enables us to reduce the computation burden and the…

Machine Learning · Computer Science · 2016-08-30 · Hadi Zare, Mojtaba Niazi

Discovery and Separation of Features for Invariant Representation Learning

Supervised machine learning models often associate irrelevant nuisance factors with the prediction target, which hurts generalization. We propose a framework for training robust neural networks that induces invariance to nuisances through…

Machine Learning · Computer Science · 2019-12-03 · Ayush Jaiswal, Rob Brekelmans, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

Exploring Data Redundancy in Real-world Image Classification through Data Selection

Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-27 · Zhenyu Tang, Shaoting Zhang, Xiaosong Wang

When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing

Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite…

Machine Learning · Statistics · 2025-12-02 · Mousam Sinha, Tirtha Sarathi Ghosh, Ridam Pal

Probabilistic Dimensionality Reduction via Structure Learning

We propose a novel probabilistic dimensionality reduction framework that can naturally integrate the generative model and the locality information of data. Based on this framework, we present a new model, which is able to learn a smooth…

Machine Learning · Statistics · 2016-10-18 · Li Wang

BELIEF: A distance-based redundancy-proof feature selection method for Big Data

With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge data and ease complex learning processes. Concretely, algorithms that are able to filter relevant dimensions from a set of…

Machine Learning · Computer Science · 2018-04-17 · Sergio Ramírez-Gallego, Salvador García, Ning Xiong, Francisco Herrera

A Cross-Entropy-based Method to Perform Information-based Feature Selection

From a machine learning point of view, identifying a subset of relevant features from a real data set can be useful to improve the results achieved by classification methods and to reduce their time and space complexity. To achieve this…

Machine Learning · Computer Science · 2017-05-23 · Pietro Cassara, Alessandro Rozza, Mirco Nanni

Generating Redundant Features with Unsupervised Multi-Tree Genetic Programming

Recently, feature selection has become an increasingly important area of research due to the surge in high-dimensional datasets in all areas of modern life. A plethora of feature selection algorithms have been proposed, but it is difficult…

Neural and Evolutionary Computing · Computer Science · 2019-10-24 · Andrew Lensen, Bing Xue, Mengjie Zhang

Testing For Nonlinearity Using Redundancies: Quantitative and Qualitative Aspects

A method for testing nonlinearity in time series is described based on information-theoretic functionals, namely redundancies, whose linear and nonlinear forms allow either qualitative or, after incorporating the surrogate data technique,…

comp-gas · Physics · 2015-06-24 · Milan PALUS

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a…

Machine Learning · Computer Science · 2017-11-28 · Jacob Steinhardt, Moses Charikar, Gregory Valiant

Network classification through random walks

Network models have been widely used to study diverse systems and analyze their dynamic behaviors. Given the structural variability of networks, an intriguing question arises: Can we infer the type of system represented by a network based…

Social and Information Networks · Computer Science · 2025-05-29 · Gonzalo Travieso, Joao Merenda, Odemir M. Bruno

Finding Robust Itemsets Under Subsampling

Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by…

Databases · Computer Science · 2019-04-25 · Nikolaj Tatti, Fabian Moerchen, Toon Calders

RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively

Deep learning has revolutionized computing in many real-world applications, arguably due to its remarkable performance and extreme convenience as an end-to-end solution. However, deep learning models can be costly to train and to use,…

Machine Learning · Computer Science · 2024-11-19 · Yao Lu, Peixin Zhang, Jingyi Wang, Lei Ma, Xiaoniu Yang, Qi Xuan