Related papers: Consistent and Flexible Selectivity Estimation for…

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor…

Databases · Computer Science 2019-06-19 Shohedul Hasan , Saravanan Thirumuruganathan , Jees Augustine , Nick Koudas , Gautam Das

Consistent Estimation for Partition-wise Regression and Classification Models

Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with…

Methodology · Statistics 2017-06-07 Rex C. Y. Cheung , Alexander Aue , Thomas C. M. Lee

Dynamic Feature Selection from Variable Feature Sets Using Features of Features

Machine learning models usually assume that a set of feature values used to obtain an output is fixed in advance. However, in many real-world problems, a cost is associated with measuring these features. To address the issue of reducing…

Machine Learning · Computer Science 2025-03-13 Katsumi Takahashi , Koh Takeuchi , Hisashi Kashima

A High-Dimensional Feature Selection Algorithm Based on Multiobjective Differential Evolution

Multiobjective feature selection seeks to determine the most discriminative feature subset by simultaneously optimizing two conflicting objectives: minimizing the number of selected features and the classification error rate. The goal is to…

Neural and Evolutionary Computing · Computer Science 2025-05-12 Zhenxing Zhang , Qianxiang An , Yilei Wang , Chenfeng Wu , Baoling Dong , Chunjie Zhou

Permutation-based multi-objective evolutionary feature selection for high-dimensional data

Feature selection is a critical step in the analysis of high-dimensional data, where the number of features often vastly exceeds the number of samples. Effective feature selection not only improves model performance and interpretability but…

Machine Learning · Computer Science 2025-01-27 Raquel Espinosa , Gracia Sánchez , José Palma , Fernando Jiménez

Embedding Feature Selection for Large-scale Hierarchical Classification

Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features posing several big data challenges. Feature selection that aims to select…

Machine Learning · Computer Science 2017-06-07 Azad Naik , Huzefa Rangwala

High-Dimensional Probability Estimation with Deep Density Models

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations…

Machine Learning · Statistics 2013-02-22 Oren Rippel , Ryan Prescott Adams

Feature discriminativity estimation in CNNs for transfer learning

The purpose of feature extraction on convolutional neural networks is to reuse deep representations learnt for a pre-trained model to solve a new, potentially unrelated problem. However, raw feature extraction from all layers is unfeasible…

Neural and Evolutionary Computing · Computer Science 2019-11-11 Victor Gimenez-Abalos , Armand Vilalta , Dario Garcia-Gasulla , Jesus Labarta , Eduard Ayguadé

Feature and Variable Selection in Classification

The amount of information in the form of features and variables avail- able to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high di- mensional models do not…

Machine Learning · Computer Science 2014-02-12 Aaron Karper

Feature Selection with Annealing for Computer Vision and Big Data Learning

Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint…

Machine Learning · Statistics 2017-02-07 Adrian Barbu , Yiyuan She , Liangjing Ding , Gary Gramajo

Diffusion-Driven High-Dimensional Variable Selection

Variable selection for high-dimensional, highly correlated data has long been a challenging problem, often yielding unstable and unreliable models. We propose a resample-aggregate framework that exploits diffusion models' ability to…

Methodology · Statistics 2025-08-20 Minjie Wang , Xiaotong Shen , Wei Pan

Positive region preserved random sampling: an efficient feature selection method for massive data

Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally have no enough computing resources when faced with huge volume of data.…

Machine Learning · Computer Science 2025-07-04 Hexiang Bai , Deyu Li , Jiye Liang , Yanhui Zhai

Scalable Sampling for High Utility Patterns

Discovering valuable insights from data through meaningful associations is a crucial task. However, it becomes challenging when trying to identify representative patterns in quantitative databases, especially with large datasets, as…

Databases · Computer Science 2024-10-31 Lamine Diop , Marc Plantevit

Error Controlled Feature Selection for Ultrahigh Dimensional and Highly Correlated Feature Space Using Deep Learning

In recent years, deep learning has been at the center of analytics due to its impressive empirical success in analyzing complex data objects. Despite this success, most of the existing tools behave like black-box machines, thus the…

Machine Learning · Statistics 2022-11-02 Arkaprabha Ganguli , David Todem , Tapabrata Maiti

Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection

The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…

Machine Learning · Computer Science 2023-09-18 Meng Xiao , Dongjie Wang , Min Wu , Pengfei Wang , Yuanchun Zhou , Yanjie Fu

Optimal Feature Selection in High-Dimensional Discriminant Analysis

We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results…

Machine Learning · Statistics 2013-06-28 Mladen Kolar , Han Liu

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews , Paul D. McNicholas

Recognizing Variables from their Data via Deep Embeddings of Distributions

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller , Alex Smola

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from training to testing stages while the…

Machine Learning · Computer Science 2024-05-28 Yu-Jie Zhang , Zhen-Yu Zhang , Peng Zhao , Masashi Sugiyama

DsDm: Model-Aware Dataset Selection with Datamodels

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom , Axel Feldmann , Aleksander Madry