Related papers: Feature selection in high-dimensional dataset usin…

200 papers

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

While building machine learning models, feature selection (FS) stands out as an essential preprocessing step used to handle the uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-25 Yelleti Vivek, P. S. V. S. Sai Prasad

Feature selection is an important problem in high-dimensional data analysis and classification. Conventional feature selection approaches focus on detecting the features based on a redundancy criterion using learning and feature searching…

Computer Vision and Pattern Recognition · Computer Science 2012-01-31 Alex Pappachen James, Sima Dimitrijev

Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving…

Machine Learning · Computer Science 2018-11-02 Raul-Jose Palma-Mendoza, Daniel Rodriguez, Luis de-Marcos

We propose a feature selection method that finds non-redundant features from large, high-dimensional data in a nonlinear way. Specifically, we propose a nonlinear extension of the non-negative least-angle regression (LARS) called…

Machine Learning · Statistics 2014-11-11 Makoto Yamada, Avishek Saha, Hua Ouyang, Dawei Yin, Yi Chang

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

Many machine learning applications such as in vision, biology and social networking deal with data in high dimensions. Feature selection is typically employed to select a subset of features which improves generalization accuracy as well…

Machine Learning · Computer Science 2016-06-15 Yamuna Prasad, Dinesh Khandelwal, K. K. Biswas

In machine learning applications for online product offerings and marketing strategies, there are often hundreds or thousands of features available to build such models. Feature selection is one essential method in such applications for…

Machine Learning · Statistics 2019-08-16 Zhenyu Zhao, Radhika Anand, Mallory Wang

This paper describes an effective and efficient image classification framework named the distributed deep representation learning model (DDRL). The aim is to strike the balance between the computationally intensive deep learning approaches…

Computer Vision and Pattern Recognition · Computer Science 2016-07-05 Le Dong, Na Lv, Qianni Zhang, Shanshan Xie, Ling He, Mengdie Mao

This paper describes how to convert a machine learning problem into a series of map-reduce tasks. We study the logistic regression algorithm. In the logistic regression algorithm, it is assumed that samples are independent and each sample is…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Qi Li

Feature selection is a pattern recognition approach to choose important variables according to some criteria to distinguish or explain certain phenomena. There are many genomic and proteomic applications which rely on feature selection to…

Computer Vision and Pattern Recognition · Computer Science 2011-06-13 Fabricio Martins Lopes, David Correa Martins-Jr, Roberto M. Cesar-Jr

Feature selection is a critical step in the analysis of high-dimensional data, where the number of features often vastly exceeds the number of samples. Effective feature selection not only improves model performance and interpretability but…

Machine Learning · Computer Science 2025-01-27 Raquel Espinosa, Gracia Sánchez, José Palma, Fernando Jiménez

High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We…

Methodology · Statistics 2024-08-13 Swarnadip Ghosh, Somabha Mukherjee, Divyansh Agarwal, Yichen He, Mingzhi Song, Xuejiao Pei

This paper is concerned with the problem of low rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on…

Numerical Analysis · Computer Science 2017-03-17 Mostafa Rahmani, George Atia

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional…

Methodology · Statistics 2025-06-13 Fabio Demaria

When dealing with massive data sorting, we usually use Hadoop, which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang

We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-22 Arko Provo Mukherjee, Srikanta Tirthapura

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce rely on the JVM and Java, which is expensive in memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S., Muthumanikandan V., Siddarth S., Sainath G

The theory of statistical inference along with the strategy of divide-and-conquer for large-scale data analysis has recently attracted considerable interest due to the great popularity of the MapReduce programming paradigm in the Apache Hadoop…

Methodology · Statistics 2017-09-14 Ling Zhou, Peter X.-K. Song

Multiobjective feature selection seeks to determine the most discriminative feature subset by simultaneously optimizing two conflicting objectives: minimizing the number of selected features and the classification error rate. The goal is to…

Neural and Evolutionary Computing · Computer Science 2025-05-12 Zhenxing Zhang, Qianxiang An, Yilei Wang, Chenfeng Wu, Baoling Dong, Chunjie Zhou