Related papers: Scalable Feature Subset Selection for Big Data usi…

Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark

Feature subset selection (FSS) using a wrapper approach is essentially a combinatorial optimization problem with two objective functions, namely the cardinality of the selected feature subset, which should be minimized, and the corresponding…

Neural and Evolutionary Computing · Computer Science 2022-02-09 Yelleti Vivek, Vadlamani Ravi, P. Radhakrishna

Parallel bi-objective evolutionary algorithms for scalable feature subset selection via migration strategy under Spark

Feature subset selection (FSS) for classification is inherently a bi-objective optimization problem, where the task is to obtain a feature subset which yields the maximum possible area under the receiver operator characteristic curve (AUC)…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-20 Yelleti Vivek, Vadlamani Ravi, P. Radha Krishna

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on…

Artificial Intelligence · Computer Science 2016-10-20 Sergio Ramírez-Gallego, Héctor Mouriño-Talín, David Martínez-Rego, Verónica Bolón-Canedo, José Manuel Benítez, Amparo Alonso-Betanzos, Francisco Herrera

Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with a massive number of features. In this study, we propose a new wrapper approach for FS based on binary…

Machine Learning · Statistics 2016-03-08 Vural Aksakalli, Milad Malekipirbazari

Distributed Correlation-Based Feature Selection in Spark

CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed…

Machine Learning · Computer Science 2019-02-01 Raul-Jose Palma-Mendoza, Luis de-Marcos, Daniel Rodriguez, Amparo Alonso-Betanzos

A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment

With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-26 Jianguo Chen, Kenli Li, Zhuo Tang, Kashif Bilal, Shui Yu, Chuliang Weng, Keqin Li

A Self-adaptive Weighted Differential Evolution Approach for Large-scale Feature Selection

Recently, many evolutionary computation methods have been developed to solve the feature selection problem. However, the studies focused mainly on small-scale issues, resulting in stagnation issues in local optima and numerical instability…

Neural and Evolutionary Computing · Computer Science 2021-10-28 Xubin Wang, Yunhe Wang, Ka-Chun Wong, Xiangtao Li

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy

Data collection for scientific applications is increasing exponentially and is forecasted to soon reach peta- and exabyte scales. Applications which process and analyze scientific data must be scalable and focus on execution performance to…

Instrumentation and Methods for Astrophysics · Physics 2018-10-09 Thomas Devine, Katerina Goseva-Popstojanova, Di Pang

A Novel Scalable Apache Spark Based Feature Extraction Approaches for Huge Protein Sequence and their Clustering Performance Analysis

Genome sequencing projects are rapidly increasing the number of high-dimensional protein sequence datasets. Clustering a high-dimensional protein sequence dataset using traditional machine learning approaches poses many challenges. Many…

Quantitative Methods · Quantitative Biology 2022-04-27 Preeti Jha, Aruna Tiwari, Neha Bharill, Milind Ratnaparkhe, Om Prakash Patel, Nilagiri Harshith, Mukkamalla Mounika, Neha Nagendra

Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages

Parallel batch processing machines have extensive applications in the semiconductor manufacturing process. However, the problem models in previous studies regard parallel batch processing as a fixed processing stage in the machining…

Neural and Evolutionary Computing · Computer Science 2024-09-30 Feige Liu, Xin Li, Chao Lu, Wenying Gong

Alignment-free Genomic Analysis via a Big Data Spark Platform

Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well established alternative to two and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-26 Umberto Ferraro Petrillo, Francesco Palini, Giuseppe Cattaneo, Raffaele Giancarlo

Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space

Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental challenge for machine learning systems. Active learning offers a promising direction by…

Machine Learning · Computer Science 2026-02-04 Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, Talal Rahwan

Scaling associative classification for very large datasets

Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of…

Machine Learning · Computer Science 2018-05-11 Luca Venturini, Elena Baralis, Paolo Garza

Mobile Big Data Analytics Using Deep Learning and Apache Spark

The proliferation of mobile devices, such as smartphones and Internet of Things (IoT) gadgets, results in the recent mobile big data (MBD) era. Collecting MBD is unprofitable unless suitable analytics and learning methods are utilized for…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-16 Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, Zhu Han

Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

In Machine Learning, the parent set identification problem is to find a set of random variables that best explain a selected variable given the data and some predefined scoring function. This problem is a critical component to structure…

Artificial Intelligence · Computer Science 2019-01-09 Subhadeep Karan, Jaroslaw Zola

Asynchronous Evolution of Deep Neural Network Architectures

Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation…

Neural and Evolutionary Computing · Computer Science 2024-01-02 Jason Liang, Hormoz Shahrzad, Risto Miikkulainen

A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark

Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics 2017-08-22 Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions

Learning from imbalanced data is among the most challenging areas in contemporary machine learning. This becomes even more difficult when considered in the context of big data that calls for dedicated architectures capable of high-performance…

Machine Learning · Computer Science 2022-11-16 William C. Sleeman, Bartosz Krawczyk

Towards Interactive, Adaptive and Result-aware Big Data Analytics

As data volumes grow across applications, analytics of large amounts of data is becoming increasingly important. Big data processing frameworks such as Apache Hadoop, Apache AsterixDB, and Apache Spark have been built to meet this demand. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-15 Avinash Kumar

Improving Intelligence of Evolutionary Algorithms Using Experience Share and Replay

We propose PESA, a novel approach combining Particle Swarm Optimisation (PSO), Evolution Strategy (ES), and Simulated Annealing (SA) in a hybrid algorithm inspired by reinforcement learning. PESA hybridizes the three algorithms by…

Neural and Evolutionary Computing · Computer Science 2020-09-21 Majdi I. Radaideh, Koroush Shirvan