Related papers: Smart Data driven Decision Trees Ensemble Methodol…

Decision Tree Embedding by Leaf-Means

Decision trees and random forest remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretability. However, a single tree suffers from high…

Machine Learning · Statistics 2025-12-02 Cencheng Shen , Yuexiao Dong , Carey E. Priebe

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalance and low-quality datasets. Most of…

Machine Learning · Computer Science 2020-10-20 Zhining Liu , Wei Cao , Zhifeng Gao , Jiang Bian , Hechang Chen , Yi Chang , Tie-Yan Liu

On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Statistics 2018-11-30 Rafael M. O. Cruz , Robert Sabourin , George D. C. Cavalcanti

Big Data Classification Using Augmented Decision Trees

We present an algorithm for classification tasks on big data. Experiments conducted as part of this study indicate that the algorithm can be as accurate as ensemble methods such as random forests or gradient boosted trees. Unlike ensemble…

Machine Learning · Statistics 2017-10-27 Rajiv Sambasivan , Sourish Das

Uncertainty Voting Ensemble for Imbalanced Deep Regression

Data imbalance is ubiquitous when applying machine learning to real-world problems, particularly regression problems. If training data are imbalanced, the learning is dominated by the densely covered regions of the target distribution and…

Machine Learning · Computer Science 2024-10-29 Yuchang Jiang , Vivien Sainte Fare Garnot , Konrad Schindler , Jan Dirk Wegner

Enabling Smart Data: Noise filtering in Big Data classification

In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same…

Databases · Computer Science 2017-07-31 Diego García-Gil , Julián Luengo , Salvador García , Francisco Herrera

The Random Forest Classifier in WEKA: Discussion and New Developments for Imbalanced Data

Data analysis and machine learning have become an integrative part of the modern scientific methodology, providing automated techniques to predict further information based on observations. One of these classification and regression…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Mario Amrehn , Firas Mualla , Elli Angelopoulou , Stefan Steidl , Andreas Maier

Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks

Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models for achieving satisfactory results. ID is the occurrence of a situation where the quantity of the samples belonging to one class outnumbers that of the other by a…

Machine Learning · Computer Science 2022-10-14 Javad Hassannataj Joloudari , Abdolreza Marefat , Mohammad Ali Nematollahi , Solomon Sunday Oyelere , Sadiq Hussain

A Novel Hybrid Sampling Framework for Imbalanced Learning

Class imbalance is a frequently occurring scenario in classification tasks. Learning from imbalanced data poses a major challenge, which has instigated a lot of research in this area. Data preprocessing using sampling techniques is a…

Machine Learning · Computer Science 2022-08-23 Asif Newaz , Farhan Shahriyar Haq

Imbalanced Data Stream Classification using Dynamic Ensemble Selection

Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the…

Machine Learning · Computer Science 2023-09-29 Priya. S , Haribharathi Sivakumar , Vijay Arvind. R

Improving fraud prediction with incremental data balancing technique for massive data streams

The performance of classification algorithms with a massive and highly imbalanced data stream depends upon efficient balancing strategy. Some techniques of balancing strategy have been applied in the past with Batch data to resolve the…

Machine Learning · Computer Science 2019-10-22 Rafiq Ahmed Mohammed , Kok-Wai Wong , Mohd Fairuz Shiratuddin , Xuequn Wang

Imbalanced data preprocessing techniques utilizing local data characteristics

Data imbalance, that is the disproportion between the number of training observations coming from different classes, remains one of the most significant challenges affecting contemporary machine learning. The negative impact of data…

Machine Learning · Computer Science 2021-11-30 Michał Koziarski

ICPRAI 2018 SI: On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Computer Science 2018-11-30 Rafael M. O. Cruz , Mariana A. Souza , Robert Sabourin , George D. C. Cavalcanti

Deep Learning Meets Oversampling: A Learning Framework to Handle Imbalanced Classification

Despite extensive research spanning several decades, class imbalance is still considered a profound difficulty for both machine learning and deep learning models. While data oversampling is the foremost technique to address this issue,…

Machine Learning · Computer Science 2025-02-12 Sukumar Kishanthan , Asela Hevapathige

GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data

Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific…

Machine Learning · Computer Science 2024-03-13 Sascha Marton , Stefan Lüdtke , Christian Bartelt , Heiner Stuckenschmidt

Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a…

Machine Learning · Statistics 2026-04-30 Behnam Yousefimehr , Mehdi Ghatee , Javad Fazli , Shervin Ghaffari , Zahra Rafei , Mohammad Amin Seifi , Sajed Tavakoli , Abolfazl Nikahd , Mahdi Razi Gandomani , Alireza Orouji , Ramtin Mahmoudi Kashani , Sarina Heshmati , Negin Sadat Mousavi

ODTE -- An ensemble of multi-class SVM-based oblique decision trees

We propose ODTE, a new ensemble that uses oblique decision trees as base classifiers. Additionally, we introduce STree, the base algorithm for growing oblique decision trees, which leverages support vector machines to define hyperplanes…

Machine Learning · Computer Science 2025-03-18 Ricardo Montañana , José A. Gámez , José M. Puerta

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other. Ensemble learning combines multiple models to obtain a robust model and has been prominently used with…

Machine Learning · Computer Science 2023-11-28 Azal Ahmad Khan , Omkar Chaudhari , Rohitash Chandra

Balanced Split: A new train-test data splitting strategy for imbalanced datasets

Classification data sets with skewed class proportions are called imbalanced. Class imbalance is a problem since most machine learning classification algorithms are built with an assumption of equal representation of all classes in the…

Machine Learning · Computer Science 2022-12-22 Azal Ahmad Khan

Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling

In the era of big data, the utilization of credit-scoring models to determine the credit risk of applicants accurately becomes a trend in the future. The conventional machine learning on credit scoring data sets tends to have poor…

Machine Learning · Statistics 2021-02-10 Xiaofan Liua , Zuoquan Zhanga , Di Wanga