Related papers: balance -- a Python package for balancing biased d…

confidence-planner: Easy-to-Use Prediction Confidence Estimation and Sample Size Planning

Machine learning applications, especially in the fields of medicine and social sciences, are slowly being subjected to increasing scrutiny. Similarly to sample size planning performed in clinical and social studies, lawmakers and…

Methodology · Statistics 2023-01-16 Antoni Klorek, Karol Roszak, Izabela Szczech, Dariusz Brzezinski

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented…

Machine Learning · Computer Science 2016-09-22 Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas
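Random oversampling is one of the simplest resampling ideas in this space. It duplicates minority-class rows until every class matches the majority count. The following is a library-free sketch written with NumPy, not imbalanced-learn's own implementation. The helper name `random_oversample` is hypothetical. In the toolbox itself, the comparable entry point is the `RandomOverSampler` class with its `fit_resample(X, y)` method.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate rows of smaller classes at random
    (with replacement) until every class has as many rows as the largest."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Keep every original row, then draw the missing rows with replacement.
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # [8 8]
```

Duplicating rows adds no new information and can encourage overfitting to the repeated minority points. This is one reason synthetic-sample methods such as SMOTE exist alongside it.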

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to…

Machine Learning · Computer Science 2024-06-18 Adrian Stando, Mustafa Cavus, Przemysław Biecek

BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, providing that models are optimally weighted to maximize predictive performance. This is particularly the case in…

Methodology · Statistics 2024-05-02 Nathaniel Haines, Conor Goold

Resampling strategies for imbalanced regression: a survey and empirical analysis

Imbalanced problems can arise in different real-world situations, and to address this, certain strategies in the form of resampling or balancing algorithms are proposed. This issue has largely been studied in the context of classification,…

Machine Learning · Computer Science 2025-07-17 Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz

A Python Library For Empirical Calibration

Dealing with biased data samples is a common task across many statistical fields. In survey sampling, bias often occurs due to unrepresentative samples. In causal studies with observational data, the treated versus untreated group…

Computation · Statistics 2019-07-29 Xiaojing Wang, Jingang Miao, Yunting Sun

A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective

Learning generalized models from biased data is an important undertaking toward fairness in deep learning. To address this issue, recent studies attempt to identify and leverage bias-conflicting samples free from spurious correlations…

Machine Learning · Computer Science 2024-11-04 Yeonsung Jung, Jaeyun Song, June Yong Yang, Jin-Hwa Kim, Sung-Yub Kim, Eunho Yang

Data Fusion for High-Resolution Estimation

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…

Methodology · Statistics 2025-08-21 Amy Guan, Marissa Reitsma, Roshni Sahoo, Joshua Salomon, Stefan Wager

Meta-analysis parameters computation: a Python approach to facilitate the crossing of experimental conditions

Meta-analysis is a data aggregation method that establishes an overall and objective level of evidence based on the results of several studies. It is necessary to maintain a high level of homogeneity in the aggregation of data collected…

Methodology · Statistics 2020-07-16 Flavien Quijoux, Charles Truong, Aliénor Vienne-Jumeau, Laurent Oudre, François Bertin-Hugault, Philippe Zawieja, Marie Lefevre, Pierre-Paul Vidal, Damien Ricard

Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation…

Machine Learning · Computer Science 2026-04-30 Taylor Maxson, Roberto Corizzo, Yaning Wu, Nathalie Japkowicz, Colin Bellinger

How Balanced Should Causal Covariates Be?

Covariate balancing is a popular technique for controlling confounding in observational studies. It finds weights for the treatment group which are close to uniform, but make the group's covariate means (approximately) equal to those of the…

Methodology · Statistics 2025-03-07 Shiva Kaul, Min-Gyu Kim
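The balancing idea in this abstract can be made concrete. The goal is treated-group weights that stay close to uniform while matching a target group's covariate means. One simple version measures closeness as squared Euclidean distance to uniform and imposes the mean constraints exactly. That version has a closed-form answer: project the uniform vector onto the affine set of weights that satisfy the constraints. This is a generic least-squares sketch, not the paper's method. The function name `min_norm_balancing_weights` is hypothetical. The sketch also allows negative weights, which practical balancing methods usually rule out.

```python
import numpy as np

def min_norm_balancing_weights(X_treated, target_means):
    """Hypothetical sketch: weights w closest to uniform (squared L2 distance)
    subject to X_treated.T @ w == target_means and sum(w) == 1.
    Negative weights are NOT excluded here."""
    X_treated = np.asarray(X_treated, dtype=float)
    n, p = X_treated.shape
    u = np.full(n, 1.0 / n)                    # uniform weights
    A = np.vstack([X_treated.T, np.ones(n)])   # (p + 1) x n constraint matrix
    b = np.append(np.asarray(target_means, dtype=float), 1.0)
    # Orthogonal projection of u onto {w : A w = b}; assumes A has full row rank.
    correction = A.T @ np.linalg.solve(A @ A.T, b - A @ u)
    return u + correction

rng = np.random.default_rng(0)
X_control = rng.normal(0.0, 1.0, size=(200, 2))
X_treated = rng.normal(0.5, 1.0, size=(50, 2))   # treated group is shifted
w = min_norm_balancing_weights(X_treated, X_control.mean(axis=0))
print(np.allclose(X_treated.T @ w, X_control.mean(axis=0)))  # True
```

Relaxing the equality to an approximate match, or adding a nonnegativity constraint, turns this into a quadratic program rather than a single linear solve.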

Evaluating Binary Outcome Classifiers Estimated from Survey Data

Surveys are commonly used to facilitate research in epidemiology, health, and the social and behavioral sciences. Often, these surveys are not simple random samples, and respondents are given weights reflecting their probability of…

Methodology · Statistics 2024-08-20 Adway S. Wadekar, Jerome P. Reiter

Sensitivity Analysis for Unmeasured Confounding in Meta-Analyses

Random-effects meta-analyses of observational studies can produce biased estimates if the synthesized studies are subject to unmeasured confounding. We propose sensitivity analyses quantifying the extent to which unmeasured confounding of…

Methodology · Statistics 2017-10-10 Maya B. Mathur, Tyler J. VanderWeele

Continuous Weight Balancing

We propose a simple method by which to choose sample weights for problems with highly imbalanced or skewed traits. Rather than naively discretizing regression labels to find binned weights, we take a more principled approach -- we derive…

Machine Learning · Computer Science 2021-04-01 Daniel J. Wu, Avoy Datta

Nonparametric Bayes modeling with sample survey weights

In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including…

Methodology · Statistics 2014-09-29 T. Kunihama, A. H. Herring, C. T. Halpern, D. B. Dunson
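When strata have different inclusion probabilities, an unweighted sample mean over-represents the oversampled strata. The classical correction weights each unit by the inverse of its inclusion probability. Dividing by the sum of the weights gives the Hájek estimator. The following sketch shows only that standard design-weighted estimator, not the nonparametric Bayes approach of the paper. The two-stratum design and the helper name `hajek_mean` are hypothetical.

```python
import numpy as np

def hajek_mean(y, inclusion_prob):
    """Design-weighted mean: each unit is weighted by 1 / inclusion probability."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(inclusion_prob, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical design: stratum A has 900 population units, stratum B has 100,
# and 10 units are sampled from each, so B is heavily oversampled.
y = np.array([1.0] * 10 + [5.0] * 10)
pi = np.array([10 / 900] * 10 + [10 / 100] * 10)
print(y.mean())            # 3.0 (ignores the design)
print(hajek_mean(y, pi))   # approximately 1.4
```

The weighted answer is (900 · 1 + 100 · 5) / 1000 = 1.4, the population mean implied by the stratum sizes. The unweighted mean of 3.0 reflects the equal sample sizes instead.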

Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies

Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate…

Methodology · Statistics 2025-11-21 Ying Jin, José Zubizarreta

Fair for a few: Improving Fairness in Doubly Imbalanced Datasets

Fairness has been identified as an important aspect of Machine Learning and Artificial Intelligence solutions for decision making. Recent literature offers a variety of approaches for debiasing; however, many of them fall short when the data…

Machine Learning · Computer Science 2025-06-18 Ata Yalcin, Asli Umay Ozturk, Yigit Sever, Viktoria Pauw, Stephan Hachinger, Ismail Hakki Toroslu, Pinar Karagoz

Data Fusion for Correcting Measurement Errors

Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. In…

Methodology · Statistics 2016-10-04 Tracy Schifeling, Jerome P. Reiter, Maria DeYoreo

Towards Standardizing AI Bias Exploration

Creating fair AI systems is a complex problem that involves the assessment of context-dependent bias concerns. Existing research and programming libraries express specific concerns as measures of bias that they aim to constrain or mitigate.…

Machine Learning · Computer Science 2024-05-30 Emmanouil Krasanakis, Symeon Papadopoulos

Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind, rather than data analytic insights. This…

Machine Learning · Statistics 2021-12-17 Rune D. Kjærsgaard, Manja G. Grønberg, Line K. H. Clemmensen