Related papers: balance -- a Python package for balancing biased d…

200 papers

Machine learning applications, especially in the fields of medicine and social sciences, are slowly being subjected to increasing scrutiny. Similarly to sample size planning performed in clinical and social studies, lawmakers and…

Methodology · Statistics 2023-01-16 Antoni Klorek , Karol Roszak , Izabela Szczech , Dariusz Brzezinski

Imbalanced-learn is an open-source python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced dataset frequently encountered in machine learning and pattern recognition. The implemented…

Machine Learning · Computer Science 2016-09-22 Guillaume Lemaitre , Fernando Nogueira , Christos K. Aridas

Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to…

Machine Learning · Computer Science 2024-06-18 Adrian Stando , Mustafa Cavus , Przemysław Biecek

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, providing that models are optimally weighted to maximize predictive performance. This is particularly the case in…

Methodology · Statistics 2024-05-02 Nathaniel Haines , Conor Goold

Imbalanced problems can arise in different real-world situations, and to address this, certain strategies in the form of resampling or balancing algorithms are proposed. This issue has largely been studied in the context of classification,…

Machine Learning · Computer Science 2025-07-17 Juscimara G. Avelino , George D. C. Cavalcanti , Rafael M. O. Cruz

Dealing with biased data samples is a common task across many statistical fields. In survey sampling, bias often occurs due to unrepresentative samples. In causal studies with observational data, the treated versus untreated group…

Computation · Statistics 2019-07-29 Xiaojing Wang , Jingang Miao , Yunting Sun

Learning generalized models from biased data is an important undertaking toward fairness in deep learning. To address this issue, recent studies attempt to identify and leverage bias-conflicting samples free from spurious correlations…

Machine Learning · Computer Science 2024-11-04 Yeonsung Jung , Jaeyun Song , June Yong Yang , Jin-Hwa Kim , Sung-Yub Kim , Eunho Yang

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g.…

Methodology · Statistics 2025-08-21 Amy Guan , Marissa Reitsma , Roshni Sahoo , Joshua Salomon , Stefan Wager

Meta-analysis is a data aggregation method that establishes an overall and objective level of evidence based on the results of several studies. It is necessary to maintain a high level of homogeneity in the aggregation of data collected…

Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation…

Machine Learning · Computer Science 2026-04-30 Taylor Maxson , Roberto Corizzo , Yaning Wu , Nathalie Japkowicz , Colin Bellinger

Covariate balancing is a popular technique for controlling confounding in observational studies. It finds weights for the treatment group which are close to uniform, but make the group's covariate means (approximately) equal to those of the…

Methodology · Statistics 2025-03-07 Shiva Kaul , Min-Gyu Kim

Surveys are commonly used to facilitate research in epidemiology, health, and the social and behavioral sciences. Often, these surveys are not simple random samples, and respondents are given weights reflecting their probability of…

Methodology · Statistics 2024-08-20 Adway S. Wadekar , Jerome P. Reiter

Random-effects meta-analyses of observational studies can produce biased estimates if the synthesized studies are subject to unmeasured confounding. We propose sensitivity analyses quantifying the extent to which unmeasured confounding of…

Methodology · Statistics 2017-10-10 Maya B. Mathur , Tyler J. VanderWeele

We propose a simple method by which to choose sample weights for problems with highly imbalanced or skewed traits. Rather than naively discretizing regression labels to find binned weights, we take a more principled approach -- we derive…

Machine Learning · Computer Science 2021-04-01 Daniel J. Wu , Avoy Datta

In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including…

Methodology · Statistics 2014-09-29 T. Kunihama , A. H. Herring , C. T. Halpern , D. B. Dunson

Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate…

Methodology · Statistics 2025-11-21 Ying Jin , José Zubizarreta

Fairness has been identified as an important aspect of Machine Learning and Artificial Intelligence solutions for decision making. Recent literature offers a variety of approaches for debiasing, however many of them fall short when the data…

Machine Learning · Computer Science 2025-06-18 Ata Yalcin , Asli Umay Ozturk , Yigit Sever , Viktoria Pauw , Stephan Hachinger , Ismail Hakki Toroslu , Pinar Karagoz

Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables. In…

Methodology · Statistics 2016-10-04 Tracy Schifeling , Jerome P. Reiter , Maria DeYoreo

Creating fair AI systems is a complex problem that involves the assessment of context-dependent bias concerns. Existing research and programming libraries express specific concerns as measures of bias that they aim to constrain or mitigate.…

Machine Learning · Computer Science 2024-05-30 Emmanouil Krasanakis , Symeon Papadopoulos

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind, rather than data analytic insights. This…

Machine Learning · Statistics 2021-12-17 Rune D. Kjærsgaard , Manja G. Grønberg , Line K. H. Clemmensen