Related papers: ProbCD: enrichment analysis accounting for categor…

Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene…

Applications · Statistics 2009-09-29 Michael A. Newton, Fernando A. Quintana, Johan A. den Boon, Srikumar Sengupta, Paul Ahlquist

Table Enrichment System for Machine Learning

Data scientists are constantly facing the problem of how to improve prediction accuracy with insufficient tabular data. We propose a table enrichment system that enriches a query table by adding external attributes (columns) from data lakes…

Information Retrieval · Computer Science 2022-04-19 Yuyang Dong, Masafumi Oyamada

SurvBoost: An R Package for High-Dimensional Variable Selection in the Stratified Proportional Hazards Model via Gradient Boosting

High-dimensional variable selection in the proportional hazards (PH) model has many successful applications in different areas. In practice, data may involve confounding variables that do not satisfy the PH assumption, in which case the…

Computation · Statistics 2018-03-22 Emily Morris, Kevin He, Yanming Li, Yi Li, Jian Kang

OncoEnrichR: cancer-dedicated gene set interpretation

Genome-scale screening experiments in cancer produce long lists of candidate genes that require extensive interpretation for biological insight and prioritization for follow-up studies. Interrogation of gene lists frequently represents a…

Genomics · Quantitative Biology 2022-09-29 Sigve Nakken, Sveinung Gundersen, Fabian L. M. Bernal, Dimitris Polychronopoulos, Eivind Hovig, Jørgen Wesche

The penetrance R package for Estimation of Age Specific Risk in Family-based Studies

Reliable tools and software for penetrance (age-specific risk among those who carry a genetic variant) estimation are critical to improving clinical decision making and risk assessment for hereditary syndromes. We introduce penetrance, an…

Computation · Statistics 2025-03-28 Nicolas Kubista, Danielle Braun, Giovanni Parmigiani

Robust Differential Abundance Test in Compositional Data

Differential abundance tests in compositional data are essential and fundamental tasks in various biomedical applications, such as single-cell, bulk RNA-seq, and microbiome data analysis. However, because of the compositional constraint and…

Methodology · Statistics 2022-04-14 Shulei Wang

BRcal: An R Package to Boldness-Recalibrate Probability Predictions

When probability predictions are too cautious for decision making, boldness-recalibration enables responsible emboldening while maintaining the probability of calibration required by the user. We formulate boldness-recalibration as a…

Methodology · Statistics 2025-07-21 Adeline P. Guthrie, Christopher T. Franck

Stochastic Package Queries in Probabilistic Databases

We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a package (bag of tuples in a relational database) that jointly satisfy a set of constraints while…

Databases · Computer Science 2021-03-12 Matteo Brucato, Nishant Yadav, Azza Abouzied, Peter J. Haas, Alexandra Meliou

ProbPNN: Enhancing Deep Probabilistic Forecasting with Statistical Information

Probabilistic forecasts are essential for various downstream applications such as business development, traffic planning, and electrical grid balancing. Many of these probabilistic forecasts are performed on time series data that contain…

Machine Learning · Computer Science 2023-02-07 Benedikt Heidrich, Kaleb Phipps, Oliver Neumann, Marian Turowski, Ralf Mikut, Veit Hagenmeyer

Precision-Recall Curve (PRC) Classification Trees

The classification of imbalanced data has presented a significant challenge for most well-known classification algorithms that were often designed for data with relatively balanced class distributions. Nevertheless, skewed class distribution…

Machine Learning · Statistics 2023-04-21 Jiaju Miao, Wei Zhu

An Odds Ratio Based Inference Engine

Expert systems applications that involve uncertain inference can be represented by a multidimensional contingency table. These tables offer a general approach to inferring with uncertain evidence, because they can embody any form of…

Artificial Intelligence · Computer Science 2013-04-15 David S. Vaughan, Bruce M. Perrin, Robert M. Yadrick, Peter D. Holden, Karl G. Kempf

Redundancy-aware unsupervised ranking based on game theory -- application to gene enrichment analysis

Gene set collections are a common ground to study the enrichment of genes for specific phenotypic traits. Gene set enrichment analysis aims to identify genes that are over-represented in gene set collections and might be associated with a…

Genomics · Quantitative Biology 2022-07-26 Chiara Balestra, Carlo Maj, Emmanuel Mueller, Andreas Mayr

Robust and accurate data enrichment statistics via distribution function of sum of weights

Term enrichment analysis facilitates biological interpretation by assigning to experimentally/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical…

Quantitative Methods · Quantitative Biology 2011-10-25 Aleksandar Stojmirović, Yi-Kuo Yu

Conformal inference for cell type annotation with graph-structured constraints

Conformal inference is a method that provides prediction sets for machine learning models, operating independently of the underlying distributional assumptions and relying solely on the exchangeability of training and test data. Despite its…

Methodology · Statistics 2025-10-01 Daniela Corbetta, Livio Finos, Ludwig Geistlinger, Davide Risso

Statistical Network Analysis with Bergm

Recent advances in computational methods for intractable models have made network data increasingly amenable to statistical analysis. Exponential random graph models (ERGMs) emerged as one of the main families of models capable of capturing…

Computation · Statistics 2021-04-07 Alberto Caimo, Lampros Bouranis, Robert Krause, Nial Friel

On Constrained Open-World Probabilistic Databases

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently…

Artificial Intelligence · Computer Science 2019-04-04 Tal Friedman, Guy Van den Broeck

Compositional imprecise probability

Imprecise probability is concerned with uncertainty about which probability distributions to use. It has applications in robust statistics and machine learning. We look at programming language models for imprecise probability. Our…

Programming Languages · Computer Science 2024-10-31 Jack Liell-Cock, Sam Staton

Improving Uncertainty Calibration via Prior Augmented Data

Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated…

Machine Learning · Computer Science 2021-02-23 Jeffrey Willette, Juho Lee, Sung Ju Hwang

greed: An R Package for Model-Based Clustering by Greedy Maximization of the Integrated Classification Likelihood

The greed package implements the general and flexible framework of arXiv:2002.11577 for model-based clustering in the R language. Based on the direct maximization of the exact Integrated Classification Likelihood with respect to the…

Methodology · Statistics 2022-05-02 Etienne Côme, Nicolas Jouvin

abc: an R package for Approximate Bayesian Computation (ABC)

Many recent statistical applications involve inference under complex models, where it is computationally prohibitive to calculate likelihoods but possible to simulate data. Approximate Bayesian Computation (ABC) is devoted to these complex…

Populations and Evolution · Quantitative Biology 2011-06-15 Katalin Csilléry, Olivier François, Michael GB Blum