Related papers: Data Integration with High Dimensionality

Data integration in high dimension with multiple quantiles

This article deals with the analysis of high dimensional data that come from multiple sources (experiments) and thus have different possibly correlated responses, but share the same set of predictors. The measurements of the predictors may…

Methodology · Statistics 2020-07-01 Guorong Dai, Ursula U. Müller, Raymond J. Carroll

Deep Learning Through the Lens of Example Difficulty

Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of…

Machine Learning · Computer Science 2021-06-21 Robert J. N. Baldock, Hartmut Maennel, Behnam Neyshabur

Simultaneous estimation of normal means with side information

The integrative analysis of multiple datasets is an important strategy in data analysis. It is increasingly popular in genomics, which enjoys a wealth of publicly available datasets that can be compared, contrasted, and combined in order to…

Methodology · Statistics 2019-11-20 Sihai Dave Zhao

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome,…

Quantitative Methods · Quantitative Biology 2018-10-22 Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, Michael M. Hoffman

A Paradigmatic Regression Algorithm for Gene Selection Problems

Motivation: Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. The problem of…

Methodology · Statistics 2015-11-25 Stéphane Guerrier, Nabil Mili, Roberto Molinari, Samuel Orso, Marco Avella-Medina, Yanyuan Ma

Improving prediction models by incorporating external data with weights based on similarity

In clinical settings, we often face the challenge of building prediction models based on small observational data sets. For example, such a data set might be from a medical center in a multi-center study. Differences between centers might…

Methodology · Statistics 2024-05-29 Max Behrens, Maryam Farhadizadeh, Angelika Rohde, Alexander Rühle, Nils H. Nicolay, Harald Binder, Daniela Zöller

Computational Approaches for Disease Gene Identification

Identifying disease genes from human genome is an important and fundamental problem in biomedical research. Despite many publications of machine learning methods applied to discover new disease genes, it still remains a challenge because of…

Quantitative Methods · Quantitative Biology 2017-05-23 Peng Yang

Bidirectional Inference Networks: A Class of Deep Bayesian Networks for Health Profiling

We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is…

Machine Learning · Statistics 2019-02-07 Hao Wang, Chengzhi Mao, Hao He, Mingmin Zhao, Tommi S. Jaakkola, Dina Katabi

Do We Really Even Need Data?

As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g. rising costs, declining survey response rates), researchers increasingly use predictions from…

Methodology · Statistics 2024-02-06 Kentaro Hoffman, Stephen Salerno, Awan Afiaz, Jeffrey T. Leek, Tyler H. McCormick

Asymptotic Inference for Constrained Regression

We consider statistical inference in high-dimensional regression problems under affine constraints on the parameter space. The theoretical study of this is motivated by the study of genetic determinants of diseases, such as diabetes, using…

Methodology · Statistics 2026-01-06 Madhav Sankaranarayanan, Yana Hrytsenko, Jerome I. Rotter, Tamar Sofer, Rajarshi Mukherjee

Measuring the effects of confounders in medical supervised classification problems: the Confounding Index (CI)

Over the years, there has been growing interest in using Machine Learning techniques for biomedical data processing. When tackling these tasks, one needs to bear in mind that biomedical data depends on a variety of characteristics, such as…

Machine Learning · Computer Science 2020-02-05 Elisa Ferrari, Alessandra Retico, Davide Bacciu

High-dimensional regression over disease subgroups

We consider high-dimensional regression over subgroups of observations. Our work is motivated by biomedical problems, where disease subtypes, for example, may differ with respect to underlying regression models, but sample sizes at the…

Applications · Statistics 2016-12-12 Frank Dondelinger, Sach Mukherjee, The Alzheimer's Disease Neuroimaging Initiative

A Decision-Theoretic Model for Using Scientific Data

Many Artificial Intelligence systems depend on the agent's updating its beliefs about the world on the basis of experience. Experiments constitute one type of experience, so scientific methodology offers a natural environment for examining…

Artificial Intelligence · Computer Science 2013-04-08 Harold P. Lehmann

Data Consistency Approach to Model Validation

In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The…

Methodology · Statistics 2019-05-21 Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön

Domain constraints improve risk prediction when outcome data is missing

Machine learning models are often trained to predict the outcome resulting from a human decision. For example, if a doctor decides to test a patient for disease, will the patient test positive? A challenge is that historical decision-making…

Machine Learning · Computer Science 2024-04-23 Sidhika Balachandar, Nikhil Garg, Emma Pierson

Posterior Dispersion Indices

Probabilistic modeling is cyclical: we specify a model, infer its posterior, and evaluate its performance. Evaluation drives the cycle, as we revise our model based on how it performs. This requires a metric. Traditionally, predictive…

Machine Learning · Statistics 2016-05-25 Alp Kucukelbir, David M. Blei

Consistent and Flexible Selectivity Estimation for High-Dimensional Data

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…

Databases · Computer Science 2021-05-28 Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Onizuka Makoto, Wei Wang, Rui Zhang, Yoshiharu Ishikawa

Causal Inference in medicine and in health policy, a summary

A data science task can be deemed as making sense of the data or testing a hypothesis about it. The conclusions inferred from data can greatly guide us to make informative decisions. Big data has enabled us to carry out countless prediction…

Machine Learning · Computer Science 2022-01-12 Wenhao Zhang, Ramin Ramezani, Arash Naeim

Fighting Noise with Noise: Causal Inference with Many Candidate Instruments

Instrumental variable methods provide useful tools for inferring causal effects in the presence of unmeasured confounding. To apply these methods with large-scale data sets, a major challenge is to find valid instruments from a possibly…

Methodology · Statistics 2024-09-24 Xinyi Zhang, Linbo Wang, Stanislav Volgushev, Dehan Kong

Nonparametric independence tests in high-dimensional settings, with applications to the genetics of complex disease

[PhD thesis of FCP.] Nowadays, genetics studies large amounts of very diverse variables. Mathematical statistics has evolved in parallel to its applications, with much recent interest in high-dimensional settings. In the genetics of human…

Methodology · Statistics 2024-07-30 Fernando Castro-Prado