Related papers: Learning from data with structured missingness

Greedy structure learning from data that contain systematic missing values

Learning from data that contain missing values represents a common phenomenon in many domains. Relatively few Bayesian Network structure learning algorithms account for missing data, and those that do tend to rely on standard approaches…

Machine Learning · Computer Science 2022-05-23 Yang Liu, Anthony C. Constantinou

What Can Knowledge Bring to Machine Learning? -- A Survey of Low-shot Learning for Structured Data

Supervised machine learning has several drawbacks that make it difficult to use in many situations. Drawbacks include: heavy reliance on massive training data, limited generalizability and poor expressiveness of high-level semantics.…

Machine Learning · Computer Science 2021-06-14 Yang Hu, Adriane Chapman, Guihua Wen, Dame Wendy Hall

Embeddings and Representation Learning for Structured Data

Performing machine learning on structured data is complicated by the fact that such data does not have vectorial form. Therefore, multiple approaches have emerged to construct vectorial representations of structured data, from kernel and…

Machine Learning · Computer Science 2019-05-16 Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Alessandro Sperduti

Localized Structured Prediction

Key to structured prediction is exploiting the problem structure to simplify the learning process. A major challenge arises when data exhibit a local structure (e.g., are made by "parts") that can be leveraged to better approximate the…

Machine Learning · Statistics 2019-06-03 Carlo Ciliberto, Francis Bach, Alessandro Rudi

Learning with Hidden Factorial Structure

Statistical learning in high-dimensional spaces is challenging without a strong underlying data structure. Recent advances with foundational models suggest that text and image data contain such hidden structures, which help mitigate the…

Machine Learning · Statistics 2025-02-04 Charles Arnal, Clement Berenfeld, Simon Rosenberg, Vivien Cabannes

A systematic approach to identify and evaluate missing data patterns and mechanisms in multivariate educational, social, and behavioral research

Methods for addressing missing data have become much more accessible to applied researchers. However, little guidance exists to help researchers systematically identify plausible missing data mechanisms in order to ensure that these methods…

Applications · Statistics 2020-07-29 Adam Davey, Ting Dai

Robustness to Missing Features using Hierarchical Clustering with Split Neural Networks

The problem of missing data has been persistent for a long time and poses a major obstacle in machine learning and statistical data analysis. Past works in this field have tried using various data imputation techniques to fill in the…

Machine Learning · Computer Science 2020-11-20 Rishab Khincha, Utkarsh Sarawgi, Wazeer Zulfikar, Pattie Maes

Impact of Missing Values in Machine Learning: A Comprehensive Analysis

Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing…

Machine Learning · Computer Science 2024-10-14 Abu Fuad Ahmad, Md Shohel Sayeed, Khaznah Alshammari, Istiaque Ahmed

Prediction Models That Learn to Avoid Missing Values

Handling missing values at test time is challenging for machine learning models, especially when aiming for both high accuracy and interpretability. Established approaches often add bias through imputation or excessive model complexity via…

Machine Learning · Computer Science 2025-05-07 Lena Stempfle, Anton Matsson, Newton Mwai, Fredrik D. Johansson

To Measure What Isn't There -- Visual Exploration of Missingness Structures Using Quality Metrics

This paper contributes a set of quality metrics for identification and visual analysis of structured missingness in high-dimensional data. Missing values in data are a frequent challenge in most data generating domains and may cause a range…

Graphics · Computer Science 2025-05-30 Sara Johansson Fernstad, Sarah Alsufyani, Silvia Del Din, Alison Yarnall, Lynn Rochester

Text Data Integration

Data comes in many forms. From a shallow perspective, they can be viewed as being either in structured (e.g., as a relation, as key-value pairs) or unstructured (e.g., text, image) formats. So far, machines have been fairly good at…

Computation and Language · Computer Science 2026-03-31 Md Ataur Rahman, Dimitris Sacharidis, Oscar Romero, Sergi Nadal

Non-IID data and Continual Learning processes in Federated Learning: A long road ahead

Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while preserving their data private. This decentralized approach is prone to suffer the consequences of…

Machine Learning · Computer Science 2021-11-29 Marcos F. Criado, Fernando E. Casado, Roberto Iglesias, Carlos V. Regueiro, Senén Barro

A Reflection on Learning from Data: Epistemology Issues and Limitations

Although learning from data is effective and has achieved significant milestones, it has many challenges and limitations. Learning from data starts from observations and then proceeds to broader generalizations. This framework is…

Machine Learning · Computer Science 2021-07-29 Ahmad Hammoudeh, Sara Tedmori, Nadim Obeid

Machine-Learning Mathematical Structures

We review, for a general audience, a variety of recent experiments on extracting structure from machine-learning mathematical data that have been compiled over the years. Focusing on supervised machine-learning on labeled data from…

Machine Learning · Computer Science 2021-04-09 Yang-Hui He

How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective

Artificial intelligence experienced a technological breakthrough in science, industry, and everyday life in the recent few decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational…

Machine Learning · Computer Science 2023-11-14 Ivan Kraljevski, Yong Chul Ju, Dmitrij Ivanov, Constanze Tschöpe, Matthias Wolff

Audacity of huge: overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery

Machine learning (ML)-accelerated discovery requires large amounts of high-fidelity data to reveal predictive structure-property relationships. For many properties of interest in materials discovery, the challenging nature and high cost of…

Chemical Physics · Physics 2021-11-04 Aditya Nandy, Chenru Duan, Heather J. Kulik

Pitfalls in Machine Learning Research: Reexamining the Development Cycle

Machine learning has the potential to fuel further advances in data science, but it is greatly hindered by an ad hoc design process, poor data hygiene, and a lack of statistical rigor in model evaluation. Recently, these issues have begun…

Machine Learning · Computer Science 2021-08-19 Stella Biderman, Walter J. Scheirer

Fairness and Missing Values

The causes underlying unfair decision making are complex, being internalised in different ways by decision makers, other actors dealing with data and models, and ultimately by the individuals being affected by these decisions. One frequent…

Machine Learning · Computer Science 2019-05-31 Fernando Martínez-Plumed, Cèsar Ferri, David Nieves, José Hernández-Orallo

Effective Learning of Probabilistic Models for Clinical Predictions from Longitudinal Data

With the expeditious advancement of information technologies, health-related data presented unprecedented potentials for medical and health discoveries but at the same time significant challenges for machine learning techniques both in…

Machine Learning · Computer Science 2018-11-05 Shuo Yang

Robust Learning from Untrusted Sources

Modern machine learning methods often require more data for training than a single expert can provide. Therefore, it has become a standard procedure to collect data from external sources, e.g. via crowdsourcing. Unfortunately, the quality…

Machine Learning · Computer Science 2019-05-20 Nikola Konstantinov, Christoph Lampert