Related papers: Complaint-driven Training Data Debugging for Query…

Machine Learning 2.0: Engineering Data Driven AI Products

ML 2.0: In this paper, we propose a paradigm shift from the current practice of creating machine learning models - which requires months-long discovery, exploration and "feasibility report" generation, followed by re-engineering for…

Artificial Intelligence · Computer Science 2018-07-03 James Max Kanter, Benjamin Schreck, Kalyan Veeramachaneni

RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

Motion forecasting plays a significant role in various domains (e.g., autonomous driving, human-robot interaction), which aims to predict future motion sequences given a set of historical observations. However, the observed elements may be…

Computer Vision and Pattern Recognition · Computer Science 2021-08-04 Jiachen Li, Fan Yang, Hengbo Ma, Srikanth Malla, Masayoshi Tomizuka, Chiho Choi

Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay

Current image de-raining methods primarily learn from a limited dataset, leading to inadequate performance in varied real-world rainy conditions. To tackle this, we introduce a new framework that enables networks to progressively expand…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Kunyu Wang, Xueyang Fu, Chengzhi Cao, Chengjie Ge, Wei Zhai, Zheng-Jun Zha

Enabling SQL-based Training Data Debugging for Federated Learning

How can we debug a logistic regression model in a federated learning setting when seeing the model behave unexpectedly (e.g., the model rejects all high-income customers' loan applications)? The SQL-based training data debugging framework…

Machine Learning · Computer Science 2021-08-27 Yejia Liu, Weiyuan Wu, Lampros Flokas, Jiannan Wang, Eugene Wu

REIN: A Comprehensive Benchmark Framework for Data Cleaning Methods in ML Pipelines

Nowadays, machine learning (ML) plays a vital role in many aspects of our daily life. In essence, building well-performing ML applications requires the provision of high-quality data throughout the entire life-cycle of such applications.…

Databases · Computer Science 2023-02-10 Mohamed Abdelaal, Christian Hammacher, Harald Schoening

Training and Serving Machine Learning Models at Scale

In recent years, Web services have become more and more intelligent (e.g., in understanding user preferences) thanks to the integration of components that rely on Machine Learning (ML). Before users can interact (inference phase) with an…

Software Engineering · Computer Science 2022-11-11 Luciano Baresi, Giovanni Quattrocchi

RAIN: Your Language Models Can Align Themselves without Finetuning

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research typically gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning,…

Computation and Language · Computer Science 2023-10-10 Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

Towards General and Fast Video Derain via Knowledge Distillation

As a common natural weather condition, rain can obscure video frames and thus affect the performance of the visual system, so video derain receives a lot of attention. In natural environments, rain has a wide variety of streak types, which…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Defang Cai, Pan Mu, Sixian Chan, Zhanpeng Shao, Cong Bai

Learn to Unlearn: A Survey on Machine Unlearning

Machine Learning (ML) models have been shown to potentially leak sensitive information, thus raising privacy concerns in ML-driven applications. This inspired recent research on removing the influence of specific data samples from a trained…

Machine Learning · Computer Science 2023-10-30 Youyang Qu, Xin Yuan, Ming Ding, Wei Ni, Thierry Rakotoarivelo, David Smith

Data Cleaning and Machine Learning: A Systematic Literature Review

Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing…

Machine Learning · Computer Science 2024-06-03 Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh

Imputation of missing sub-hourly precipitation data in a large sensor network: a machine learning approach

Precipitation data collected at sub-hourly resolution represents specific challenges for missing data recovery by being largely stochastic in nature and highly unbalanced in the duration of rain vs non-rain. Here we present a two-step…

Machine Learning · Computer Science 2020-07-21 Benedict Delahaye Chivers, John Wallbank, Steven J. Cole, Ondrej Sebek, Simon Stanley, Matthew Fry, Georgios Leontidis

An Effective Data-Driven Approach for Localizing Deep Learning Faults

Deep Learning (DL) applications are being used to solve problems in critical domains (e.g., autonomous driving or medical diagnosis systems). Thus, developers need to debug their systems to ensure that the expected behavior is delivered.…

Software Engineering · Computer Science 2023-07-19 Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

Frustrated with Replicating Claims of a Shared Model? A Solution

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that model owners and evaluators are hard-pressed analyzing and studying them. This is exacerbated by the complicated procedures for…

Machine Learning · Computer Science 2019-06-26 Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu

Rethinking Real-world Image Deraining via An Unpaired Degradation-Conditioned Diffusion Model

Recent diffusion models have exhibited great potential in generative modeling tasks. Part of their success can be attributed to their ability to train stably on huge sets of paired synthetic data. However, adapting these models to…

Computer Vision and Pattern Recognition · Computer Science 2024-05-02 Yiyang Shen, Mingqiang Wei, Yongzhen Wang, Xueyang Fu, Jing Qin

From Snow to Rain: Evaluating Robustness, Calibration, and Complexity of Model-Based Robust Training

Robustness to natural corruptions remains a critical challenge for reliable deep learning, particularly in safety-sensitive domains. We study a family of model-based training approaches that leverage a learned nuisance variation model to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Josué Martínez-Martínez, Olivia Brown, Giselle Zeno, Pooya Khorrami, Rajmonda Caceres

Monitoring Machine Learning Models: Online Detection of Relevant Deviations

Machine learning models are essential tools in various domains, but their performance can degrade over time due to changes in data distribution or other factors. On one hand, detecting and addressing such degradations is crucial for…

Machine Learning · Computer Science 2023-09-28 Florian Heinrichs

Towards Reliable Testing of Machine Unlearning

Machine learning components are now central to AI-infused software systems, from recommendations and code assistants to clinical decision support. As regulations and governance frameworks increasingly require deleting sensitive data from…

Machine Learning · Computer Science 2026-04-21 Anna Mazhar, Sainyam Galhotra

Data Origin Inference in Machine Learning

It is a growing direction to utilize unintended memorization in ML models to benefit real-world applications, with recent efforts like user auditing, dataset ownership inference and forgotten data measurement. Standing on the point of ML…

Machine Learning · Computer Science 2023-01-31 Mingxue Xu, Xiang-Yang Li

FLAIR: Feedback Learning for Adaptive Information Retrieval

Recent advances in Large Language Models (LLMs) have driven the adoption of copilots in complex technical scenarios, underscoring the growing need for specialized information retrieval solutions. In this paper, we introduce FLAIR, a…

Information Retrieval · Computer Science 2025-08-20 William Zhang, Yiwen Zhu, Yunlei Lu, Mathieu Demarne, Wenjing Wang, Kai Deng, Nutan Sahoo, Katherine Lin, Miso Cilimdzic, Subru Krishnan

Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models

Increasingly, artificial intelligence (AI) and machine learning (ML) are used in eScience applications [9]. While these approaches have great potential, the literature has shown that ML-based approaches frequently suffer from results that…

Machine Learning · Computer Science 2024-07-03 Zhiwei Li, Carl Kesselman, Mike D'Arch, Michael Pazzani, Benjamin Yizing Xu