Related papers: PRESISTANT: Learning based assistant for data pre-…

PASTA: Pretrained Action-State Transformer Agents

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving…

Artificial Intelligence · Computer Science 2023-12-05 Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

High-quality scientific review and perspective papers require substantial time and effort, limiting researchers' ability to synthesize emerging knowledge. While Large Language Models (LLMs) leverage AI Scientists for scientific workflows,…

Artificial Intelligence · Computer Science 2026-03-03 Sasi Kiran Gaddipati, Farhana Keya, Gollam Rabby, Sören Auer

DataAssist: A Machine Learning Approach to Data Cleaning and Preparation

Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools…

Machine Learning · Computer Science 2023-07-18 Kartikay Goyle, Quin Xie, Vakul Goyle

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

Dynamic Data selection aims to accelerate training by prioritizing informative samples during online training. However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate…

Machine Learning · Computer Science 2026-05-14 Suorong Yang, Fangjian Su, Hai Gan, Ziqi Ye, Jie Li, Baile Xu, Furao Shen, Soujanya Poria

Bayesian Decision Making around Experts

Complex learning agents are increasingly deployed alongside existing experts, such as human operators or previously trained agents. However, it remains unclear how learners should optimally incorporate certain forms of expert data, which…

Machine Learning · Computer Science 2025-10-10 Daniel Jarne Ornia, Joel Dyer, Nicholas Bishop, Anisoara Calinescu, Michael Wooldridge

Leveraging Predictive Models for Adaptive Sampling of Spatiotemporal Fluid Processes

Persistent monitoring of a spatiotemporal fluid process requires data sampling and predictive modeling of the process being monitored. In this paper, we present the PASST algorithm: Predictive-model based Adaptive Sampling of a Spatio-Temporal…

Robotics · Computer Science 2023-04-04 Sandeep Manjanna, Tom Z. Jiahao, M. Ani Hsieh

Extending the Hint Factory for the assistance dilemma: A novel, data-driven HelpNeed Predictor for proactive problem-solving help

Determining when and whether to provide personalized support is a well-known challenge called the assistance dilemma. A core problem in solving the assistance dilemma is the need to discover when students are unproductive so that the tutor…

Artificial Intelligence · Computer Science 2021-06-16 Mehak Maniktala, Christa Cody, Amy Isvik, Nicholas Lytle, Min Chi, Tiffany Barnes

Improving Expert Predictions with Conformal Prediction

Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or…

Machine Learning · Computer Science 2023-07-03 Eleni Straitouri, Lequn Wang, Nastaran Okati, Manuel Gomez Rodriguez

Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness

Predictive coding, once used in only a small fraction of legal and business matters, is now widely deployed to quickly cull through increasingly vast amounts of data and reduce the need for costly and inefficient human document review.…

Information Retrieval · Computer Science 2019-04-04 Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, Haozhen Zhao

Automated Image Data Preprocessing with Deep Reinforcement Learning

Data preparation, i.e. the process of transforming raw data into a format that can be used for training effective machine learning models, is a tedious and time-consuming task. For image data, preprocessing typically involves a sequence of…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Tran Ngoc Minh, Mathieu Sinn, Hoang Thanh Lam, Martin Wistuba

Impact of Data Processing on Fairness in Supervised Learning

We study the impact of pre- and post-processing for reducing discrimination in data-driven decision makers. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a…

Machine Learning · Computer Science 2021-02-04 Sajad Khodadadian, AmirEmad Ghassami, Negar Kiyavash

PrISM-Observer: Intervention Agent to Help Users Perform Everyday Procedures Sensed using a Smartwatch

We routinely perform procedures (such as cooking) that include a set of atomic steps. Often, inadvertent omission or misordering of a single step can lead to serious consequences, especially for those experiencing cognitive challenges such…

Human-Computer Interaction · Computer Science 2024-07-25 Riku Arakawa, Hiromu Yakura, Mayank Goel

Probabilistic Active Meta-Learning

Data-efficient learning algorithms are essential in many practical applications where data collection is expensive, e.g., in robotics due to the wear and tear. To address this problem, meta-learning algorithms use prior experience about…

Machine Learning · Computer Science 2020-10-26 Jean Kaddour, Steindór Sæmundsson, Marc Peter Deisenroth

Stuck? No worries!: Task-aware Command Recommendation and Proactive Help for Analysts

Data analytics software applications have become an integral part of the decision-making process of analysts. Users of such software face challenges due to insufficient product and domain knowledge and find themselves in need of help. To…

Human-Computer Interaction · Computer Science 2019-06-24 Aadhavan M. Nambhi, Bhanu Prakash Reddy, Aarsh Prakash Agarwal, Gaurav Verma, Harvineet Singh, Iftikhar Ahamath Burhanuddin

A Review of Meta-level Learning in the Context of Multi-component, Multi-level Evolving Prediction Systems

The exponential growth of volume, variety and velocity of data is raising the need for investigations of automated or semi-automated ways to extract useful patterns from the data. It requires deep expert knowledge and extensive…

Machine Learning · Computer Science 2020-07-22 Abbas Raza Ali, Marcin Budka, Bogdan Gabrys

Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance

Successful data-driven science requires complex data engineering pipelines to clean, transform, and alter data in preparation for machine learning, and robust results can only be achieved when each step in the pipeline can be justified, and…

Databases · Computer Science 2024-04-08 Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone

Statistical Inference After Adaptive Sampling for Longitudinal Data

Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by…

Machine Learning · Computer Science 2023-04-20 Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Interpretable Network-assisted Random Forest+

Machine learning algorithms often assume that training samples are independent. When data points are connected by a network, the induced dependency between samples is both a challenge, reducing effective sample size, and an opportunity to…

Machine Learning · Statistics 2025-09-22 Tiffany M. Tang, Elizaveta Levina, Ji Zhu

Predicting data value before collection: A coefficient for prioritizing sources under random distribution shift

Researchers often face choices between multiple data sources that differ in quality, cost, and representativeness. Which sources will most improve predictive performance? We study this data prioritization problem under a random distribution…

Methodology · Statistics 2025-12-16 Ivy Zhang, Dominik Rothenhäusler

Process-BERT: A Framework for Representation Learning on Educational Process Data

Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as…

Machine Learning · Computer Science 2022-04-29 Alexander Scarlatos, Christopher Brinton, Andrew Lan