Related papers: RealPatch: A Statistical Matching Framework for Mo…

Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence…

Machine Learning · Computer Science 2020-08-18 Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities

Large Language Models (LLMs) have emerged as promising tools in software development, enabling automated code generation and analysis. However, their knowledge is limited to a fixed cutoff date, making them prone to generating code…

Cryptography and Security · Computer Science 2025-12-01 Minjae Seo, Wonwoo Choi, Myoungsung You, Seungwon Shin

SimMatch: Semi-supervised Learning with Similarity Matching

Learning with few labeled data has been a longstanding problem in the computer vision and machine learning research community. In this paper, we introduced a new semi-supervised learning framework, SimMatch, which simultaneously considers…

Computer Vision and Pattern Recognition · Computer Science 2022-03-18 Mingkai Zheng, Shan You, Lang Huang, Fei Wang, Chen Qian, Chang Xu

An analysis of over-sampling labeled data in semi-supervised learning with FixMatch

Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch…

Machine Learning · Computer Science 2022-04-11 Miquel Martí i Rabadán, Sebastian Bujwid, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki

Active Data Sampling and Generation for Bias Remediation

Adequate sampling space coverage is the keystone to effectively train trustworthy Machine Learning models. Unfortunately, real data do carry several inherent risks due to the many potential biases they exhibit when gathered without a proper…

Machine Learning · Computer Science 2025-03-27 Antonio Maratea, Rita Perna

Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking

The imitation learning field requires expert data to train agents in a task. Most often, this learning approach suffers from the absence of available data, which results in techniques being tested on its dataset. Creating datasets is a…

Machine Learning · Computer Science 2024-03-04 Nathan Gavenski, Michael Luck, Odinaldo Rodrigues

Multisample Flow Matching: Straightening Flows with Minibatch Couplings

Simulation-free methods for training continuous-time generative models construct probability paths that go between noise distributions and individual data samples. Recent works, such as Flow Matching, derived paths that are optimal for each…

Machine Learning · Computer Science 2023-05-26 Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, Ricky T. Q. Chen

Provably Improving Generalization of Few-Shot Models with Synthetic Data

Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often…

Machine Learning · Computer Science 2025-06-26 Lan-Cuong Nguyen, Quan Nguyen-Tri, Bang Tran Khanh, Dung D. Le, Long Tran-Thanh, Khoat Than

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization

Model-Agnostic Meta-Learning (MAML) and its variants are popular few-shot classification methods. They train an initializer across a variety of sampled learning tasks (also known as episodes) such that the initialized model can adapt…

Computer Vision and Pattern Recognition · Computer Science 2020-10-13 Yangbin Chen, Yun Ma, Tom Ko, Jianping Wang, Qing Li

Dense FixMatch: a simple semi-supervised learning method for pixel-wise prediction tasks

We propose Dense FixMatch, a simple method for online semi-supervised learning of dense and structured prediction tasks combining pseudo-labeling and consistency regularization via strong data augmentation. We enable the application of…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Miquel Martí i Rabadán, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki

SuperPatchMatch: an Algorithm for Robust Correspondences using Superpixel Patches

Superpixels have become very popular in many computer vision applications. Nevertheless, they remain underexploited since the superpixel decomposition may produce irregular and non stable segmentation results due to the dependency to the…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Rémi Giraud, Vinh-Thong Ta, Aurélie Bugeau, Pierrick Coupé, Nicolas Papadakis

Narrowing the Complexity Gap in the Evaluation of Large Language Models

Evaluating Large Language Models (LLMs) with respect to real-world code complexity is essential. Otherwise, there is a risk of overestimating LLMs' programming abilities based on simplistic benchmarks, only to be disappointed when using…

Software Engineering · Computer Science 2026-02-24 Yang Chen, Shuyang Liu, Reyhaneh Jabbarvand

IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

The scarcity of labeled data in real-world scenarios is a critical bottleneck of deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution to achieve a desirable tradeoff between annotation cost and…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Kebin Wu, Wenbin Li, Xiaofei Xiao

A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition…

Computation and Language · Computer Science 2025-03-10 Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić

Data Augmentation by Pairing Samples for Images Classification

Data augmentation is a widely used technique in many machine learning tasks, such as image classification, to virtually enlarge the training dataset size and avoid overfitting. Traditional data augmentation techniques for image…

Machine Learning · Computer Science 2018-04-12 Hiroshi Inoue

FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning

Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data. However, most SSL methods are commonly based on instance-wise consistency between different data…

Machine Learning · Computer Science 2023-10-26 Zhuo Huang, Li Shen, Jun Yu, Bo Han, Tongliang Liu

FairBatch: Batch Selection for Model Fairness

Training a fair machine learning model is essential to prevent demographic disparity. Existing techniques for improving model fairness require broad changes in either data preprocessing or model training, rendering themselves…

Machine Learning · Computer Science 2021-06-03 Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

Fix your Models by Fixing your Datasets

The quality of underlying training data is crucial for building performant machine learning models with wider generalizability. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So,…

Machine Learning · Computer Science 2021-12-16 Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti

Fill In The Gaps: Model Calibration and Generalization with Synthetic Data

As machine learning models continue to swiftly advance, calibrating their performance has become a major concern prior to practical and widespread implementation. Most existing calibration methods often negatively impact model accuracy due…

Computation and Language · Computer Science 2024-10-16 Yang Ba, Michelle V. Mancenido, Rong Pan

Impact of Leakage on Data Harmonization in Machine Learning Pipelines in Class Imbalance Across Sites

Machine learning (ML) models benefit from large datasets. Collecting data in biomedical domains is costly and challenging, hence, combining datasets has become a common practice. However, datasets obtained under different conditions could…

Machine Learning · Computer Science 2025-05-23 Nicolás Nieto, Simon B. Eickhoff, Christian Jung, Martin Reuter, Kersten Diers, Malte Kelm, Artur Lichtenberg, Federico Raimondo, Kaustubh R. Patil