Related papers: View-Driven Deduplication with Active Learning

A Pre-trained Data Deduplication Model based on Active Learning

In the era of big data, the issue of data quality has become increasingly prominent. One of the main challenges is the problem of duplicate data, which can arise from repeated entry or the merging of multiple data sources. These "dirty…

Machine Learning · Computer Science 2025-01-13 Haochen Shi, Xinyao Liu, Fengmao Lv, Hongtao Xue, Jie Hu, Shengdong Du, Tianrui Li

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle

Intrinsic Self-Supervision for Data Quality Audits

Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors, leading to inaccurate estimates of model performance. In this paper, we revisit the task of data cleaning and formalize it as either a…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Labelling Consortium, Matthew Groh, Alexander A. Navarini, Marc Pouly

Combining Visual Analytics and Content Based Data Retrieval Technology for Efficient Data Analysis

One of the most useful techniques to help visual data analysis systems is interactive filtering (brushing). However, visualization techniques often suffer from overlap of graphical items and multiple attributes complexity, making visual…

Graphics · Computer Science 2015-07-07 Jose Rodrigues, Luciana Romani, Agma Traina, Caetano Traina

Active Learning with Multiple Views

Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are…

Machine Learning · Computer Science 2011-10-06 C. A. Knoblock, S. Minton, I. Muslea

Active label cleaning for improved dataset quality under resource constraints

Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have an often-overlooked confounding effect on the assessment of model performance. Nevertheless, employing experts to…

Computer Vision and Pattern Recognition · Computer Science 2022-04-25 Melanie Bernhardt, Daniel C. Castro, Ryutaro Tanno, Anton Schwaighofer, Kerem C. Tezcan, Miguel Monteiro, Shruthi Bannur, Matthew Lungren, Aditya Nori, Ben Glocker, Javier Alvarez-Valle, Ozan Oktay

MultiVision: Designing Analytical Dashboards with Deep Learning Based Recommendation

We contribute a deep-learning-based method that assists in designing analytical dashboards for analyzing a data table. Given a data table, data workers usually need to experience a tedious and time-consuming process to select meaningful…

Human-Computer Interaction · Computer Science 2021-07-19 Aoyu Wu, Yun Wang, Mengyu Zhou, Xinyi He, Haidong Zhang, Huamin Qu, Dongmei Zhang

Toward a view-based data cleaning architecture

Big data analysis has become an active area of study with the growth of machine learning techniques. To properly analyze data, it is important to maintain high-quality data. Thus, research on data cleaning is also important. It is difficult…

Databases · Computer Science 2019-10-25 Toshiyuki Shimizu, Hiroki Omori, Masatoshi Yoshikawa

VideoPro: A Visual Analytics Approach for Interactive Video Programming

Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu

Similarity Search for Efficient Active Learning and Search of Rare Concepts

Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even…

Machine Learning · Computer Science 2021-07-23 Cody Coleman, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter Bailis, Alexander C. Berg, Robert Nowak, Roshan Sumbaly, Matei Zaharia, I. Zeki Yalniz

Active Learning for Automated Visual Inspection of Manufactured Products

Quality control is a key activity performed by manufacturing enterprises to ensure products meet quality standards and avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity enabled an increasing…

Machine Learning · Computer Science 2021-09-07 Elena Trajkova, Jože M. Rožanec, Paulien Dam, Blaž Fortuna, Dunja Mladenić

Parting with Illusions about Deep Active Learning

Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various…

Computer Vision and Pattern Recognition · Computer Science 2019-12-12 Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, Thomas Brox

Towards Computationally Feasible Deep Active Learning

Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many…

Computation and Language · Computer Science 2022-05-10 Akim Tsvigun, Artem Shelmanov, Gleb Kuzmin, Leonid Sanochkin, Daniil Larionov, Gleb Gusev, Manvel Avetisian, Leonid Zhukov

Decoupling Data and Tooling in Interactive Visualization

Interactive data visualization is a major part of modern exploratory data analysis, with web-based technologies enabling a rich ecosystem of both specialized and general tools. However, current visualization tools often lack support for…

Human-Computer Interaction · Computer Science 2025-08-14 Jan Simson

Deep Active Learning via Open Set Recognition

In many applications, data is easy to acquire but expensive and time-consuming to label; prominent examples include medical imaging and NLP. This disparity has only grown in recent years as our ability to collect data improves. Under these…

Machine Learning · Computer Science 2021-04-07 Jaya Krishna Mandivarapu, Blake Camp, Rolando Estrada

When Imbalance Comes Twice: Active Learning under Simulated Class Imbalance and Label Shift in Binary Semantic Segmentation

The aim of Active Learning is to select the most informative samples from an unlabelled set of data. This is useful in cases where the amount of data is large and labelling is expensive, such as in machine vision or medical imaging. Two…

Computer Vision and Pattern Recognition · Computer Science 2026-01-13 Julien Combes, Alexandre Derville, Jean-François Coeurjolly

ViewAL: Active Learning with Viewpoint Entropy for Semantic Segmentation

We propose ViewAL, a novel active learning strategy for semantic segmentation that exploits viewpoint consistency in multi-view datasets. Our core idea is that inconsistencies in model predictions across viewpoints provide a very reliable…

Computer Vision and Pattern Recognition · Computer Science 2020-03-20 Yawar Siddiqui, Julien Valentin, Matthias Nießner

Guided Data Discovery in Interactive Visualizations via Active Search

Recent advances in visual analytics have enabled us to learn from user interactions and uncover analytic goals. These innovations set the foundation for actively guiding users during data exploration. Providing such guidance will become…

Human-Computer Interaction · Computer Science 2022-07-19 Shayan Monadjemi, Sunwoo Ha, Quan Nguyen, Henry Chai, Roman Garnett, Alvitta Ottley

Visualization-Aware Sampling for Very Large Databases

Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations in interactive timescales is increasingly challenging. One approach for…

Databases · Computer Science 2017-01-25 Yongjoo Park, Michael Cafarella, Barzan Mozafari

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan