Related papers: Patterns Count-Based Labels for Datasets

On Evaluation of Unsupervised Feature Selection for Pattern Classification

Unsupervised feature selection aims to identify a compact subset of features that captures the intrinsic structure of data without supervised labels. Most existing studies evaluate the performance of methods using the single-label dataset…

Machine Learning · Computer Science 2026-02-10 Gyu-Il Kim, Dae-Won Kim, Jaesung Lee

A Data Management Approach for Dataset Selection Using Human Computation

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science 2013-07-16 Alexandros Ntoulas, Omar Alonso, Vasilis Kandylas

Improving Label Ranking Ensembles using Boosting Techniques

Label ranking is a prediction task which deals with learning a mapping between an instance and a ranking (i.e., order) of labels from a finite set, representing their relevance to the instance. Boosting is a well-known and reliable ensemble…

Machine Learning · Computer Science 2020-09-24 Lihi Dery, Erez Shmueli

Label-Embedding for Image Classification

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded…

Computer Vision and Pattern Recognition · Computer Science 2016-10-05 Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

Itemsets for Real-valued Datasets

Pattern mining is one of the most well-studied subfields in exploratory data analysis. While there is a significant amount of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little…

Data Structures and Algorithms · Computer Science 2019-02-05 Nikolaj Tatti

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Yuan-Hong Liao, Amlan Kar, Sanja Fidler

Does the dataset meet your expectations? Explaining sample representation in image data

Since the behavior of a neural network model is adversely affected by a lack of diversity in training data, we present a method that identifies and explains such deficiencies. When a dataset is labeled, we note that annotations alone are…

Computer Vision and Pattern Recognition · Computer Science 2020-12-17 Dhasarathy Parthasarathy, Anton Johansson

Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

Rooting in the scarcity of most attributes, realistic pedestrian attribute datasets exhibit unduly skewed data distribution, from which two types of model failures are delivered: (1) label imbalance: model predictions lean greatly towards…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Yibo Zhou, Hai-Miao Hu, Yirong Xiang, Xiaokang Zhang, Haotian Wu

How many labelers do you have? A closer look at gold-standard labels

The construction of most supervised learning datasets revolves around collecting multiple labels for each instance, then aggregating the labels to form a type of "gold-standard". We question the wisdom of this pipeline by developing a…

Statistics Theory · Mathematics 2024-06-06 Chen Cheng, Hilal Asi, John Duchi

Which is the best model for my data?

In this paper, we tackle the problem of selecting the optimal model for a given structured pattern classification dataset. In this context, a model can be understood as a classifier and a hyperparameter configuration. The proposed…

Machine Learning · Computer Science 2022-10-27 Gonzalo Nápoles, Isel Grau, Çiçek Güven, Orçun Özdemir, Yamisleydi Salgueiro

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards

Artificial intelligence (AI) systems built on incomplete or biased data will often exhibit problematic outcomes. Current methods of data analysis, particularly before model development, are costly and not standardized. The Dataset Nutrition…

Databases · Computer Science 2018-05-11 Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, Kasia Chmielinski

Auditing for Diversity using Representative Examples

Assessing the diversity of a dataset of information associated with people is crucial before using such data for downstream applications. For a given dataset, this often involves computing the imbalance or disparity in the empirical…

Computers and Society · Computer Science 2021-07-16 Vijay Keswani, L. Elisa Celis

Are Labels Always Necessary for Classifier Accuracy Evaluation?

To calculate the model accuracy on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground-truth labels. Whilst standard use cases satisfy this requirement, many…

Computer Vision and Pattern Recognition · Computer Science 2021-05-26 Weijian Deng, Liang Zheng

Leveraging Schema Labels to Enhance Dataset Search

A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior…

Information Retrieval · Computer Science 2020-01-29 Zhiyu Chen, Haiyan Jia, Jeff Heflin, Brian D. Davison

A New Scale for Attribute Dependency in Large Database Systems

Large, data-centric applications are characterized by their different attributes. Today, the vast majority of large data-centric applications are based on the relational model. The databases are collections of tables and every table…

Information Retrieval · Computer Science 2012-06-28 Soumya Sen, Anjan Dutta, Agostino Cortesi, Nabendu Chaki

Labelling as an unsupervised learning problem

Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in a noisy cloud of points. If a set of point…

Machine Learning · Statistics 2018-05-31 Terry Lyons, Imanol Perez Arribas

Data Efficient Training with Imbalanced Label Sample Distribution for Fashion Detection

Multi-label classification models have a wide range of applications in E-commerce, including visual-based label predictions and language-based sentiment classifications. A major challenge in achieving satisfactory performance for these…

Computer Vision and Pattern Recognition · Computer Science 2023-06-07 Xin Shen, Praful Agrawal, Zhongwei Cheng

Estimating Multi-label Accuracy using Labelset Distributions

A multi-label classifier estimates the binary label state (relevant vs irrelevant) for each of a set of concept labels, for any given instance. Probabilistic multi-label classifiers provide a predictive posterior distribution over all…

Machine Learning · Computer Science 2022-09-12 Laurence A. F. Park, Jesse Read

That Label's Got Style: Handling Label Style Bias for Uncertain Image Segmentation

Segmentation uncertainty models predict a distribution over plausible segmentations for a given input, which they learn from the annotator variation in the training set. However, in practice these annotations can differ systematically in…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Kilian Zepf, Eike Petersen, Jes Frellsen, Aasa Feragen

Recognizing Variables from their Data via Deep Embeddings of Distributions

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola