Related papers: A statistical Testing Procedure for Validating Cla…

Instance-Based Classification through Hypothesis Testing

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…

Machine Learning · Computer Science 2019-04-23 Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou

A label-efficient two-sample test

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…

Machine Learning · Computer Science 2022-07-20 Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

Latent protein trees

Unbiased, label-free proteomics is becoming a powerful technique for measuring protein expression in almost any biological sample. The output of these measurements after preprocessing is a collection of features and their associated…

Applications · Statistics 2013-12-06 Ricardo Henao, J. Will Thompson, M. Arthur Moseley, Geoffrey S. Ginsburg, Lawrence Carin, Joseph E. Lucas

Hypothesis Testing for Class-Conditional Noise Using Local Maximum Likelihood

In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question. In certain particular cases, hypothesis testing procedures have been proposed to assess whether a…

Machine Learning · Computer Science 2023-12-19 Weisong Yang, Rafael Poyiadzi, Niall Twomey, Raul Santos Rodriguez

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Deep models trained with noisy labels are prone to over-fitting and struggle in generalization. Most existing solutions are based on an ideal assumption that the label noise is class-conditional, i.e., instances of the same class share the…

Computer Vision and Pattern Recognition · Computer Science 2022-08-01 Ganlong Zhao, Guanbin Li, Yipeng Qin, Feng Liu, Yizhou Yu

Learning from Stochastic Labels

Annotating multi-class instances is a crucial task in the field of machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we…

Machine Learning · Computer Science 2025-12-05 Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

Validated Intraclass Correlation Statistics to Test Item Performance Models

A new method, with an application program in Matlab code, is proposed for testing item performance models on empirical databases. This method uses data intraclass correlation statistics as expected correlations to which one compares simple…

Methodology · Statistics 2011-04-13 Pierre Courrieu, Muriele Brand-D'Abrescia, Ronald Peereman, Daniel Spieler, Arnaud Rey

Identity testing under label mismatch

Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to…

Statistics Theory · Mathematics 2021-05-06 Clément L. Canonne, Karl Wimmer

A Combinatorial Perspective of the Protein Inference Problem

In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to…

Quantitative Methods · Quantitative Biology 2012-11-30 Chao Yang, Zengyou He, Weichuan Yu

Enhancing Instance-Level Image Classification with Set-Level Labels

Instance-level image classification tasks have traditionally relied on single-instance labels to train models, e.g., few-shot learning and transfer learning. However, set-level coarse-grained labels that capture relationships among…

Machine Learning · Computer Science 2023-11-21 Renyu Zhang, Aly A. Khan, Yuxin Chen, Robert L. Grossman

Learning with Proper Partial Labels

Partial-label learning is a kind of weakly-supervised learning with inexact labels, where for each training example, we are given a set of candidate labels instead of only one true label. Recently, various approaches on partial-label…

Machine Learning · Computer Science 2022-08-30 Zhenguo Wu, Jiaqi Lv, Masashi Sugiyama

Learning from Label Proportions with Instance-wise Consistency

Learning from Label Proportions (LLP) is a weakly supervised learning method that aims to perform instance classification from training data consisting of pairs of bags containing multiple instances and the class label proportions within…

Machine Learning · Computer Science 2023-02-22 Ryoma Kobayashi, Yusuke Mukuta, Tatsuya Harada

Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data

Classification and clustering are both important topics in statistical learning. A natural question herein is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be…

Machine Learning · Statistics 2015-09-22 Qiyi Lu, Xingye Qiao

Discovery of Proteomics based on Machine learning

The ultimate target of proteomics identification is to identify and quantify the protein in the organism. Mass spectrometry (MS) based on label-free protein quantitation has mainly focused on analysis of peptide spectral counts and ion peak…

Quantitative Methods · Quantitative Biology 2013-12-05 Biao He, Baochang Zhang, Yan Fu

On an Exact and Nonparametric Test for the Separability of Two Classes by Means of a Simple Threshold

This paper introduces a statistical test inferring whether a variable allows separating two classes by means of a single critical value. Its test statistic is the prediction error of a nonparametric threshold classifier. While this approach…

Methodology · Statistics 2017-07-17 Fabian Schroeder

An end-to-end approach for the verification problem: learning the right distance

In this contribution, we augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder. Several interpretations are thus drawn for the learned distance-like model's output. We first show…

Machine Learning · Computer Science 2020-08-17 Joao Monteiro, Isabela Albuquerque, Jahangir Alam, R Devon Hjelm, Tiago Falk

Improving the performance of object detection by preserving label distribution

Object detection is a task that performs position identification and label classification of objects in images or videos. The information obtained through this process plays an essential role in various tasks in the field of computer…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Heewon Lee, Sangtae Ahn

Instance Dependent Testing of Samplers using Interval Conditioning

Sampling algorithms play a pivotal role in probabilistic AI. However, verifying if a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably correct testers like Barbarik, Teq, Flash, CubeProbe…

Data Structures and Algorithms · Computer Science 2025-12-09 Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, Sayantan Sen

One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification

Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast…

Computation and Language · Computer Science 2025-10-14 Jens Van Nooten, Andriy Kosar, Guy De Pauw, Walter Daelemans

Mass spectrometry based protein identification with accurate statistical significance assignment

Motivation: Assigning statistical significance accurately has become increasingly important as meta data of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of…

Quantitative Methods · Quantitative Biology 2014-07-25 Gelio Alves, Yi-Kuo Yu