Related papers: Reducing statistical time-series problems to binar…

General Framework for Binary Classification on Top Samples

Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework…

Machine Learning · Computer Science 2020-02-26 Lukáš Adam, Václav Mácha, Václav Šmídl, Tomáš Pevný

Simple Classification using Binary Data

Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and…

Machine Learning · Computer Science 2017-07-10 Deanna Needell, Rayan Saab, Tina Woolf

Temporal Clustering

We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological…

Data Structures and Algorithms · Computer Science 2017-10-17 Tamal K. Dey, Alfred Rossi, Anastasios Sidiropoulos

Clustering and Classification of Genetic Data Through U-Statistics

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical…

Methodology · Statistics 2016-06-13 Gabriela Bettella Cybis, Marcio Valk, Silvia Regina Costa Lopes

Asymptotic nonparametric statistical analysis of stationary time series

Stationarity is a very general, qualitative assumption, that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general…

Statistics Theory · Mathematics 2019-04-02 Daniil Ryabko

Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…

Methodology · Statistics 2023-11-02 Anvit Garg, Anil K. Ghosh, Soham Sarkar

An iterative method for classification of binary data

In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…

Machine Learning · Statistics 2018-09-11 Denali Molitor, Deanna Needell

Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning

Binary classification is one of the most common problems in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…

Machine Learning · Computer Science 2019-03-12 Alexandre Quemy

Evaluating Nonlinear Decision Trees for Binary Classification Tasks with Other Existing Methods

Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…

Machine Learning · Computer Science 2020-08-26 Yashesh Dhebar, Sparsh Gupta, Kalyanmoy Deb

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Optimal Clustering from Noisy Binary Feedback

We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent…

Machine Learning · Statistics 2024-12-20 Kaito Ariu, Jungseul Ok, Alexandre Proutiere, Se-Young Yun

Instance-Based Classification through Hypothesis Testing

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…

Machine Learning · Computer Science 2019-04-23 Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou

Improving the convergence of an iterative algorithm for solving arbitrary linear equation systems using classical or quantum binary optimization

Recent advancements in quantum computing and quantum-inspired algorithms have sparked renewed interest in binary optimization. These hardware and software innovations promise to revolutionize solution times for complex problems. In this…

Quantum Physics · Physics 2024-09-30 Erick R. Castro, Eldues O. Martins, Roberto S. Sarthour, Alexandre M. Souza, Ivan S. Oliveira

Binary Classifier Calibration: Bayesian Non-Parametric Approach

A set of probabilistic predictions is well calibrated if the events that are predicted to occur with probability p do in fact occur about p fraction of the time. Well calibrated predictions are particularly important when machine learning…

Machine Learning · Statistics 2014-01-14 Mahdi Pakdaman Naeini, Gregory F. Cooper, Milos Hauskrecht

Distribution-free binary classification: prediction sets, confidence intervals and calibration

We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data.…

Machine Learning · Statistics 2022-02-17 Chirag Gupta, Aleksandr Podkopaev, Aaditya Ramdas

Time Series Imputation

Multivariate time series is a very active topic in the research community and many machine learning tasks are being used in order to extract information from this type of data. However, in real-world problems data has missing values, which…

Machine Learning · Computer Science 2019-03-26 Samuel Arcadinho, Paulo Mateus

Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances

We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying…

Machine Learning · Computer Science 2013-05-02 Marthinus Christoffel du Plessis, Masashi Sugiyama

Hierarchical Classification using Binary Data

In classification problems, especially those that categorize data into a large number of classes, the classes often naturally follow a hierarchical structure. That is, some classes are likely to share similar structures and features. Those…

Machine Learning · Computer Science 2018-07-25 Denali Molitor, Deanna Needell

Bayesian Bi-clustering Methods with Applications in Computational Biology

Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to…

Applications · Statistics 2021-02-11 Han Yan, Jiexing Wu, Yang Li, Jun S. Liu

Independence clustering (without a matrix)

The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters…

Machine Learning · Computer Science 2017-03-21 Daniil Ryabko