Related papers: Reducing statistical time-series problems to binar…
Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework…
Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and…
We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological…
Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical…
Stationarity is a very general, qualitative assumption, that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general…
When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…
In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…
Binary classification is one of the most common problem in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…
Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…
In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…
We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent…
Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…
Recent advancements in quantum computing and quantum-inspired algorithms have sparked renewed interest in binary optimization. These hardware and software innovations promise to revolutionize solution times for complex problems. In this…
A set of probabilistic predictions is well calibrated if the events that are predicted to occur with probability p do in fact occur about p fraction of the time. Well calibrated predictions are particularly important when machine learning…
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data.…
Multivariate time series is a very active topic in the research community and many machine learning tasks are being used in order to extract information from this type of data. However, in real-world problems data has missing values, which…
We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying…
In classification problems, especially those that categorize data into a large number of classes, the classes often naturally follow a hierarchical structure. That is, some classes are likely to share similar structures and features. Those…
Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to…
The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters…