Related papers: Learning conditional distributions on continuous s…
We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters $k$, and for each $1…
Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the…
We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…
Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that…
Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is…
Much more attention has been paid to unsupervised feature selection nowadays due to the emergence of massive unlabeled data. The distribution of samples and the latent effect of training a learning method using samples in more effective…
Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of…
Neural networks are not learning optimal decision boundaries. We show that decision boundaries are situated in areas of low training data density. They are impacted by few training samples which can easily lead to overfitting. We provide a…
Decision-making problems often feature uncertainty stemming from heterogeneous and context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline to learn preference distributions and…
Unsupervised clustering algorithm can effectively reduce the dimension of high-dimensional unlabeled data, thus reducing the time and space complexity of data processing. However, the traditional clustering algorithm needs to set the upper…
Feature selection methods have an important role on the readability of data and the reduction of complexity of learning algorithms. In recent years, a variety of efforts are investigated on feature selection problems based on unsupervised…
Sequential hypothesis testing is a desirable decision making strategy in any time sensitive scenario. Compared with fixed sample-size testing, sequential testing is capable of achieving identical probability of error requirements using less…
Explicitly or implicitly, most of dimensionality reduction methods need to determine which samples are neighbors and the similarity between the neighbors in the original highdimensional space. The projection matrix is then learned on the…
In transfer learning, we wish to make inference about a target population when we have access to data both from the distribution itself, and from a different but related source distribution. We introduce a flexible framework for transfer…
Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions where embeddings…
In this work, we consider multitask learning problems where clusters of nodes are interested in estimating their own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters have a good number…
It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. One of the fundamental problems in adversarial machine learning is to quantify how much training data is needed in the…
In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the `kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed…
Many modern methods for prediction leverage nearest neighbor search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century and has stood the test of time. This monograph…
Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal…