Related papers: Set Based Stochastic Subsampling
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data. While this approach exempts the users from the manual task of designing…
This paper addresses the problem of sequential submodular maximization: selecting and ranking items in a sequence to optimize some composite submodular function. In contrast to most of the previous works, which assume access to the utility…
Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…
The sheer scale of modern datasets has resulted in a dire need for summarization techniques that identify representative elements in a dataset. Fortunately, the vast majority of data summarization tasks satisfy an intuitive diminishing…
We present a new "learning-to-learn"-type approach that enables rapid learning of concepts from small-to-medium sized training sets and is primarily designed for web-initialized image retrieval. At the core of our approach is a deep…
Data imbalance remains one of the open challenges in the contemporary machine learning. It is especially prevalent in case of medical data, such as histopathological images. Traditional data-level approaches for dealing with data imbalance…
Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…
Deep convolutional neural networks (CNNs) have shown excellent performance in object recognition tasks and dense classification problems such as semantic segmentation. However, training deep neural networks on large and sparse datasets is…
Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…
Downsampling is widely adopted to achieve a good trade-off between accuracy and latency for visual recognition. Unfortunately, the commonly used pooling layers are not learned, and thus cannot preserve important information. As another…
Self-supervised learning (SSL), as a newly emerging unsupervised representation learning paradigm, generally follows a two-stage learning pipeline: 1) learning invariant and discriminative representations with auto-annotation pretext(s),…
The level sets of neural networks represent fundamental properties such as decision boundaries of classifiers and are used to model non-linear manifold data such as curves and surfaces. Thus, methods for controlling the neural level sets…
Deep metric learning maps visually similar images onto nearby locations and visually dissimilar images apart from each other in an embedding manifold. The learning process is mainly based on the supplied image negative and positive training…
Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose…
In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the pro-posed strategy shares an (almost) identical network structure and…
Stochastic gradient descent samples uniformly the training set to build an unbiased gradient estimate with a limited number of samples. However, at a given step of the training process, some data are more helpful than others to continue…
Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters; however, this implies that the neural network's activation function must exhibit a degree of continuity which limits…
Subsampling of node sets is useful in contexts such as multilevel methods, computer graphics, and machine learning. On uniform grid-based node sets, the process of subsampling is simple. However, on node sets with high density variation,…
Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…
We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…