Related papers: Learning Tree-Structured Composition of Data Augme…

Learning Augmented Binary Search Trees

A treap is a classic randomized binary search tree data structure that is easy to implement and supports $O(\log n)$ expected-time access. However, classic treaps do not take advantage of the input distribution or patterns in the input. Given…

Data Structures and Algorithms · Computer Science · 2022-06-27 · Honghao Lin, Tian Luo, David P. Woodruff
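For reference, here is a minimal sketch of the *classic* treap mentioned above. It assumes integer keys and uses uniformly random priorities with a max-heap order. It does not show the learned or distribution-aware priorities the paper studies. The helper names (`insert`, `contains`, `heap_ok`, etc.) are illustrative only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    priority: float  # random heap priority; larger priorities sit closer to the root
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(t: Node) -> Node:
    # Lift t.left above t; the in-order (BST) key order is unchanged.
    l = t.left
    t.left, l.right = l.right, t
    return l


def rotate_left(t: Node) -> Node:
    # Lift t.right above t; the in-order (BST) key order is unchanged.
    r = t.right
    t.right, r.left = r.left, t
    return r


def insert(t: Optional[Node], key: int, rng: random.Random) -> Node:
    """Insert key as in a plain BST, then rotate it up until the heap order holds."""
    if t is None:
        return Node(key, rng.random())
    if key < t.key:
        t.left = insert(t.left, key, rng)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    elif key > t.key:
        t.right = insert(t.right, key, rng)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t  # duplicate keys are ignored


def contains(t: Optional[Node], key: int) -> bool:
    # Ordinary BST search; expected depth is O(log n) thanks to the random priorities.
    while t is not None:
        if key == t.key:
            return True
        t = t.left if key < t.key else t.right
    return False


def inorder(t: Optional[Node]) -> List[int]:
    # Iterative in-order traversal; yields the keys in sorted order.
    out, stack = [], []
    while stack or t is not None:
        while t is not None:
            stack.append(t)
            t = t.left
        t = stack.pop()
        out.append(t.key)
        t = t.right
    return out


def height(t: Optional[Node]) -> int:
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def heap_ok(t: Optional[Node]) -> bool:
    # Check that no child has a larger priority than its parent.
    if t is None:
        return True
    for child in (t.left, t.right):
        if child is not None and child.priority > t.priority:
            return False
    return heap_ok(t.left) and heap_ok(t.right)
```

Inserting the keys `0..999` in sorted order would produce a 1000-level path in an unbalanced BST. With random priorities, the treap's height stays logarithmic in expectation, which is what gives the $O(\log n)$ expected access time.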

Lock-Free Augmented Trees

Augmenting an existing sequential data structure with extra information to support greater functionality is a widely used technique. For example, search trees are augmented to build sequential data structures like order-statistic trees,…

Data Structures and Algorithms · Computer Science · 2024-05-20 · Panagiota Fatourou, Eric Ruppert
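As a concrete illustration of the augmentation idea in this abstract, the sketch below stores a subtree `size` in each node of a *sequential*, unbalanced BST. That single extra field is enough to answer order-statistic queries (`select`, `rank`) in time proportional to the tree height. The lock-free, concurrent setting the paper addresses is not shown, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # augmentation: number of nodes in this subtree


def size(t: Optional[Node]) -> int:
    return t.size if t is not None else 0


def insert(t: Optional[Node], key: int) -> Node:
    """Plain BST insert that also keeps every `size` field on the search path up to date."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    else:
        return t  # duplicate: tree unchanged
    t.size = 1 + size(t.left) + size(t.right)
    return t


def select(t: Optional[Node], k: int) -> int:
    """Return the k-th smallest key (0-indexed) using the subtree sizes."""
    while t is not None:
        left_size = size(t.left)
        if k < left_size:
            t = t.left
        elif k == left_size:
            return t.key
        else:
            k -= left_size + 1
            t = t.right
    raise IndexError(k)


def rank(t: Optional[Node], key: int) -> int:
    """Return the number of stored keys strictly smaller than key."""
    r = 0
    while t is not None:
        if key <= t.key:
            t = t.left
        else:
            r += size(t.left) + 1  # t and its whole left subtree are smaller
            t = t.right
    return r
```

The design point is that every operation that changes the tree shape must also repair the augmented field. In a concurrent setting, keeping those fields consistent is what makes augmentation hard.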

Optimizing Data Augmentation Policy Through Random Unidimensional Search

It is no secret amongst deep learning researchers that finding the optimal data augmentation strategy during training can mean the difference between state-of-the-art performance and a run-of-the-mill result. To that end, the community has…

Machine Learning · Computer Science · 2023-07-17 · Xiaomeng Dong, Michael Potter, Gaurav Kumar, Yun-Chan Tsai, V. Ratna Saripalli, Theodore Trafalis

Learning-Augmented Search Data Structures

We study the integration of machine learning advice to improve upon traditional data structures designed for efficient search queries. Although there has been recent effort in improving the performance of binary search trees using machine…

Data Structures and Algorithms · Computer Science · 2025-03-10 · Chunkai Fu, Brandon G. Nguyen, Jung Hoon Seo, Ryan Zesch, Samson Zhou

A Kernel Theory of Modern Data Augmentation

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data…

Machine Learning · Computer Science · 2019-03-21 · Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Improving Deep Learning using Generic Data Augmentation

Deep artificial neural networks require a large corpus of training data to learn effectively, and collecting such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating…

Machine Learning · Computer Science · 2017-08-22 · Luke Taylor, Geoff Nitschke

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics · 2018-12-10 · Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

A fast algorithm for constructing balanced binary search trees

We suggest a new non-recursive algorithm for constructing a binary search tree given an array of numbers. The algorithm has $O(N)$ time and $O(1)$ memory complexity if the given array of $N$ numbers is sorted. The resulting tree is of…

Data Structures and Algorithms · Computer Science · 2022-07-20 · Pavel S. Ruzankin
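For comparison with the abstract above, here is the standard *recursive* midpoint construction of a balanced BST from a sorted array. It also runs in $O(N)$ time, but it uses $O(\log N)$ recursion stack rather than the $O(1)$ extra memory claimed for the paper's non-recursive algorithm, which is not reproduced here. `build_balanced` and the other names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_balanced(keys: Sequence[int], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Build a height-balanced BST from sorted keys[lo:hi] by rooting each subtree at its midpoint."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    # Each key becomes exactly one node, so the total work is O(N).
    return Node(keys[mid], build_balanced(keys, lo, mid), build_balanced(keys, mid + 1, hi))


def height(t: Optional[Node]) -> int:
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def inorder(t: Optional[Node]) -> List[int]:
    # Simple recursive traversal, used only to check the result.
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Splitting at the midpoint gives a tree of height $\lceil \log_2(N+1) \rceil$, the minimum possible for $N$ nodes.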

When Dynamic Data Selection Meets Data Augmentation

Dynamic data selection aims to accelerate training with lossless performance. However, reducing training data inherently limits data diversity, potentially hindering generalization. While data augmentation is widely used to enhance…

Machine Learning · Computer Science · 2025-05-13 · Suorong Yang, Peng Ye, Furao Shen, Dongzhan Zhou

Automatic Data Augmentation for 3D Medical Image Segmentation

Data augmentation is an effective and universal technique for improving generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the…

Image and Video Processing · Electrical Eng. & Systems · 2020-12-29 · Ju Xu, Mengzhang Li, Zhanxing Zhu

Rethinking Data Augmentation for Tabular Data in Deep Learning

Tabular data is the most widely used data format in machine learning (ML). While tree-based methods outperform DL-based methods in supervised learning, recent literature reports that self-supervised learning with Transformer-based models…

Machine Learning · Computer Science · 2023-05-23 · Soma Onishi, Shoya Meguro

Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Data augmentation has proved extremely useful by increasing training data variance to alleviate overfitting and improve deep neural networks' generalization performance. In medical image analysis, a well-designed augmentation policy usually…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Yunhe Gao, Zhiqiang Tang, Mu Zhou, Dimitris Metaxas

Tied-Augment: Controlling Representation Similarity Improves Data Augmentation

Data augmentation methods have played an important role in the recent advance of deep learning models, and have become an indispensable component of state-of-the-art models in semi-supervised, self-supervised, and supervised training for…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-24 · Emirhan Kurtulus, Zichao Li, Yann Dauphin, Ekin Dogus Cubuk

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-23 · Cecilia Summers, Michael J. Dinneen

Dynamic Trees with Almost-Optimal Access Cost

An optimal binary search tree for an access sequence on elements is a static tree that minimizes the total search cost. Constructing perfectly optimal binary search trees is expensive so the most efficient algorithms construct almost…

Data Structures and Algorithms · Computer Science · 2018-06-28 · Mordecai Golin, John Iacono, Stefan Langerman, J. Ian Munro, Yakov Nekrich
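To make "constructing perfectly optimal binary search trees is expensive" concrete, here is the textbook cubic-time dynamic program for a static optimal BST. It assumes only successful searches, with access frequency `freq[i]` for the i-th smallest key. Cost is $\sum_i \text{freq}_i \cdot \text{depth}_i$ with the root at depth 1. This is the classical exact method, not the almost-optimal dynamic construction of the paper above. `optimal_bst_cost` is a hypothetical name.

```python
from typing import List, Sequence


def optimal_bst_cost(freq: Sequence[int]) -> int:
    """Minimum total weighted search cost of a static BST over keys with the given frequencies.

    cost[i][j] is the optimal cost for the half-open key range [i, j). Choosing key r as the
    root pushes every key in the range one level deeper, which adds the range weight w(i, j).
    Runs in O(n^3) time and O(n^2) memory.
    """
    n = len(freq)
    prefix = [0] * (n + 1)  # prefix[j] - prefix[i] == total frequency of keys i..j-1
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f

    cost: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]  # empty ranges cost 0
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            weight = prefix[j] - prefix[i]
            cost[i][j] = weight + min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
    return cost[0][n]
```

For frequencies `[34, 8, 50]`, the best tree roots the third key, places the first key below it, and places the second key below that. This gives $50 \cdot 1 + 34 \cdot 2 + 8 \cdot 3 = 142$. A well-known refinement due to Knuth brings the exact computation down to $O(n^2)$.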

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping,…

Computer Vision and Pattern Recognition · Computer Science · 2017-12-14 · Luis Perez, Jason Wang

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

Data augmentation is an effective approach to tackling overfitting. Many previous works have proposed different data augmentation strategies for NLP, such as noise injection, word replacement, and back-translation. Though effective, they…

Computation and Language · Computer Science · 2022-07-13 · Le Zhang, Zichao Yang, Diyi Yang

Substructure Substitution: Structured Data Augmentation for NLP

We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks. SUB2 generates new examples by substituting substructures (e.g., subtrees or subsequences) with ones with the…

Computation and Language · Computer Science · 2021-01-05 · Haoyue Shi, Karen Livescu, Kevin Gimpel

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science · 2022-09-09 · Markus Bayer, Marc-André Kaufhold, Christian Reuter

Data Augmentation via Structured Adversarial Perturbations

Data augmentation is a major component of many machine learning methods with state-of-the-art performance. Common augmentation strategies work by drawing random samples from a space of transformations. Unfortunately, such sampling…

Machine Learning · Computer Science · 2020-11-06 · Calvin Luo, Hossein Mobahi, Samy Bengio