Related papers: Structurally Diverse Sampling for Sample-Efficient…

200 papers

In-context learning has shown great success in i.i.d. semantic parsing splits, where the training and test sets are drawn from the same distribution. In this setup, models are typically prompted with demonstrations that are similar to the…

Computation and Language · Computer Science 2023-06-27 Itay Levy, Ben Bogin, Jonathan Berant

Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to new compositions/structures that have not…

Computation and Language · Computer Science 2021-09-07 Inbar Oren, Jonathan Herzig, Jonathan Berant

NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance. Questions remain, however, about how particular dataset design choices may impact the conclusions we draw about model…

Computation and Language · Computer Science 2023-10-27 Kaiser Sun, Adina Williams, Dieuwke Hupkes

Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is…

Computation and Language · Computer Science 2025-09-09 Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, Anders Schlichtkrull, Miryam de Lhoneux, Johannes Bjerva

Neural network models often generalize poorly to mismatched domains or distributions. In NLP, this issue arises in particular when models are expected to generalize compositionally, that is, to novel combinations of familiar words and…

Computation and Language · Computer Science 2021-11-10 Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha

In large-scale distributed scenarios, increasingly complex tasks demand more intelligent collaboration across networks, requiring the joint extraction of structural representations from data samples. However, conventional task-specific…

Machine Learning · Computer Science 2026-04-21 Zhuojun Tian, Chaouki Ben Issaid, Mehdi Bennis

Training reliable respiratory sound classification models remains challenging due to the limited size and subject diversity of datasets. Ensemble methods can improve robustness, but when base models are trained on identical data, models…

Machine Learning · Computer Science 2026-04-28 June-Woo Kim, Miika Toikkanen, Heejoon Koo, Yoon Tae Kim, Doyoung Kwon, Kyunghoon Kim

Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent…

Machine Learning · Computer Science 2025-03-04 Tianchi Xie, Jiangning Zhu, Guozu Ma, Minzhi Lin, Wei Chen, Weikai Yang, Shixia Liu

Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is,…

Sound · Computer Science 2025-10-28 Alain Riou, Joan Serrà, Yuki Mitsufuji

Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure…

Machine Learning · Computer Science 2022-06-15 Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

Recent work to enhance data partitioning strategies for more realistic model evaluation faces challenges in providing a clear optimal choice. This study addresses these challenges, focusing on morphological segmentation and synthesizing…

Computation and Language · Computer Science 2024-04-16 Zoey Liu, Bonnie J. Dorr

Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance. However, it is unclear how broadly these trends…

Computation and Language · Computer Science 2023-06-01 Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia

The i.i.d. assumption is a useful idealization that underpins many successful approaches to supervised machine learning. However, its violation can lead to models that learn to exploit spurious correlations in the training data, rendering…

Machine Learning · Computer Science 2020-06-15 Daniel Pace, Alessandra Russo, Murray Shanahan

Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes…

Computation and Language · Computer Science 2024-09-18 Simon Yu, Liangyu Chen, Sara Ahmadian, Marzieh Fadaee

Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While…

Computation and Language · Computer Science 2023-08-24 Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs…

Machine Learning · Computer Science 2021-04-20 Cesar F. Caiafa, Ziyao Wang, Jordi Solé-Casals, Qibin Zhao

Standard NLP benchmarks often fail to capture vulnerabilities stemming from dataset artifacts and spurious correlations. Contrast sets address this gap by challenging models near decision boundaries but are traditionally labor-intensive to…

Computation and Language · Computer Science 2025-03-11 Hender Lin

We propose an instance-wise adaptive sampling framework for constructing compact and informative training datasets for supervised learning of inverse problem solutions. Typical learning-based approaches aim to learn a general-purpose…

Machine Learning · Computer Science 2026-02-20 Jiequn Han, Kui Ren, Nathan Soedjak

In text-to-SQL tasks -- as in much of NLP -- compositional generalization is a major challenge: neural networks struggle with compositional generalization where training and test distributions differ. However, most recent attempts to…

Computation and Language · Computer Science 2022-05-05 Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver

We investigate the success conditions for compositional generalization of CLIP models on real-world data through performance prediction. Prior work shows that CLIP requires exponentially more pretraining data for linear performance gains on…

Machine Learning · Computer Science 2025-02-26 Thaddäus Wiedemer, Yash Sharma, Ameya Prabhu, Matthias Bethge, Wieland Brendel