Related papers: Predicting Performance for Natural Language Proces…

Towards More Fine-grained and Reliable NLP Performance Prediction

Performance prediction, the task of estimating a system's performance without performing experiments, allows us to reduce the experimental burden caused by the combinatorial explosion of different datasets, languages, tasks, and models. In…

Computation and Language · Computer Science 2021-02-11 Zihuiwen Ye, Pengfei Liu, Jinlan Fu, Graham Neubig

Predicting the Performance of Multilingual NLP Models

Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages. The languages that these models are evaluated on, however, are very few in number, and it is unlikely that evaluation datasets will cover…

Computation and Language · Computer Science 2021-10-19 Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury

Drawing Causal Inferences About Performance Effects in NLP

This article emphasizes that NLP as a science seeks to make inferences about the performance effects that result from applying one method (compared to another method) in the processing of natural language. Yet NLP research in practice…

Computation and Language · Computer Science 2022-09-15 Sandra Wankmüller

An Empirical Study of Factors Affecting Language-Independent Models

Scaling existing applications and solutions to multiple human languages has traditionally proven to be difficult, mainly due to the language-dependent nature of preprocessing and feature engineering techniques employed in traditional…

Computation and Language · Computer Science 2020-01-01 Xiaotong Liu, Yingbei Tong, Anbang Xu, Rama Akkiraju

Predicting Empirical AI Research Outcomes with Language Models

Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even…

Artificial Intelligence · Computer Science 2025-06-03 Jiaxin Wen, Chenglei Si, Yueh-han Chen, He He, Shi Feng

Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments

As the applications of Natural Language Processing (NLP) in sensitive areas like Political Profiling, Review of Essays in Education, etc. proliferate, there is a great need for increasing transparency in NLP models to build trust with…

Computation and Language · Computer Science 2022-11-29 Adel Rahimi, Shaurya Jain

Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages

Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages with little linguistic…

Computation and Language · Computer Science 2022-11-15 Kabir Ahuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference

Natural language inference (NLI) is among the most challenging tasks in natural language understanding. Recent work on unsupervised pretraining that leverages unsupervised signals such as language-model and sentence prediction objectives…

Computation and Language · Computer Science 2019-04-30 Tianda Li, Xiaodan Zhu, Quan Liu, Qian Chen, Zhigang Chen, Si Wei

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM…

Computation and Language · Computer Science 2023-11-01 Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia

LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language

Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed…

Machine Learning · Statistics 2024-12-23 James Requeima, John Bronskill, Dami Choi, Richard E. Turner, David Duvenaud

Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

Although neural models have achieved impressive results on several NLP benchmarks, little is understood about the mechanisms they use to perform language tasks. Thus, much recent attention has been devoted to analyzing the sentence…

Computation and Language · Computer Science 2021-03-09 Abhilasha Ravichander, Yonatan Belinkov, Eduard Hovy

Predicting Fine-Tuning Performance with Probing

Large NLP models have recently shown impressive performance in language understanding tasks, typically evaluated by their fine-tuned performance. Alternatively, probing has received increasing attention as being a lightweight method for…

Computation and Language · Computer Science 2022-10-17 Zining Zhu, Soroosh Shahtalebi, Frank Rudzicz

The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education

Assessing instruction quality is a fundamental component of any improvement efforts in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers' expertise and idiosyncratic…

Computation and Language · Computer Science 2025-01-03 Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai

Quantifying Uncertainties in Natural Language Processing Tasks

Reliable uncertainty quantification is a first step towards building explainable, transparent, and accountable artificially intelligent systems. Recent progress in Bayesian deep learning has made such quantification realizable. In this paper,…

Computation and Language · Computer Science 2018-11-20 Yijun Xiao, William Yang Wang

Show Your Work: Improved Reporting of Experimental Results

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set…

Machine Learning · Computer Science 2019-09-09 Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

Predicting Field Experiments with Large Language Models

Large language models (LLMs) have demonstrated unprecedented emergent capabilities, including content generation, translation, and simulation of human behavior. Field experiments, on the other hand, are widely employed in social studies to…

Computers and Society · Computer Science 2025-05-22 Yaoyu Chen, Yuheng Hu, Yingda Lu

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question…

Computation and Language · Computer Science 2021-09-06 Paul Michel

Automating Behavioral Testing in Machine Translation

Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is…

Computation and Language · Computer Science 2023-11-06 Javier Ferrando, Matthias Sperber, Hendra Setiawan, Dominic Telaar, Saša Hasan

Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm

While NLP models often seek to capture cognitive states via language, the validity of predicted states is determined by comparing them to annotations created without access to the cognitive states of the authors. In behavioral sciences,…

Computation and Language · Computer Science 2025-02-20 Vasudha Varadarajan, Syeda Mahwish, Xiaoran Liu, Julia Buffolino, Christian C. Luhmann, Ryan L. Boyd, H. Andrew Schwartz

Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs

Despite widespread success in language understanding and generation, large language models (LLMs) exhibit unclear and often inconsistent behavior when faced with tasks that require probabilistic reasoning. In this work, we present the first…

Computation and Language · Computer Science 2025-09-29 Mobina Pournemat, Keivan Rezaei, Gaurang Sriramanan, Arman Zarei, Jiaxiang Fu, Yang Wang, Hamid Eghbalzadeh, Soheil Feizi