Related papers: MOROCCO: Model Resource Comparison Framework

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one…

Computation and Language · Computer Science · 2019-02-26 · Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?

While multilingual language models (MLMs) have been trained on 100+ languages, they are typically only evaluated across a handful of them due to a lack of available test data in most languages. This is particularly problematic when…

Computation and Language · Computer Science · 2024-06-21 · Rochelle Choenni, Sara Rajaee, Christof Monz, Ekaterina Shutova

ORCA: A Challenging Benchmark for Arabic Language Understanding

Due to their crucial role in all NLP, several benchmarks have been proposed to evaluate pretrained language models. In spite of these efforts, no public benchmark of diverse nature currently exists for evaluation of Arabic. This makes it…

Computation and Language · Computer Science · 2023-05-31 · AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency

This paper presents a comparative study aimed at optimizing Llama2 inference, a critical aspect of machine learning and natural language processing (NLP). We evaluate various programming languages and frameworks, including TensorFlow,…

Machine Learning · Computer Science · 2025-02-05 · Sazzad Hossain, Touhidul Alam Seyam, Avijit Chowdhury, Munis Xamidov, Rajib Ghose, Abhijit Pathak

Magneto: Combining Small and Large Language Models for Schema Matching

Recent advances in language models opened new opportunities to address complex schema matching tasks. Schema matching approaches have been proposed that demonstrate the usefulness of language models, but they have also uncovered important…

Databases · Computer Science · 2025-06-18 · Yurong Liu, Eduardo Pena, Aecio Santos, Eden Wu, Juliana Freire

Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks

Although LLMs have attained significant success in high-resource languages, their capacity in low-resource linguistic environments like Kannada and Arabic is not yet fully understood. This work benchmarks the performance of multilingual…

Computation and Language · Computer Science · 2025-07-29 · Maitha Alshehhi, Ahmed Sharshar, Mohsen Guizani

RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models

With the extensive use of vision-language models in various downstream tasks, evaluating their robustness is crucial. In this paper, we propose a benchmark for assessing the robustness of vision-language models. We believe that a robust…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-23 · Seulki Park, Daeho Um, Hajung Yoon, Sanghyuk Chun, Sangdoo Yun, Jin Young Choi

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact…

Computation and Language · Computer Science · 2025-12-01 · Emily Chang, Niyati Bafna

MuLD: The Multitask Long Document Benchmark

The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks for one or two input sentences, there has been exciting work in…

Computation and Language · Computer Science · 2025-10-20 · G Thomas Hudson, Noura Al Moubayed

Designing a Framework for Solving Multiobjective Simulation Optimization Problems

Multiobjective simulation optimization (MOSO) problems are optimization problems with multiple conflicting objectives, where evaluation of at least one of the objectives depends on a black-box numerical code or real-world experiment, which…

Optimization and Control · Mathematics · 2025-01-13 · Tyler H. Chang, Stefan M. Wild

WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain

Pre-trained language models have shown impressive performance on a variety of tasks and domains. Previous research on financial language models usually employs a generic training scheme to train standard model architectures, without…

Computation and Language · Computer Science · 2022-11-02 · Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, Diyi Yang

Efficient Multi-Objective Optimization through Population-based Parallel Surrogate Search

Multi-Objective Optimization (MOO) is very difficult for expensive functions because most current MOO methods rely on a large number of function evaluations to get an accurate solution. We address this problem with surrogate approximation…

Neural and Evolutionary Computing · Computer Science · 2019-03-07 · Taimoor Akhtar, Christine A. Shoemaker

COLE: a Comprehensive Benchmark for French Language Understanding Evaluation

To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse tasks covering a broad range of NLU capabilities, including sentiment analysis,…

Computation and Language · Computer Science · 2025-10-08 · David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury

Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages

We conduct an empirical study of neural machine translation (NMT) for truly low-resource languages, and propose a training curriculum fit for cases when both parallel training data and compute resource are lacking, reflecting the reality of…

Computation and Language · Computer Science · 2021-11-30 · Garry Kuwanto, Afra Feyza Akyürek, Isidora Chara Tourni, Siyang Li, Alexander Gregory Jones, Derry Wijaya

Enterprise Benchmarks for Large Language Model Evaluation

The advancement of large language models (LLMs) has led to a greater challenge of having a rigorous and systematic evaluation of complex tasks performed, especially in enterprise applications. Therefore, LLMs need to be able to benchmark…

Computation and Language · Computer Science · 2024-10-18 · Bing Zhang, Mikio Takeuchi, Ryo Kawahara, Shubhi Asthana, Md. Maruf Hossain, Guang-Jie Ren, Kate Soule, Yada Zhu

Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models

Practical needs of developing task-oriented dialogue assistants require the ability to understand many languages. Novel benchmarks for multilingual natural language understanding (NLU) include monolingual sentences in several languages,…

Computation and Language · Computer Science · 2021-11-23 · Alexey Birshert, Ekaterina Artemova

Low Resource Neural Machine Translation: A Benchmark for Five African Languages

Recent advances in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks. In this work, we benchmark NMT between English and five African LRL pairs (Swahili, Amharic, Tigrigna, Oromo,…

Computation and Language · Computer Science · 2020-04-01 · Surafel M. Lakew, Matteo Negri, Marco Turchi

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and…

Computation and Language · Computer Science · 2020-05-15 · Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

A Comparative Study of Lexical Substitution Approaches based on Neural Language Models

Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc. In this paper, we present a…

Computation and Language · Computer Science · 2020-06-02 · Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

Automated agents, powered by large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions that are both information-seeking and…

Computation and Language · Computer Science · 2025-09-04 · Tomer Wolfson, Harsh Trivedi, Mor Geva, Yoav Goldberg, Dan Roth, Tushar Khot, Ashish Sabharwal, Reut Tsarfaty