Related papers: Measuring Massive Multitask Language Understanding

200 papers

The development of large-scale Chinese language models is flourishing, yet there is a lack of corresponding capability assessments. Therefore, we propose a test to measure the multitask accuracy of large Chinese language models. This test…

Computation and Language · Computer Science 2026-05-28 Hui Zeng

Large language models have recently made tremendous progress in a variety of aspects, e.g., cross-task generalization, instruction following. Comprehensively evaluating the capability of large language models in multiple tasks is of great…

Computation and Language · Computer Science 2023-05-23 Chuang Liu, Renren Jin, Yuqi Ren, Linhao Yu, Tianyu Dong, Xiaohan Peng, Shuting Zhang, Jianxiang Peng, Peiyi Zhang, Qingqing Lyu, Xiaowen Su, Qun Liu, Deyi Xiong

We present a new challenge to examine whether large language models understand social norms. In contrast to existing datasets, our dataset requires a fundamental understanding of social norms to solve. Our dataset features the largest set…

Computation and Language · Computer Science 2024-05-24 Ye Yuan, Kexin Tang, Jianhao Shen, Ming Zhang, Chenguang Wang

This study aims to explore the performance improvement method of large language models based on GPT-4 under the multi-task learning framework and conducts experiments on two tasks: text classification and automatic summary generation.…

Computation and Language · Computer Science 2024-12-10 Zhen Qi, Jiajing Chen, Shuo Wang, Bingying Liu, Hongye Zheng, Chihang Wang

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires…

Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training…

Computation and Language · Computer Science 2023-02-10 Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo

Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks,…

Computation and Language · Computer Science 2023-11-03 Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that…

Computation and Language · Computer Science 2022-05-10 Stephanie Lin, Jacob Hilton, Owain Evans

Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before…

Computation and Language · Computer Science 2023-08-29 Tyler A. Chang, Benjamin K. Bergen

In this paper, we propose to study language modelling as a multi-task problem, bringing together three strands of research: multi-task learning, linguistics, and interpretability. Based on hypotheses derived from linguistic theory, we…

Computation and Language · Computer Science 2021-01-28 Lucas Weber, Jaap Jumelet, Elia Bruni, Dieuwke Hupkes

At the staggering pace with which the capabilities of large language models (LLMs) are increasing, creating future-proof evaluation sets to assess their understanding becomes more and more challenging. In this paper, we propose a novel…

Computation and Language · Computer Science 2023-12-21 Xenia Ohmer, Elia Bruni, Dieuwke Hupkes

Item difficulty plays a crucial role in test performance, interpretability of scores, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and…

Computation and Language · Computer Science 2025-09-30 Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz

We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and…

Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is…

Computation and Language · Computer Science 2023-10-27 Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia

In this work we present a Mixture of Task-Aware Experts Network for Machine Reading Comprehension on a relatively small dataset. We particularly focus on the issue of common-sense learning, enforcing the common ground knowledge by…

Computation and Language · Computer Science 2022-10-05 Anirudha Rayasam, Anusha Kamath, Gabriel Bayomi Tinoco Kalejaiye

Large language models (LLMs) demonstrate impressive capabilities in mathematical reasoning. However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are…

Computation and Language · Computer Science 2025-04-01 Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine…

Computation and Language · Computer Science 2018-06-12 Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

With the rise of generative language models, machine-generated text detection has become a critical challenge. A wide variety of models is available, but inconsistent datasets, evaluation metrics, and assessment strategies obscure…

Computation and Language · Computer Science 2026-04-23 Kevin Stowe, Kailash Patil

Language students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, hence they do not work…

Computation and Language · Computer Science 2017-02-17 Shuhan Wang, Erik Andersen

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they…

Computation and Language · Computer Science 2023-09-26 R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths