Related papers: Quantifying Generalization Complexity for Large La…

Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

The impressive capabilities of large language models (LLMs) have sparked debate over whether these models genuinely generalize to unseen tasks or predominantly rely on memorizing vast amounts of pretraining data. To explore this issue, we…

Computation and Language · Computer Science · 2025-03-04 · Xinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

Revisiting Generalization Across Difficulty Levels: It's Not So Easy

We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data…

Computation and Language · Computer Science · 2025-11-27 · Yeganeh Kordi, Nihal V. Nayak, Max Zuo, Ilana Nguyen, Stephen H. Bach

Compute-Optimal LLMs Provably Generalize Better With Scale

Why do larger language models generalize better? To investigate this question, we develop generalization bounds on the pretraining objective of large language models (LLMs) in the compute-optimal regime, as described by the Chinchilla…

Machine Learning · Computer Science · 2025-04-22 · Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough,…

Computation and Language · Computer Science · 2024-07-02 · Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited…

Computation and Language · Computer Science · 2024-03-26 · Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov

Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving

Large language models (LLMs) have demonstrated remarkable performance on various medical benchmarks, but their capabilities across different cognitive levels remain underexplored. Inspired by Bloom's Taxonomy, we propose a…

Computation and Language · Computer Science · 2025-06-11 · Yuxuan Zhou, Xien Liu, Chenwei Yan, Chen Ning, Xiao Zhang, Boxun Li, Xiangling Fu, Shijin Wang, Guoping Hu, Yu Wang, Ji Wu

SciDA: Scientific Dynamic Assessor of LLMs

Advances in the reasoning capabilities of Large Language Models (LLMs) enable them to solve scientific problems with enhanced efficacy. A high-quality benchmark for comprehensive and appropriate assessment is therefore significant, while…

Computation and Language · Computer Science · 2025-06-17 · Junting Zhou, Tingjia Miao, Yiyan Liao, Qichao Wang, Zhoufutu Wen, Yanqin Wang, Yunjie Huang, Ge Yan, Leqi Wang, Yucheng Xia, Hongwan Gao, Yuansong Zeng, Renjie Zheng, Chen Dun, Yitao Liang, Tong Yang, Wenhao Huang, Ge Zhang

Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models

Large Language Models (LLMs) have revolutionized both general natural language processing and domain-specific applications such as code synthesis, legal reasoning, and finance. However, while prior studies have explored individual model…

Software Engineering · Computer Science · 2025-12-05 · Gunjan Das, Paheli Bhattacharya, Rishabh Gupta

QUILL: Quotation Generation Enhancement of Large Language Models

While large language models (LLMs) have become excellent writing assistants, they still struggle with quotation generation. This is because they either hallucinate when providing factual quotations or fail to provide quotes that exceed…

Computation and Language · Computer Science · 2025-02-21 · Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao

Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning

We assess how the code reasoning abilities of large language models (LLMs) generalize to different kinds of programs. We present techniques for obtaining in- and out-of-distribution programs with different characteristics: code sampled from…

Software Engineering · Computer Science · 2025-04-09 · Rem Yang, Julian Dai, Nikos Vasilakis, Martin Rinard

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the…

Computation and Language · Computer Science · 2024-04-11 · Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science · 2024-06-07 · Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Challenging the Abilities of Large Language Models in Italian: a Community Initiative

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these models, especially for languages beyond English,…

Computation and Language · Computer Science · 2025-12-05 · Malvina Nissim, Danilo Croce, Viviana Patti, Pierpaolo Basile, Giuseppe Attanasio, Elio Musacchio, Matteo Rinaldi, Federico Borazio, Maria Francis, Jacopo Gili, Daniel Scalena, Begoña Altuna, Ekhi Azurmendi, Valerio Basile, Luisa Bentivogli, Arianna Bisazza, Marianna Bolognesi, Dominique Brunato, Tommaso Caselli, Silvia Casola, Maria Cassese, Mauro Cettolo, Claudia Collacciani, Leonardo De Cosmo, Maria Pia Di Buono, Andrea Esuli, Julen Etxaniz, Chiara Ferrando, Alessia Fidelangeli, Simona Frenda, Achille Fusco, Marco Gaido, Andrea Galassi, Federico Galli, Luca Giordano, Mattia Goffetti, Itziar Gonzalez-Dios, Lorenzo Gregori, Giulia Grundler, Sandro Iannaccone, Chunyang Jiang, Moreno La Quatra, Francesca Lagioia, Soda Marem Lo, Marco Madeddu, Bernardo Magnini, Raffaele Manna, Fabio Mercorio, Paola Merlo, Arianna Muti, Vivi Nastase, Matteo Negri, Dario Onorati, Elena Palmieri, Sara Papi, Lucia Passaro, Giulia Pensa, Andrea Piergentili, Daniele Potertì, Giovanni Puccetti, Federico Ranaldi, Leonardo Ranaldi, Andrea Amelio Ravelli, Martina Rosola, Elena Sofia Ruzzetti, Giuseppe Samo, Andrea Santilli, Piera Santin, Gabriele Sarti, Giovanni Sartor, Beatrice Savoldi, Antonio Serino, Andrea Seveso, Lucia Siciliani, Paolo Torroni, Rossella Varvara, Andrea Zaninello, Asya Zanollo, Fabio Massimo Zanzotto, Kamyar Zeinalipour, Andrea Zugarini

How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns

Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to preserve it. The reasons behind this divergence…

Machine Learning · Computer Science · 2026-01-01 · Haoyue Bai, Yiyou Sun, Wenjie Hu, Shi Qiu, Maggie Ziyu Huan, Peiyang Song, Robert Nowak, Dawn Song

Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

We introduce KoLasSimpleQA, the first benchmark evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single knowledge point coverage,…

Computation and Language · Computer Science · 2025-05-23 · Bowen Jiang, Runchuan Zhu, Jiang Wu, Zinco Jiang, Yifan He, Junyuan Gao, Jia Yu, Rui Min, Yinfan Wang, Haote Yang, Songyang Zhang, Dahua Lin, Lijun Wu, Conghui He

Are Large Language Models Good Statisticians?

Large Language Models (LLMs) have demonstrated impressive capabilities across a range of scientific tasks including mathematics, physics, and chemistry. Despite their successes, the effectiveness of LLMs in handling complex statistical…

Computation and Language · Computer Science · 2024-10-11 · Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs…

Computation and Language · Computer Science · 2025-04-29 · Yixin Cao, Shibo Hong, Xinze Li, Jiahao Ying, Yubo Ma, Haiyuan Liang, Yantao Liu, Zijun Yao, Xiaozhi Wang, Dan Huang, Wenxuan Zhang, Lifu Huang, Muhao Chen, Lei Hou, Qianru Sun, Xingjun Ma, Zuxuan Wu, Min-Yen Kan, David Lo, Qi Zhang, Heng Ji, Jing Jiang, Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu-Gang Jiang

On Path to Multimodal Generalist: General-Level and General-Bench

The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-08 · Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang

Benchmarking Information Retrieval Models on Complex Retrieval Tasks

Large language models (LLMs) are incredible and versatile tools for text-based tasks that have enabled countless, previously unimaginable, applications. Retrieval models, in contrast, have not yet seen such capable general-purpose models…

Information Retrieval · Computer Science · 2025-09-10 · Julian Killingback, Hamed Zamani

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science · 2024-10-10 · Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao