Related papers: CrossCodeBench: Benchmarking Cross-Task Generaliza…

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks by simply reading textual instructions that define them and looking at a few examples. Despite the success of the conventional supervised learning on…

Computation and Language · Computer Science 2022-03-15 Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

Cross-Learning from Scarce Data via Multi-Task Constrained Optimization

A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying distribution of the source. When data is limited,…

Machine Learning · Computer Science 2025-11-18 Leopoldo Agorio, Juan Cerviño, Miguel Calvo-Fullana, Alejandro Ribeiro, Juan Andrés Bazerque

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

Are We Building on the Rock? On the Importance of Data Preprocessing for Code Summarization

Code summarization, the task of generating useful comments given the code, has long been of interest. Most of the existing code summarization models are trained and validated on widely-used code comment benchmark datasets. However, little…

Software Engineering · Computer Science 2022-10-18 Lin Shi, Fangwen Mu, Xiao Chen, Song Wang, Junjie Wang, Ye Yang, Ge Li, Xin Xia, Qing Wang

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation

As large language models (LLMs) become increasingly embedded in software engineering workflows, a critical capability remains underexplored: generating correct code that enables cross-programming-language (CPL) interoperability. This skill…

Software Engineering · Computer Science 2025-07-29 Zhanhang Xiong, Dongxia Wang, Yuekang Li, Xinyuan An, Wenhai Wang

Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations

There has been great progress in unifying various table-to-text tasks using a single encoder-decoder model trained via multi-task learning (Xie et al., 2022). However, existing methods typically encode task information with a simple dataset…

Computation and Language · Computer Science 2022-12-20 Jifan Chen, Yuhao Zhang, Lan Liu, Rui Dong, Xinchi Chen, Patrick Ng, William Yang Wang, Zhiheng Huang

Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

Obtaining labeled data to train a model for a task of interest is often expensive. Prior work shows training models on multitask data augmented with task descriptions (prompts) effectively transfers knowledge to new tasks. Towards…

Computation and Language · Computer Science 2023-05-26 Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi

Generalization Beyond Benchmarks: Evaluating Learnable Protein-Ligand Scoring Functions on Unseen Targets

As machine learning becomes increasingly central to molecular design, it is vital to ensure the reliability of learnable protein-ligand scoring functions on novel protein targets. While many scoring functions perform well on standard…

Machine Learning · Computer Science 2025-12-08 Jakub Kopko, David Graber, Saltuk Mustafa Eyrilmez, Stanislav Mazurenko, David Bednar, Jiri Sedlar, Josef Sivic

On the Generalization Ability of Unsupervised Pretraining

Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled…

Machine Learning · Computer Science 2024-03-12 Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi

Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models

While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns regarding potential contamination of these benchmarks as they may be leaked into pretraining and…

Software Engineering · Computer Science 2024-03-11 Martin Riddell, Ansong Ni, Arman Cohan

RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive…

Robotics · Computer Science 2024-06-07 Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

Software Engineering · Computer Science 2025-04-02 Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra

Task Matrices: Linear Maps for Cross-Model Finetuning Transfer

Results in interpretability suggest that large vision and language models learn implicit linear encodings when models are biased by in-context prompting. However, the existence of similar linear representations in more general adaptation…

Machine Learning · Computer Science 2025-12-18 Darrin O'Brien, Dhikshith Gajulapalli, Eric Xia

Unlocking Multi-Task Electric Energy System Intelligence: Data Scaling Laws and Performance with Limited Fine-Tuning

Data scaling has revolutionized research fields like natural language processing, computer vision, and robotics control, providing foundation models with remarkable multi-task and generalization capabilities. In this paper, we investigate…

Systems and Control · Electrical Eng. & Systems 2025-03-27 Shaohuai Liu, Lin Dong, Chao Tian, Le Xie

How Does Code Pretraining Affect Language Model Task Performance?

Large language models are increasingly trained on corpora containing both natural language and non-linguistic data like source code. Aside from aiding programming-related tasks, anecdotal evidence suggests that including code in pretraining…

Computation and Language · Computer Science 2025-02-26 Jackson Petty, Sjoerd van Steenkiste, Tal Linzen

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task…

Machine Learning · Computer Science 2021-06-16 Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Avnish Narayan, Hayden Shively, Adithya Bellathur, Karol Hausman, Chelsea Finn, Sergey Levine

Improving Cross-Task Generalization with Step-by-Step Instructions

Instruction tuning has been shown to be able to improve cross-task generalization of language models. However, it is still challenging for language models to complete the target tasks following the instructions, as the instructions are…

Computation and Language · Computer Science 2023-05-09 Yang Wu, Yanyan Zhao, Zhongyang Li, Bing Qin, Kai Xiong

FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end…

Software Engineering · Computer Science 2025-06-19 Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding, Siyao Liu, Tong Liu, Dandan Wang, Yanan Liu, Zhaojian Li

Multi-task Supervised Learning via Cross-learning

In this paper we consider a problem known as multi-task learning, consisting of fitting a set of classifier or regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions,…

Machine Learning · Computer Science 2021-05-28 Juan Cervino, Juan Andres Bazerque, Miguel Calvo-Fullana, Alejandro Ribeiro