English
Related papers

Related papers: SensorBench: Benchmarking LLMs in Coding-Based Sen…

200 papers

We present a benchmark targeting a novel class of systems: semantic query processing engines. Those systems rely inherently on generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with…

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we…

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods.Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt…

Computation and Language · Computer Science 2026-03-20 Bin Zhang , Yuxiao Ye , Guoqing Du , Xiaoru Hu , Zhishuai Li , Chi Harold Liu , Zhiwei Xu , Guoliang Fan , Rui Zhao , Ziyue Li , Hangyu Mao

Recently, Large Language Models (LLM) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet.…

Computation and Language · Computer Science 2024-02-21 Israt Jahan , Md Tahmid Rahman Laskar , Chun Peng , Jimmy Huang

Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are…

Computation and Language · Computer Science 2026-04-14 Tiancheng Hu , Joachim Baumann , Lorenzo Lupo , Nigel Collier , Dirk Hovy , Paul Röttger

We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios, such as acting as a Linux terminal or playing text games with users. While these simulation…

Computation and Language · Computer Science 2024-09-13 Qi Jia , Xiang Yue , Tianyu Zheng , Jie Huang , Bill Yuchen Lin

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…

Artificial Intelligence · Computer Science 2026-03-02 Antoine Peyronnet , Fabian Gloeckle , Amaury Hayat

As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e.,…

Machine Learning · Computer Science 2025-09-23 Anjiang Wei , Jiannan Cao , Ran Li , Hongyu Chen , Yuhui Zhang , Ziheng Wang , Yuan Liu , Thiago S. F. X. Teixeira , Diyi Yang , Ke Wang , Alex Aiken

Large Language Models (LLMs) have the potential to semi-automate some process mining (PM) analyses. While commercial models are already adequate for many analytics tasks, the competitive level of open-source LLMs in PM tasks is unknown. In…

Computation and Language · Computer Science 2024-07-19 Alessandro Berti , Humam Kourani , Wil M. P. van der Aalst

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning…

Computation and Language · Computer Science 2024-07-01 Xiaoxuan Wang , Ziniu Hu , Pan Lu , Yanqiao Zhu , Jieyu Zhang , Satyen Subramaniam , Arjun R. Loomba , Shichang Zhang , Yizhou Sun , Wei Wang

Although large language models (LLMs) have demonstrated their strong intelligence ability, the high demand for computation and storage hinders their practical application. To this end, many model compression techniques are proposed to…

Computation and Language · Computer Science 2024-11-01 Ge Yang , Changyi He , Jinyang Guo , Jianyu Wu , Yifu Ding , Aishan Liu , Haotong Qin , Pengliang Ji , Xianglong Liu

Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end…

Software Engineering · Computer Science 2025-06-19 Hongda Zhu , Yiwen Zhang , Bing Zhao , Jingzhe Ding , Siyao Liu , Tong Liu , Dandan Wang , Yanan Liu , Zhaojian Li

Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets or formal mathematical proof systems such as the…

Artificial Intelligence · Computer Science 2025-11-03 Nikolaus Holzer , William Fishell , Baishakhi Ray , Mark Santolucito

In contrast to their remarkable performance on general knowledge QA, the true abilities of Large Language Models (LLMs) in tasks demanding deep, specialized reasoning, such as in protein biology, have yet to be thoroughly investigated.…

Quantitative Methods · Quantitative Biology 2025-12-30 Dingyi Rong , Zijian Chen , Qi Jia , Kaiwei Zhang , Haotian Lu , Guangtao Zhai , Ning Liu

Large Language Models (LLMs) have become instrumental across various applications, with the customization of these models to specific scenarios becoming increasingly critical. System message, a fundamental component of LLMs, is consist of…

Computation and Language · Computer Science 2024-10-23 Yanzhao Qin , Tao Zhang , Tao Zhang , Yanjun Shen , Wenjing Luo , Haoze Sun , Yan Zhang , Yujing Qiao , Weipeng Chen , Zenan Zhou , Wentao Zhang , Bin Cui

Multimodal LLMs (MLLMs) are capable of performing complex data analysis, visual question answering, generation, and reasoning tasks. However, their ability to analyze biometric data is relatively underexplored. In this work, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Ekta Gavas , Sudipta Banerjee , Chinmay Hegde , Nasir Memon

Large language models (LLMs) have demonstrated significant potential in advancing various fields of research and society. However, the current community of LLMs overly focuses on benchmarks for analyzing specific foundational skills (e.g.…

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going…

Computation and Language · Computer Science 2024-07-16 Anni Zou , Wenhao Yu , Hongming Zhang , Kaixin Ma , Deng Cai , Zhuosheng Zhang , Hai Zhao , Dong Yu
‹ Prev 1 2 3 10 Next ›