Related papers: QuantumBench: A Benchmark for Quantum Problem Solv…

200 papers

Quantum computing is an emerging field recognized for the significant speedup it offers over classical computing through quantum algorithms. However, designing and implementing quantum algorithms pose challenges due to the complex nature of…

Quantum Physics · Physics 2025-12-17 Rui Yang, Ziruo Wang, Yuntian Gu, Tianyi Chen, Yitao Liang, Tongyang Li

Large language models (LLMs) have demonstrated good performance in general code generation; however, their capabilities in quantum code generation remain insufficiently studied. This paper presents QuanBench, a benchmark for evaluating LLMs…

Software Engineering · Computer Science 2025-10-21 Xiaoyu Guo, Minggu Wang, Jianjun Zhao

While Large Language Models (LLMs) excel on standardized medical exams, high scores often fail to translate to high-quality responses for real-world medical queries. Current evaluations rely heavily on multiple-choice questions, failing to…

The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical…

We introduce QMBench, a comprehensive benchmark designed to evaluate the capability of large language model agents in quantum materials research. This specialized benchmark assesses the model's ability to apply condensed matter physics…

As large language models (LLMs) continue to advance, the need for up-to-date and well-organized benchmarks becomes increasingly critical. However, many existing datasets are scattered, difficult to manage, and make it challenging to perform…

Machine Learning · Computer Science 2025-06-03 Eunsu Kim, Haneul Yoo, Guijin Son, Hitesh Patel, Amit Agarwal, Alice Oh

As LLMs have become increasingly popular, they have been used in almost every field. But as the application for LLMs expands from generic fields to narrow, focused science domains, there exists an ever-increasing gap in ways to evaluate…

Computation and Language · Computer Science 2023-10-18 Anurag Acharya, Sai Munikoti, Aaron Hellinger, Sara Smith, Sridevi Wagle, Sameera Horawalavithana

Quantitative chemistry is central to modern chemical research, yet the ability of large language models (LLMs) to perform its rigorous, step-by-step calculations remains underexplored. To fill this blank, we propose QCBench, a Quantitative…

Artificial Intelligence · Computer Science 2025-11-05 Jiaqing Xie, Weida Wang, Ben Gao, Zhuo Yang, Haiyuan Wan, Shufei Zhang, Tianfan Fu, Yuqiang Li

Large language models (LLMs) have demonstrated significant potential in advancing various fields of research and society. However, the current community of LLMs overly focuses on benchmarks for analyzing specific foundational skills (e.g.…

Recent advances in Large Language Models (LLMs) have demonstrated strong potential in code generation, yet their effectiveness in quantum computing remains underexplored. This paper benchmarks LLMs for PennyLane-based quantum code…

Artificial Intelligence · Computer Science 2025-09-01 Abdul Basit, Minghao Shao, Muhammad Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant…

Computation and Language · Computer Science 2025-03-19 Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang Li, Zhoujun Li

We present INTEGRALBENCH, a focused benchmark designed to evaluate Large Language Model (LLM) performance on definite integral problems. INTEGRALBENCH provides both symbolic and numerical ground truth solutions with manual difficulty…

Artificial Intelligence · Computer Science 2025-07-30 Bintao Tang, Xin Yang, Yuhao Wang, Zixuan Qiu, Zimo Ji, Wenyuan Jiang

Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions…

Computation and Language · Computer Science 2026-05-15 Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C. Frank, Angel Hsing-Chi Hwang, Ruishan Liu

Recent advancements in large language models (LLMs) have showcased significant improvements in mathematics. However, traditional math benchmarks like GSM8k offer a unidimensional perspective, falling short in providing a holistic assessment…

Computation and Language · Computer Science 2024-05-21 Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we…

Recent advances in large language models (LLMs) and medical LLMs (Med-LLMs) have demonstrated strong performance on general medical benchmarks. However, their capabilities in specialized medical fields, such as dentistry which require…

Computation and Language · Computer Science 2025-08-29 Hengchuan Zhu, Yihuan Xu, Yichen Li, Zijie Meng, Zuozhu Liu

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going…

Computation and Language · Computer Science 2024-07-16 Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly. We present CliMedBench,…

Computation and Language · Computer Science 2024-10-07 Zetian Ouyang, Yishuai Qiu, Linlin Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He

Benchmarks establish a standardized evaluation framework to systematically assess the performance of large language models (LLMs), facilitating objective comparisons and driving advancements in the field. However, existing benchmarks fail…

Computation and Language · Computer Science 2026-02-16 Ziqian Zhang, Xingjian Hu, Yue Huang, Kai Zhang, Ruoxi Chen, Yixin Liu, Qingsong Wen, Kaidi Xu, Xiangliang Zhang, Neil Zhenqiang Gong, Lichao Sun

In the previous article, we presented a quantum-inspired framework for modeling semantic representation and processing in Large Language Models (LLMs), drawing upon mathematical tools and conceptual analogies from quantum mechanics to offer…

Artificial Intelligence · Computer Science 2025-05-26 Timo Aukusti Laine