Related papers

Related papers: Benchmarking Causal Study to Interpret Large Langu…

200 papers

While code generation has been widely used in various software development scenarios, the quality of the generated code is not guaranteed. This has been a particular concern in the era of large language models (LLMs)-based code generation,…

Software Engineering · Computer Science 2023-10-11 Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang

Causal reasoning capabilities are essential for large language models (LLMs) in a wide range of applications, such as education and healthcare. But there is still a lack of benchmarks for a better understanding of such capabilities. Current…

Computation and Language · Computer Science 2024-12-25 Ruibo Tu, Hedvig Kjellström, Gustav Eje Henter, Cheng Zhang

Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can…

Software Engineering · Computer Science 2024-09-05 Mohammed Latif Siddiq, Simantika Dristi, Joy Saha, Joanna C. S. Santos

Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, questioning whether they achieve…

Machine Learning · Computer Science 2024-07-12 Linying Yang, Vik Shirvaikar, Oscar Clivio, Fabian Falck

This study explores the capability of Large Language Models (LLMs) to evaluate causality in causal graphs generated by conventional statistical causal discovery methods, a task traditionally reliant on manual assessment by human subject…

Computation and Language · Computer Science 2025-04-16 Yuni Susanti, Nina Holsmoelle

Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLM's program synthesis capability has generally relied on manually curated…

Software Engineering · Computer Science 2025-05-13 Longtian Wang, Tianlin Li, Xiaofei Xie, Yuhan Zhi, Jian Wang, Chao Shen

Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion.…

Software Engineering · Computer Science 2024-04-18 Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong Fang, Bowen Yu, Weisong Sun, Zhenyu Chen

The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral"…

Artificial Intelligence · Computer Science 2024-08-21 Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan

In the era of large language models (LLMs), code benchmarks have become an important research area in software engineering and are widely used by practitioners. These benchmarks evaluate the performance of LLMs on specific code-related…

Software Engineering · Computer Science 2025-06-24 Zhiyuan Pan, Xing Hu, Xin Xia, Xiaohu Yang

Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustworthy statistical…

Artificial Intelligence · Computer Science 2026-05-13 Jin Du, Li Chen, Xun Xian, An Luo, Fangqiao Tian, Ganghua Wang, Charles Doss, Xiaotong Shen, Jie Ding

Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers…

Software Engineering · Computer Science 2024-09-26 Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

Generative Large Language Models (LLMs) are increasingly used in non-generative software maintenance tasks, such as fault localization (FL). Success in FL depends on a model's ability to reason about program semantics beyond surface-level…

Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to…

Software Engineering · Computer Science 2024-01-31 Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata

Large Language Models (LLMs) have demonstrated promise in medical knowledge assessments, yet their practical utility in real-world clinical decision-making remains underexplored. In this study, we evaluated the performance of three…

Computation and Language · Computer Science 2025-12-30 Mengdi Chai, Ali R. Zomorrodi

Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure…

Software Engineering · Computer Science 2023-11-01 Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, Lingming Zhang

Large Language Models, particularly decoder-only generative models such as GPT, are increasingly used to automate Software Engineering tasks. These models are primarily guided through natural language prompts, making prompt engineering a…

Software Engineering · Computer Science 2026-01-06 Alexander Korn, Lea Zaruchas, Chetan Arora, Andreas Metzger, Sven Smolka, Fanyu Wang, Andreas Vogelsang

Large language models (LLMs) like ChatGPT are increasingly used in academic writing, yet issues such as incorrect or fabricated references raise ethical concerns. Moreover, current content quality evaluations often rely on subjective human…

Computation and Language · Computer Science 2025-09-15 Jing Ren, Weiqi Wang

Recent observations have underscored a disparity between the inflated benchmark scores and the actual performance of LLMs, raising concerns about potential contamination of evaluation benchmarks. This issue is especially critical for…

Computation and Language · Computer Science 2024-04-05 Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan

Large language models (LLMs) have shown promise for automated source-code translation, a capability critical to software migration, maintenance, and interoperability. Yet comparative evidence on how model choice, prompt design, and prompt…

Software Engineering · Computer Science 2025-09-17 Aamer Aljagthami, Mohammed Banabila, Musab Alshehri, Mohammed Kabini, Mohammad D. Alahmadi

Large Language Models (LLMs) are increasingly applied to automate software engineering tasks, including the generation of UML class diagrams from natural language descriptions. While prior work demonstrates that LLMs can produce…

Software Engineering · Computer Science 2026-04-07 Rabia Iftikhar, Andreas Rausch