Related papers: CursorCore: Assist Programming through Aligning An…

200 papers

Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols, Joshua H. Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele

Large language models exhibit complementary reasoning errors: on the same instance, one model may succeed with a particular decomposition while another fails. We propose Collaborative Reasoning (CORE), a training-time collaboration…

Artificial Intelligence · Computer Science 2026-01-30 Kshitij Mishra, Mirat Aubakirov, Martin Takac, Nils Lukas, Salem Lahlou

Programming assistants powered by large language models have improved dramatically, yet existing benchmarks still evaluate them in narrow code-generation settings. Recent efforts such as InfiBench and StackEval rely on Stack Overflow…

Software Engineering · Computer Science 2026-01-16 Myeongsoo Kim, Shweta Garg, Baishakhi Ray, Varun Kumar, Anoop Deoras

Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve…

Computation and Language · Computer Science 2023-06-16 Yew Ken Chia, Pengfei Hong, Lidong Bing, Soujanya Poria

Code benchmarks such as HumanEval are widely adopted to evaluate Large Language Models' (LLMs) coding capabilities. However, there is an unignorable programming language bias in existing code benchmarks -- over 95% code generation…

Artificial Intelligence · Computer Science 2025-05-20 Ruiyang Xu, Jialun Cao, Yaojie Lu, Ming Wen, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman, Mohammad Mahoor

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi

Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In…

Software Engineering · Computer Science 2024-03-26 Rui Xie, Zhengran Zeng, Zhuohao Yu, Chang Gao, Shikun Zhang, Wei Ye

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the…

Software Engineering · Computer Science 2024-02-26 Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang, Qi Zhang, Yuchi Ma, Guangtai Liang, Ying Li, Qianxiang Wang, Tao Xie

Large Language Model (LLM) tools have demonstrated their potential to deliver high-quality assistance by providing instant, personalized feedback that is crucial for effective programming education. However, many of these tools operate…

Human-Computer Interaction · Computer Science 2025-04-08 Huiyong Li, Boxuan Ma

Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have…

Software Engineering · Computer Science 2024-07-09 Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation.…

Self-Correction aims to enable large language models (LLMs) to self-verify and self-refine their initial responses without external feedback. However, LLMs often fail to effectively self-verify and generate correct feedback, further…

Computation and Language · Computer Science 2025-05-28 Xiaoshuai Song, Yanan Wu, Weixun Wang, Jiaheng Liu, Wenbo Su, Bo Zheng

Code repair is a fundamental task in software development, facilitating efficient bug resolution and software maintenance. Although large language models (LLMs) have demonstrated considerable potential in automated code repair, their…

Software Engineering · Computer Science 2026-02-27 Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating…

Large language models (LLMs) have been widely adopted across diverse domains of software engineering, such as code generation, program repair, and vulnerability detection. These applications require understanding beyond surface-level code…

Software Engineering · Computer Science 2026-01-21 Danning Xie, Mingwei Zheng, Xuwei Liu, Jiannan Wang, Chengpeng Wang, Lin Tan, Xiangyu Zhang

Code Executing Reasoning is becoming a new non-functional metric that assesses the ability of large language models (LLMs) in programming tasks. State-of-the-art frameworks (CodeMind or REval) and benchmarks (CruxEval) usually focus on…

Software Engineering · Computer Science 2025-01-31 Changshu Liu, Reyhaneh Jabbarvand

Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishing complex software engineering tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Daniel Nichols, Konstantinos Parasyris, Charles Jekel, Abhinav Bhatele, Harshitha Menon

Large language models make remarkable progress in reasoning capabilities. Existing works focus mainly on deductive reasoning tasks (e.g., code and math), while another type of reasoning mode that better aligns with human learning, inductive…

Computation and Language · Computer Science 2025-03-18 Kedi Chen, Zhikai Lei, Fan Zhang, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Qipeng Guo, Kai Chen, Wei Zhang

Instruction-based multimodal image manipulation has recently made rapid progress. However, existing evaluation methods lack a systematic and human-aligned framework for assessing model performance on complex and creative editing tasks. To…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Chonghuinan Wang, Zihan Chen, Yuxiang Wei, Tianyi Jiang, Xiaohe Wu, Fan Li, Wangmeng Zuo, Hongxun Yao