
Related papers: PECC: Problem Extraction and Coding Challenges

200 papers

Large Language Models (LLMs) have exhibited remarkable capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. While recent studies have explored inference-time computation to enhance…

Artificial Intelligence · Computer Science 2025-02-18 Zi Wang, Shiwei Weng, Mohannad Alhanahnah, Somesh Jha, Tom Reps

Large Language Models (LLMs) have recently achieved impressive performance in math and reasoning benchmarks. However, they often struggle with logic problems and puzzles that are relatively easy for humans. To further investigate this, we…

Artificial Intelligence · Computer Science 2025-09-16 Nasim Borazjanizadeh, Roei Herzig, Trevor Darrell, Rogerio Feris, Leonid Karlinsky

With the significant progress of large reasoning models in complex coding and reasoning tasks, existing benchmarks, like LiveCodeBench and CodeElo, are insufficient to evaluate the coding capabilities of large language models (LLMs) in real…

Computation and Language · Computer Science 2025-06-06 Shiyi Xu, Yiwen Hu, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao, Ji-Rong Wen

The rapid advancement of large language models has opened new avenues for automating complex problem-solving tasks such as algorithmic coding and competitive programming. This paper introduces a novel evaluation technique, LLM-ProS, to…

Computation and Language · Computer Science 2026-03-03 Md Sifat Hossain, Anika Tabassum, Md. Fahim Arefin, Tarannum Shaila Zaman

Competitive programming has emerged as a critical benchmark for evaluating the reasoning and coding capabilities of Large Language Models (LLMs). Despite impressive progress on existing benchmarks, we argue that current evaluations…

Recently, a number of repository-level code generation benchmarks, such as CoderEval, DevEval, RepoEval, RepoBench, and LongCodeArena, have emerged to evaluate the capabilities of large language models (LLMs) beyond standalone benchmarks like…

Software Engineering · Computer Science 2025-06-26 Shanchao Liang, Yiran Hu, Nan Jiang, Lin Tan

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science 2025-10-21 Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

The performance of large language models (LLMs) on existing reasoning benchmarks has significantly improved over the past years. In response, we present JEEBench, a considerably more challenging benchmark dataset for evaluating the problem…

Computation and Language · Computer Science 2023-10-24 Daman Arora, Himanshu Gaurav Singh, Mausam

Context: Due to the demand for strong algorithmic reasoning, complex logic implementation, and strict adherence to input/output formats and resource constraints, competitive programming generation by large language models (LLMs) is…

Social and Information Networks · Computer Science 2025-07-01 Minnan Wei, Ziming Li, Xiang Chen, Menglin Zheng, Ziyan Qu, Cheng Yu, Siyu Chen, Xiaolin Ju

Large language models (LLMs), as a novel information technology, are seeing increasing adoption in the Architecture, Engineering, and Construction (AEC) field. They have shown their potential to streamline processes throughout the building…

Computation and Language · Computer Science 2026-02-17 Chen Liang, Zhaoqi Huang, Haofen Wang, Fu Chai, Chunying Yu, Huanhuan Wei, Zhengjie Liu, Yanpeng Li, Hongjun Wang, Ruifeng Luo, Xianzhong Zhao

From pre-trained language model (PLM) to large language model (LLM), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of…

Computation and Language · Computer Science 2023-08-16 Ziyu Zhuang, Qiguang Chen, Longxuan Ma, Mingda Li, Yi Han, Yushan Qian, Haopeng Bai, Zixian Feng, Weinan Zhang, Ting Liu

Large Language Models (LLMs) are increasingly integrated into software engineering workflows, yet current benchmarks provide only coarse performance summaries that obscure the diverse capabilities and limitations of these models. This paper…

Software Engineering · Computer Science 2026-01-21 Felix Mächtle, Jan-Niclas Serr, Nils Loose, Thomas Eisenbarth

While Large Language Models (LLMs) demonstrate impressive performance in mathematics, existing math benchmarks come with significant limitations. Many focus on problems with fixed ground-truth answers, and are often saturated due to problem…

Artificial Intelligence · Computer Science 2025-10-02 Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev

Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code. This challenge is exacerbated in the…

Software Engineering · Computer Science 2025-12-23 Muhammad Usman Tariq, Abhinav Jangda, Angelica Moreira, Madan Musuvathi, Tyler Sorensen

Large Language Models (LLMs) have shown remarkable success on a wide range of math and reasoning benchmarks. However, we observe that they often struggle when faced with unreasonable math problems. Instead of recognizing these issues,…

Computation and Language · Computer Science 2025-06-03 Jingyuan Ma, Damai Dai, Zihang Yuan, Rui li, Weilin Luo, Bin Wang, Qun Liu, Lei Sha, Zhifang Sui

We present a novel benchmark designed to rigorously evaluate the capabilities of large language models (LLMs) in mathematical reasoning and algorithmic code synthesis tasks. The benchmark comprises integer sequence generation tasks sourced…

Machine Learning · Computer Science 2025-11-11 Daniel O'Malley, Manish Bhattarai, Nishath Rajiv Ranasinghe, Erick Draayer, Javier Santos

Large Language Models (LLMs) have made significant strides in mathematical reasoning, underscoring the need for a comprehensive and fair evaluation of their capabilities. However, existing benchmarks often fall short, either lacking…

Computation and Language · Computer Science 2025-02-26 Xin Xu, Jiaxin Zhang, Tianhao Chen, Zitong Chao, Jishan Hu, Can Yang

As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e.,…

Machine Learning · Computer Science 2025-09-23 Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models makes developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…
