Related papers: Error-Driven Prompt Optimization for Arithmetic Re…

Auto-Prompting with Retrieval Guidance for Frame Detection in Logistics

Prompt engineering plays a critical role in adapting large language models (LLMs) to complex reasoning and labeling tasks without the need for extensive fine-tuning. In this paper, we propose a novel prompt optimization pipeline for frame…

Computation and Language · Computer Science 2025-12-23 Do Minh Duc, Quan Xuan Truong, Nguyen Tat Dat, Nguyen Van Vinh

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving…

Computation and Language · Computer Science 2024-06-05 Xiaoyuan Li, Wenjie Wang, Moxin Li, Junrong Guo, Yang Zhang, Fuli Feng

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases

Prompt engineering is a challenging and important task due to the high sensitivity of Large Language Models (LLMs) to the given prompt and the inherent ambiguity of a textual task instruction. Automatic prompt engineering is essential to…

Computation and Language · Computer Science 2024-02-06 Elad Levi, Eli Brosh, Matan Friedmann

Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt

Large language models (LLMs) are increasingly utilized in various complex reasoning tasks due to their excellent instruction following capability. However, the model's performance is highly dependent on the open-ended characteristics of the…

Computation and Language · Computer Science 2026-04-28 Zhenzhen Huang, Chaoning Zhang, Fachrina Dewi Puspitasari, Jiaquan Zhang, Yitian Zhou, Shuxu Chen, Yang Yang

PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics

Evaluating the quality of machine-generated natural language content is a challenging task in Natural Language Processing (NLP). Recently, large language models (LLMs) like GPT-4 have been employed for this purpose, but they are…

Computation and Language · Computer Science 2024-12-23 Daniil Larionov, Steffen Eger

Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation

Prompt engineering has proven to be a crucial step in leveraging pretrained large language models (LLMs) in solving various real-world tasks. Numerous solutions have been proposed that seek to automate prompt engineering by using the model…

Computation and Language · Computer Science 2025-07-15 Muzhaffar Hazman, Minh-Khoi Pham, Shweta Soundararajan, Goncalo Mordido, Leonardo Custode, David Lynch, Giorgio Cruciata, Yucheng Shi, Hongmeng Song, Wang Chao, Pan Yue, Aleksandar Milenovic, Alexandros Agapitos

LatentPrompt: Optimizing Prompts in Latent Space

Recent advances have shown that optimizing prompts for Large Language Models (LLMs) can significantly improve task performance, yet many optimization techniques rely on heuristics or manual exploration. We present LatentPrompt, a…

Computation and Language · Computer Science 2025-08-05 Mateusz Bystroński, Grzegorz Piotrowski, Nitesh V. Chawla, Tomasz Kajdanowicz

Evaluating Prompt Engineering Techniques for RAG in Small Language Models: A Multi-Hop QA Approach

Retrieval Augmented Generation (RAG) is a powerful approach for enhancing the factual grounding of language models by integrating external knowledge. While widely studied for large language models, the optimization of RAG for Small Language…

Computation and Language · Computer Science 2026-02-17 Amir Hossein Mohammadi, Ali Moeinian, Zahra Razavizade, Afsaneh Fatemi, Reza Ramezani

MathPrompter: Mathematical Reasoning using Large Language Models

Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks and often provide incorrect answers. Unlike natural language understanding, math problems typically have a single correct answer, making the task…

Computation and Language · Computer Science 2023-03-10 Shima Imani, Liang Du, Harsh Shrivastava

The Art of Asking: Multilingual Prompt Optimization for Synthetic Data

Synthetic data has become a cornerstone for scaling large language models, yet its multilingual use remains bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural…

Computation and Language · Computer Science 2025-10-23 David Mora, Viraat Aryabumi, Wei-Yin Ko, Sara Hooker, Julia Kreutzer, Marzieh Fadaee

Mathematical Computation and Reasoning Errors by Large Language Models

Large Language Models (LLMs) are increasingly utilized in AI-driven educational instruction and assessment, particularly within mathematics education. The capability of LLMs to generate accurate answers and detailed solutions for math…

Artificial Intelligence · Computer Science 2025-08-15 Liang Zhang, Edith Aurora Graf

A Survey on Mathematical Reasoning and Optimization with Large Language Models

Mathematical reasoning and optimization are fundamental to artificial intelligence and computational problem-solving. Recent advancements in Large Language Models (LLMs) have significantly improved AI-driven mathematical reasoning, theorem…

Artificial Intelligence · Computer Science 2025-03-25 Ali Forootani

A Survey of Automatic Prompt Optimization with Instruction-focused Heuristic-based Search Algorithm

Recent advances in Large Language Models have led to remarkable achievements across a variety of Natural Language Processing tasks, making prompt engineering increasingly central to guiding model outputs. While manual methods can be…

Computation and Language · Computer Science 2025-07-15 Wendi Cui, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, Jiaxin Zhang

eARCO: Efficient Automated Root Cause Analysis with Prompt Optimization

Root cause analysis (RCA) for incidents in large-scale cloud systems is a complex, knowledge-intensive task that often requires significant manual effort from on-call engineers (OCEs). Improving RCA is vital for accelerating the incident…

Software Engineering · Computer Science 2025-04-17 Drishti Goel, Raghav Magazine, Supriyo Ghosh, Akshay Nambi, Prathamesh Deshpande, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models

Prompt engineering is an essential technique for enhancing the abilities of large language models (LLMs) by providing explicit and specific instructions. It enables LLMs to excel in various tasks, such as arithmetic reasoning, question…

Computation and Language · Computer Science 2024-03-29 Fobo Shi, Peijun Qing, Dong Yang, Nan Wang, Youbo Lei, Haonan Lu, Xiaodong Lin, Duantengchuan Li

AutoPDL: Automatic Prompt Optimization for LLM Agents

The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot…

Machine Learning · Computer Science 2025-11-05 Claudio Spiess, Mandana Vaziri, Louis Mandel, Martin Hirzel

Probing for Arithmetic Errors in Language Models

We investigate whether internal activations in language models can be used to detect arithmetic errors. Starting with a controlled setting of 3-digit addition, we show that simple probes can accurately decode both the model's predicted…

Computation and Language · Computer Science 2025-07-17 Yucheng Sun, Alessandro Stolfo, Mrinmaya Sachan

From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework

This paper introduces a novel Large Language Models (LLMs)-assisted agent that automatically converts natural-language descriptions of power system optimization scenarios into compact, solver-ready formulations and generates corresponding…

Artificial Intelligence · Computer Science 2025-08-12 Yunkai Hu, Tianqiao Zhao, Meng Yue

When "Better" Prompts Hurt: Evaluation-Driven Iteration for LLM Applications

Evaluating Large Language Model (LLM) applications differs from traditional software testing because outputs are stochastic, high-dimensional, and sensitive to prompt and model changes. We present an evaluation-driven workflow - Define,…

Computation and Language · Computer Science 2026-01-30 Daniel Commey

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation and text summarization. Recent research (Kocmi and Federmann, 2023) has shown that utilizing…

Computation and Language · Computer Science 2024-06-06 Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, Dacheng Tao