Related papers: RITFIS: Robust input testing framework for LLMs-ba…

ABFS: Natural Robustness Testing for LLM-based NLP Software

Owing to the exceptional performance of Large Language Models (LLMs) in Natural Language Processing (NLP) tasks, LLM-based NLP software has rapidly gained traction across various domains, such as financial analysis and content moderation.…

Software Engineering · Computer Science 2025-03-04 Mingxuan Xiao, Yan Xiao, Shunhui Ji, Yunhe Li, Lei Xue, Pengcheng Zhang

TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models

As Large Language Models (LLMs) continue to revolutionize Natural Language Processing (NLP) applications, critical concerns about their trustworthiness persist, particularly in safety and robustness. To address these challenges, we…

Software Engineering · Computer Science 2025-10-16 Ruoyu Sun, Da Song, Jiayang Song, Yuheng Huang, Lei Ma

Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on…

Computation and Language · Computer Science 2023-10-11 Guanting Dong, Jinxu Zhao, Tingfeng Hui, Daichi Guo, Wenlong Wan, Boqi Feng, Yueyan Qiu, Zhuoma Gongque, Keqing He, Zechen Wang, Weiran Xu

BASFuzz: Towards Robustness Evaluation of LLM-based NLP Software via Automated Fuzz Testing

Fuzzing has shown great success in evaluating the robustness of intelligent natural language processing (NLP) software. As large language model (LLM)-based NLP software is widely deployed in critical industries, existing methods still face…

Software Engineering · Computer Science 2025-09-23 Mingxuan Xiao, Yan Xiao, Shunhui Ji, Jiahe Tu, Pengcheng Zhang

Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors

Large language models (LLMs) are increasingly deployed in multilingual, real-world applications with user inputs -- naturally introducing typographical errors (typos). Yet most benchmarks assume clean input, leaving the robustness of…

Computation and Language · Computer Science 2026-04-21 Raoyuan Zhao, Yihong Liu, Lena Altinger, Hinrich Schütze, Michael A. Hedderich

A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models

Prompt injection attacks exploit vulnerabilities in large language models (LLMs) to manipulate the model into unintended actions or generate malicious content. As LLM integrated applications gain wider adoption, they face growing…

Cryptography and Security · Computer Science 2024-01-03 Daniel Wankit Yip, Aysan Esmradi, Chun Fai Chan

Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics

Large Language Models (LLMs) have emerged as a promising cornerstone for the development of natural language processing (NLP) and artificial intelligence (AI). However, ensuring the robustness of LLMs remains a critical challenge. To…

Computation and Language · Computer Science 2025-11-07 Pankaj Kumar, Subhankar Mishra

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness…

Computation and Language · Computer Science 2024-07-17 Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie

Evaluation and Improvement of Fault Detection for Large Language Models

Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, even for the best LLM, many faults still exist…

Software Engineering · Computer Science 2024-11-06 Qiang Hu, Jin Wen, Maxime Cordy, Yuheng Huang, Wei Ma, Xiaofei Xie, Lei Ma

PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs

Large Language Models (LLMs) have gained widespread use in various applications due to their powerful capability to generate human-like text. However, prompt injection attacks, which involve overwriting a model's original instructions with…

Cryptography and Security · Computer Science 2025-04-07 Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

Large Language Models (LLMs) have showcased remarkable capabilities in following human instructions. However, recent studies have raised concerns about the robustness of LLMs when prompted with instructions combining textual adversarial…

Computation and Language · Computer Science 2024-02-27 Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang

Certified Robustness for Large Language Models with Self-Denoising

Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerabilities towards noisy inputs have significantly limited their uses, especially in high-stake environments. In these contexts,…

Computation and Language · Computer Science 2023-07-17 Zhen Zhang, Guanhua Zhang, Bairu Hou, Wenqi Fan, Qing Li, Sijia Liu, Yang Zhang, Shiyu Chang

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural languages. With the rapid development and wide-range applications (e.g., Agents, Embodied…

Computation and Language · Computer Science 2025-07-10 Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models

With the increasing use of large language models (LLMs), ensuring reliable performance in diverse, real-world environments is essential. Despite their remarkable achievements, LLMs often struggle with adversarial inputs, significantly…

Computation and Language · Computer Science 2024-06-18 Yuqing Wang, Yun Zhao

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

Large Language Models (LLMs) are increasingly used in intelligent systems that perform reasoning, summarization, and code generation. Their ability to follow natural-language instructions, while powerful, also makes them vulnerable to a new…

Cryptography and Security · Computer Science 2025-11-13 Daniyal Ganiuly, Assel Smaiyl

Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks by effectively utilizing a prompting strategy. However, they are highly sensitive to input perturbations, such as typographical errors or slight…

Computation and Language · Computer Science 2026-05-27 Lin Mu, Guowei Chu, Li Ni, Lei Sang, Yiwen Zhang

Improving the Robustness of Large Language Models for Code Tasks via Fine-tuning with Perturbed Data

Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code generation, completion, analysis, and bug fixing. Ensuring the robustness of these models…

Software Engineering · Computer Science 2026-02-13 Yang Liu, Armstrong Foundjem, Xingfang Wu, Heng Li, Foutse Khomh

Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full…

Computation and Language · Computer Science 2025-07-01 Xiaohua Wang, Zisu Huang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Qi Qian, Xiaoqing Zheng, Xuanjing Huang

RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation

When using LLMs to address Non-Functional Requirements (NFRs), developers may behave differently (e.g., expressing the same NFR in different words). Robust LLMs should output consistent results across these variations; however, this aspect…

Software Engineering · Computer Science 2025-04-04 Feng Lin, Dong Jae Kim, Zhenhao Li, Jinqiu Yang, Tse-Hsun Chen

NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations

Large language models (LLMs) achieve promising results in code generation based on a given natural language description. They have been integrated into open-source projects and commercial products to facilitate daily coding activities. The…

Software Engineering · Computer Science 2024-07-01 Junkai Chen, Zhenhao Li, Xing Hu, Xin Xia