Related papers: Adversarial Math Word Problem Generation

Generating Valid and Natural Adversarial Examples with Large Language Models

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream…

Computation and Language · Computer Science · 2023-11-21 · Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen

How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation

Dynamic adversarial question generation, where humans write examples to stump a model, aims to create examples that are realistic and informative. However, the advent of large language models (LLMs) has been a double-edged sword for human…

Computation and Language · Computer Science · 2024-01-23 · Yoo Yeon Sung, Ishani Mondal, Jordan Boyd-Graber

LLM-Generated Black-box Explanations Can Be Adversarially Helpful

Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these…

Computation and Language · Computer Science · 2024-10-14 · Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, Zining Zhu

Solving Math Word Problems Using Estimation Verification and Equation Generation

Large Language Models (LLMs) excel at various tasks, including problem-solving and question-answering. However, LLMs often find Math Word Problems (MWPs) challenging because solving them requires a range of reasoning and mathematical…

Artificial Intelligence · Computer Science · 2025-09-24 · Mitchell Piehl, Dillon Wilson, Ananya Kalita, Jugal Kalita

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense…

Artificial Intelligence · Computer Science · 2023-10-31 · Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science · 2025-10-21 · Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

Exploring the Adversarial Capabilities of Large Language Models

The proliferation of large language models (LLMs) has sparked widespread and general interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research delved into…

Artificial Intelligence · Computer Science · 2024-07-09 · Lukas Struppek, Minh Hieu Le, Dominik Hintersdorf, Kristian Kersting

Cutting Through the Noise: Boosting LLM Performance on Math Word Problems

Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates…

Computation and Language · Computer Science · 2025-09-17 · Ujjwala Anantheswaran, Himanshu Gupta, Kevin Scaria, Shreyas Verma, Chitta Baral, Swaroop Mishra

Red Teaming Language Model Detectors with Language Models

The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent works have proposed algorithms to…

Computation and Language · Computer Science · 2023-10-20 · Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, Cho-Jui Hsieh

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment…

Computation and Language · Computer Science · 2024-08-06 · Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad

Elementary Math Word Problem Generation using Large Language Models

Mathematics is often perceived as a complex subject by students, leading to high failure rates in exams. To improve Mathematics skills, it is important to provide sample questions for students to practice problem-solving. Manually creating…

Computation and Language · Computer Science · 2026-03-27 · Nimesh Ariyarathne, Harshani Bandara, Yasith Heshan, Omega Gamage, Surangika Ranathunga, Dilan Nayanajith, Yutharsan Sivapalan, Gayathri Lihinikaduarachchi, Tharoosha Vihidun, Meenambika Chandirakumar, Sanujen Premakumar, Sanjula Gathsara

Learning to Disprove: Formal Counterexample Generation with Large Language Models

Mathematical reasoning demands two critical, complementary skills: constructing rigorous proofs for true statements and discovering counterexamples that disprove false ones. However, current AI efforts in mathematics focus almost…

Artificial Intelligence · Computer Science · 2026-03-23 · Zenan Li, Zhaoyu Li, Kaiyu Yang, Xiaoxing Ma, Zhendong Su

Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in…

Computation and Language · Computer Science · 2024-09-16 · Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. However, there are increasing debates regarding whether these models truly understand and apply mathematical knowledge or…

Computation and Language · Computer Science · 2024-07-03 · Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi

ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework

The astonishing performance of large language models (LLMs) and their remarkable achievements in production and daily life have led to their widespread application in collaborative tasks. However, current large models face challenges such…

Computation and Language · Computer Science · 2025-02-10 · Xiaoyu Deng, Ye Zhang, Tianmin Guo, Yongzhe Zhang, Zhengjian Kang, Hang Yang

Advancing NLP Security by Leveraging LLMs as Adversarial Engines

This position paper proposes a novel approach to advancing NLP security by leveraging Large Language Models (LLMs) as engines for generating diverse adversarial attacks. Building upon recent work demonstrating LLMs' effectiveness in…

Artificial Intelligence · Computer Science · 2024-10-25 · Sudarshan Srinivasan, Maria Mahbub, Amir Sadovnik

Creating Large Language Model Resistant Exams: Guidelines and Strategies

The proliferation of Large Language Models (LLMs), such as ChatGPT, has raised concerns about their potential impact on academic integrity, prompting the need for LLM-resistant exam designs. This article investigates the performance of LLMs…

Computation and Language · Computer Science · 2023-04-25 · Simon Kaare Larsen

Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks

Peer review is essential for maintaining academic quality, but the increasing volume of submissions places a significant burden on reviewers. Large language models (LLMs) offer potential assistance in this process, yet their susceptibility…

Computation and Language · Computer Science · 2025-10-10 · Tzu-Ling Lin, Wei-Chih Chen, Teng-Fang Hsiao, Hou-I Liu, Ya-Hsin Yeh, Yu Kai Chan, Wen-Sheng Lien, Po-Yen Kuo, Philip S. Yu, Hong-Han Shuai

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can…

Computation and Language · Computer Science · 2024-08-08 · Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, Lama Nachman

Agentic Adversarial QA for Improving Domain-Specific LLMs

Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is…

Computation and Language · Computer Science · 2026-02-23 · Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki