Related papers: ChallengeMe: An Adversarial Learning-enabled Text …

AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization

Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data,…

Computation and Language · Computer Science 2025-09-23 Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown

Adversarial Math Word Problem Generation

Large language models (LLMs) have significantly transformed the educational landscape. As current plagiarism detection tools struggle to keep pace with LLMs' rapid advancements, the educational community faces the challenge of assessing…

Computation and Language · Computer Science 2024-06-18 Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on…

Computation and Language · Computer Science 2024-07-15 Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

Large Language Models are Diverse Role-Players for Summarization Evaluation

Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing…

Computation and Language · Computer Science 2023-09-20 Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang

Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts

Recent advancements in Large Language Models (LLMs) and Prompt Engineering have made chatbot customization more accessible, significantly reducing barriers to tasks that previously required programming skills. However, prompt evaluation,…

Human-Computer Interaction · Computer Science 2025-08-13 Sam Yu-Te Lee, Aryaman Bahukhandi, Dongyu Liu, Kwan-Liu Ma

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness…

Computation and Language · Computer Science 2024-07-17 Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie

Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization

Long document summarization remains a significant challenge for current large language models (LLMs), as existing approaches commonly struggle with information loss, factual inconsistencies, and coherence issues when processing excessively…

Computation and Language · Computer Science 2026-02-06 Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of implementations, datasets, and evaluation methods. This fragmentation makes reproducibility and…

Artificial Intelligence · Computer Science 2025-11-07 Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment…

Computation and Language · Computer Science 2024-08-06 Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation

Large language models (LLMs) have revolutionized NLP research. Notably, in-context learning enables their use as evaluation metrics for natural language generation, making them particularly advantageous in low-resource scenarios and…

Computation and Language · Computer Science 2024-11-19 Christoph Leiter, Steffen Eger

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation

Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across heterogeneous domains and document lengths. We conduct a comprehensive meta-evaluation of 14 automatic summarization…

Computation and Language · Computer Science 2026-04-29 Huyen Nguyen, Haoxuan Zhang, Yang Zhang, Junhua Ding, Haihua Chen

Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs

Although safely enhanced Large Language Models (LLMs) have achieved remarkable success in tackling various complex tasks in a zero-shot manner, they remain susceptible to jailbreak attacks, particularly the unknown jailbreak attack. To…

Computation and Language · Computer Science 2024-06-12 Fan Liu, Zhao Xu, Hao Liu

Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing

Recent advances in test-time scaling have shown promising results in improving Large Language Model (LLM) performance through strategic computation allocation during inference. While this approach has demonstrated strong improvements in…

Computation and Language · Computer Science 2025-05-21 Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Chenyu You, Shafiq Joty, Giuseppe Carenini

Agentic Adversarial QA for Improving Domain-Specific LLMs

Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is…

Computation and Language · Computer Science 2026-02-23 Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki

Argument Summarization and its Evaluation in the Era of Large Language Models

Large Language Models (LLMs) have revolutionized various Natural Language Generation (NLG) tasks, including Argument Summarization (ArgSum), a key subfield of Argument Mining. This paper investigates the integration of state-of-the-art LLMs…

Computation and Language · Computer Science 2025-10-10 Moritz Altemeyer, Steffen Eger, Johannes Daxenberger, Yanran Chen, Tim Altendorf, Philipp Cimiano, Benjamin Schiller

Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System

The recent growth in the use of Large Language Models has made them vulnerable to sophisticated adversarial assaults, manipulative prompts, and encoded malicious inputs. Existing countermeasures frequently necessitate retraining models,…

Computation and Language · Computer Science 2026-03-10 Sheikh Samit Muhaimin, Spyridon Mastorakis

On Learning to Summarize with Large Language Models as References

Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we study an LLM-as-reference…

Computation and Language · Computer Science 2024-07-19 Yixin Liu, Kejian Shi, Katherine S He, Longtian Ye, Alexander R. Fabbri, Pengfei Liu, Dragomir Radev, Arman Cohan

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

The advent of large language models (LLMs) has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit…

Computation and Language · Computer Science 2024-03-18 Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han

SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts

Traditional methods for evaluating the robustness of large language models (LLMs) often rely on standardized benchmarks, which can escalate costs and limit evaluations across varied domains. This paper introduces a novel framework designed…

Computation and Language · Computer Science 2024-12-03 Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia

Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning

Recently, large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.…

Computation and Language · Computer Science 2024-05-20 Huiming Wang, Zhaodonghui Li, Liying Cheng, Soh De Wen, Lidong Bing