Related papers: Ethos: Rectifying Language Models in Orthogonal Pa…

MBIAS: Mitigating Bias in Large Language Models While Retaining Context

The deployment of Large Language Models (LLMs) in diverse applications necessitates an assurance of safety without compromising the contextual integrity of the generated content. Traditional approaches, including safety-specific fine-tuning…

Computation and Language · Computer Science 2024-07-01 Shaina Raza , Ananya Raval , Veronica Chatrath

Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence.…

Computation and Language · Computer Science 2026-04-17 Atsuki Yamaguchi , Maggie Mi , Nikolaos Aletras

Improving Fairness in LLMs Through Testing-Time Adversaries

Large Language Models (LLMs) push the boundaries in natural language processing and generative AI, driving progress across various aspects of modern society. Unfortunately, the pervasive issue of bias in LLMs responses (i.e., predictions)…

Computation and Language · Computer Science 2025-05-20 Isabela Pereira Gregio , Ian Pons , Anna Helena Reali Costa , Artur Jordão

Realistic Evaluation of Toxicity in Large Language Models

Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data which endows them with vast and diverse knowledge,…

Computation and Language · Computer Science 2024-05-21 Tinh Son Luong , Thanh-Thien Le , Linh Ngo Van , Thien Huu Nguyen

Bias and Fairness in Large Language Models: A Survey

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can…

Computation and Language · Computer Science 2024-07-16 Isabel O. Gallegos , Ryan A. Rossi , Joe Barrow , Md Mehrab Tanjim , Sungchul Kim , Franck Dernoncourt , Tong Yu , Ruiyi Zhang , Nesreen K. Ahmed

Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach

The use of language models (LMs) has increased considerably in recent years, and the biases and stereotypes in training data that are reflected in the LM outputs are causing social problems. In this paper, inspired by the task arithmetic,…

Computation and Language · Computer Science 2024-12-17 Daiki Shirafuji , Makoto Takenaka , Shinya Taguchi

Language Models as a Knowledge Source for Cognitive Agents

Language models (LMs) are sentence-completion engines trained on massive corpora. LMs have emerged as a significant breakthrough in natural-language processing, providing capabilities that go far beyond sentence completion including…

Artificial Intelligence · Computer Science 2021-10-26 Robert E. Wray III , James R. Kirk , John E. Laird

Towards Understanding and Mitigating Social Biases in Language Models

As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes.…

Computation and Language · Computer Science 2021-06-25 Paul Pu Liang , Chiyu Wu , Louis-Philippe Morency , Ruslan Salakhutdinov

Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do

Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, its variants, GPT-2/3, and others. Using them as pre-trained models and fine-tuning them for specific…

Computation and Language · Computer Science 2022-02-15 Patrick Schramowski , Cigdem Turan , Nico Andersen , Constantin A. Rothkopf , Kristian Kersting

Erasing Conceptual Knowledge from Language Models

In this work, we introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning that operates by matching distributions defined by the model's own introspective classification capabilities. Our key insight is…

Computation and Language · Computer Science 2025-07-23 Rohit Gandikota , Sheridan Feucht , Samuel Marks , David Bau

Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries

Recent advances in Large Language Models (LLMs) have led to impressive alignment where models learn to distinguish harmful from harmless queries through supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). In…

Artificial Intelligence · Computer Science 2025-06-18 Jiahao Yu , Haozheng Luo , Jerry Yao-Chieh Hu , Wenbo Guo , Han Liu , Xinyu Xing

Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning

Large language models (LLMs) increasingly serve as reasoners and automated evaluators, yet they remain susceptible to cognitive biases -- often altering their reasoning when faced with spurious prompt-level cues such as consensus claims or…

Computers and Society · Computer Science 2026-04-07 Qian Wang , Xuandong Zhao , Zirui Zhang , Zhanzhi Lou , Nuo Chen , Dawn Song , Bingsheng He

Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens

Unlearning in large language models (LLMs) has emerged as a promising safeguard against adversarial behaviors. When the forgetting loss is applied uniformly without considering token-level semantic importance, model utility can be…

Computation and Language · Computer Science 2026-04-21 Seunghee Koh , Sunghyun Baek , Youngdong Kim , Junmo Kim

Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory

Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. Although techniques such as Affective Alignment can mitigate some negative impacts of these biases, existing prompt-based…

Computation and Language · Computer Science 2024-08-21 Yongxin Deng , Xihe Qiu , Xiaoyu Tan , Jing Pan , Chen Jue , Zhijun Fang , Yinghui Xu , Wei Chu , Yuan Qi

ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent

Large Language Models (LLMs) have demonstrated impressive capabilities in language generation and general task performance. However, their application to spoken language understanding (SLU) remains challenging, particularly for token-level…

Computation and Language · Computer Science 2025-10-09 Shangjian Yin , Peijie Huang , Jiatian Chen , Haojing Huang , Yuhong Xu

Has this Fact been Edited? Detecting Knowledge Edits in Language Models

Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a…

Computation and Language · Computer Science 2025-02-11 Paul Youssef , Zhixue Zhao , Christin Seifert , Jörg Schlötterer

Do Language Models Track Entities Across State Changes?

Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding without…

Computation and Language · Computer Science 2026-05-29 Zilu Tang , Qiao Zhao , Gabriel Franco , Derry Wijaya , Aaron Mueller , Sebastian Schuster , Najoung Kim

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

Language models (LMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, causing privacy concerns. Current approaches to address this issue involve costly dataset…

Computation and Language · Computer Science 2025-09-09 Tomer Ashuach , Martin Tutek , Yonatan Belinkov

Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication

Large Language Models (LLMs) have recently made significant strides in complex reasoning tasks through the Chain-of-Thought technique. Despite this progress, their reasoning is often constrained by their intrinsic understanding, lacking…

Computation and Language · Computer Science 2023-12-05 Zhangyue Yin , Qiushi Sun , Cheng Chang , Qipeng Guo , Junqi Dai , Xuanjing Huang , Xipeng Qiu

End-to-End Ontology Learning with Large Language Models

Ontologies are useful for automatic machine processing of domain knowledge as they represent it in a structured format. Yet, constructing ontologies requires substantial manual effort. To automate part of this process, large language models…

Machine Learning · Computer Science 2024-11-01 Andy Lo , Albert Q. Jiang , Wenda Li , Mateja Jamnik