Related papers: RobustDebias: Debiasing Language Models using Dist…

Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models

Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for…

Computation and Language · Computer Science 2023-07-25 Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, Hamed Khanpour

Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for…

Computation and Language · Computer Science 2026-03-09 Schrasing Tong, Eliott Zemour, Jessica Lu, Rawisara Lohanimit, Lalana Kagal

Self-Debias: Self-correcting for Debiasing Large Language Models

Although Large Language Models (LLMs) demonstrate remarkable reasoning capabilities, inherent social biases often cascade throughout the Chain-of-Thought (CoT) process, leading to continuous "Bias Propagation". Existing debiasing methods…

Computation and Language · Computer Science 2026-05-12 Xuan Feng, Shuai Zhao, Luwei Xiao, Tianlong Gu, Bo An

Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce…

Machine Learning · Computer Science 2023-06-06 Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz

Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

While task-agnostic debiasing provides notable generalizability and reduced reliance on downstream data, its impact on language modeling ability and the risk of relearning social biases from downstream task-specific data remain as the two…

Computation and Language · Computer Science 2024-06-07 Guangliang Liu, Milad Afshari, Xitong Zhang, Zhiyu Xue, Avrajit Ghosh, Bidhan Bashyal, Rongrong Wang, Kristen Johnson

Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts

Large language models (LLMs) have demonstrated impressive capabilities across a wide range of natural language processing tasks. However, their outputs often exhibit social biases, raising fairness concerns. Existing debiasing methods, such…

Computation and Language · Computer Science 2026-02-05 Yujie Lin, Kunquan Li, Yixuan Liao, Xiaoxin Chen, Jinsong Su

Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since…

Computation and Language · Computer Science 2023-06-08 Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization

Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs. Structured pruning reduces model size and speeds up inference but often causes uneven degradation across…

Computation and Language · Computer Science 2025-05-28 Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jing Li, Min Zhang, Zhaopeng Tu

MAFIA: Multi-Adapter Fused Inclusive LanguAge Models

Pretrained Language Models (PLMs) are widely used in NLP for various tasks. Recent studies have identified various biases that such models exhibit and have proposed methods to correct these biases. However, most of the works address a…

Computation and Language · Computer Science 2024-02-13 Prachi Jain, Ashutosh Sathe, Varun Gumma, Kabir Ahuja, Sunayana Sitaram

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated

Pre-trained language models trained on large-scale data have learned serious levels of social biases. Consequently, various methods have been proposed to debias pre-trained models. Debiasing methods need to mitigate only discriminatory bias…

Computation and Language · Computer Science 2023-09-19 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

This paper investigates the transferability of debiasing techniques across different languages within multilingual models. We examine the applicability of these techniques in English, French, German, and Dutch. Using multilingual BERT…

Computation and Language · Computer Science 2023-10-17 Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens

Improving Bias Mitigation through Bias Experts in Natural Language Understanding

Biases in the dataset often enable the model to achieve high performance on in-distribution data, while poorly performing on out-of-distribution data. To mitigate the detrimental effect of the bias on the networks, previous works have…

Computation and Language · Computer Science 2023-12-07 Eojin Jeon, Mingyu Lee, Juhyeong Park, Yeachan Kim, Wing-Lam Mok, SangKeun Lee

Impact of Gender Debiased Word Embeddings in Language Modeling

Gender, race and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent…

Computation and Language · Computer Science 2021-05-06 Christine Basta, Marta R. Costa-jussà

FineDeb: A Debiasing Framework for Language Models

As language models are increasingly included in human-facing machine learning tools, bias against demographic subgroups has gained attention. We propose FineDeb, a two-phase debiasing framework for language models that starts with…

Computation and Language · Computer Science 2023-02-07 Akash Saravanan, Dhruv Mullick, Habibur Rahman, Nidhi Hegde

FairReason: Balancing Reasoning and Social Bias in MLLMs

Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. To push their reasoning ability further, recent studies explore advanced prompting schemes and post-training…

Artificial Intelligence · Computer Science 2025-09-09 Zhenyu Pan, Yutong Zhang, Jianshu Zhang, Haoran Lu, Haozheng Luo, Yuwei Han, Philip S. Yu, Manling Li, Han Liu

Distributionally Robust Language Modeling

Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that…

Computation and Language · Computer Science 2019-09-06 Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Recent work has shown pre-trained language models capture social biases from the large amounts of text they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an…

Computation and Language · Computer Science 2022-04-05 Nicholas Meade, Elinor Poole-Dayan, Siva Reddy

REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning

With the introduction of (large) language models, there has been significant concern about the unintended bias such models may inherit from their training data. A number of studies have shown that such models propagate gender stereotypes,…

Computation and Language · Computer Science 2024-08-20 Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro, Zied Bouraoui

Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization

Language models frequently inherit societal biases from their training data. Numerous techniques have been proposed to mitigate these biases during both the pre-training and fine-tuning stages. However, fine-tuning a pre-trained debiased…

Computation and Language · Computer Science 2024-10-03 Shahed Masoudian, Markus Frohmann, Navid Rekabsaz, Markus Schedl

Potential and Challenges of Model Editing for Social Debiasing

Large language models (LLMs) trained on vast corpora suffer from inevitable stereotype biases. Mitigating these biases with fine-tuning could be both costly and data-hungry. Model editing methods, which focus on modifying LLMs in a post-hoc…

Computation and Language · Computer Science 2024-02-22 Jianhao Yan, Futing Wang, Yafu Li, Yue Zhang