Related papers: FineDeb: A Debiasing Framework for Language Models

Mitigating Biases in Language Models via Bias Unlearning

Many studies have shown various biases targeting different demographic groups in language models, amplifying discrimination and harming fairness. Recent parameter modification debiasing approaches significantly degrade core capabilities…

Computation and Language · Computer Science 2025-10-01 Dianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao

Impact of Gender Debiased Word Embeddings in Language Modeling

Gender, race and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent…

Computation and Language · Computer Science 2021-05-06 Christine Basta, Marta R. Costa-jussà

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships and foster a more nuanced understanding of language and consequently perform remarkably on a…

Computation and Language · Computer Science 2025-01-08 Aishik Rakshit, Smriti Singh, Shuvam Keshari, Arijit Ghosh Chowdhury, Vinija Jain, Aman Chadha

RobustDebias: Debiasing Language Models using Distributionally Robust Optimization

Pretrained language models have been shown to exhibit biases and social stereotypes. Prior work on debiasing these models has largely focused on modifying embedding spaces during pretraining, which is not scalable for large models.…

Artificial Intelligence · Computer Science 2026-02-03 Deep Gandhi, Katyani Singh, Nidhi Hegde

Debiasing Pre-trained Contextualised Embeddings

In comparison to the numerous debiasing methods proposed for the static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method…

Computation and Language · Computer Science 2021-01-26 Masahiro Kaneko, Danushka Bollegala

General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level

The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. Compared to numerous debiasing methods targeting word level, there has been relatively less attention on biases…

Computation and Language · Computer Science 2024-01-26 Bingkang Shi, Xiaodan Zhang, Dehan Kong, Yulei Wu, Zongzhen Liu, Honglei Lyu, Longtao Huang

Collapsed Language Models Promote Fairness

To mitigate societal biases implicitly encoded in recent successful pretrained language models, a diverse array of approaches have been proposed to encourage model fairness, focusing on prompting, data augmentation, regularized fine-tuning,…

Computation and Language · Computer Science 2025-01-30 Jingxuan Xu, Wuyang Chen, Linyi Li, Yao Zhao, Yunchao Wei

Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since…

Computation and Language · Computer Science 2023-06-08 Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency

Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for…

Computation and Language · Computer Science 2026-03-09 Schrasing Tong, Eliott Zemour, Jessica Lu, Rawisara Lohanimit, Lalana Kagal

BiasFilter: An Inference-Time Debiasing Framework for Large Language Models

Mitigating social bias in large language models (LLMs) has become an increasingly important research objective. However, existing debiasing methods often incur high human and computational costs, exhibit limited effectiveness, and struggle…

Computation and Language · Computer Science 2025-06-02 Xiaoqing Cheng, Ruizhe Chen, Hongying Zan, Yuxiang Jia, Min Peng

Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce…

Machine Learning · Computer Science 2023-06-06 Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz

Debiasing Vision-Language Models via Biased Prompts

Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be…

Machine Learning · Computer Science 2023-05-16 Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, Stefanie Jegelka

Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models

Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for…

Computation and Language · Computer Science 2023-07-25 Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, Hamed Khanpour

AutoDebias: Automated Framework for Debiasing Text-to-Image Models

Text-to-Image (T2I) models generate high-quality images but are vulnerable to malicious backdoor attacks that inject harmful biases (e.g., trigger-activated gender or racial stereotypes). Existing debiasing methods, often designed for…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Hongyi Cai, Mohammad Mahdinur Rahman, Mingkang Dong, Muxin Pu, Moqyad Alqaily, Jie Li, Xinfeng Li, Jialie Shen, Meikang Qiu, Qingsong Wen

PEFTDebias: Capturing debiasing information using PEFTs

The increasing use of foundation models highlights the urgent need to address and eliminate implicit biases present in them that arise during pretraining. In this paper, we introduce PEFTDebias, a novel approach that employs…

Machine Learning · Computer Science 2023-12-04 Sumit Agarwal, Aditya Srikanth Veerubhotla, Srijan Bansal

Improving Bias Mitigation through Bias Experts in Natural Language Understanding

Biases in the dataset often enable the model to achieve high performance on in-distribution data, while poorly performing on out-of-distribution data. To mitigate the detrimental effect of the bias on the networks, previous works have…

Computation and Language · Computer Science 2023-12-07 Eojin Jeon, Mingyu Lee, Juhyeong Park, Yeachan Kim, Wing-Lam Mok, SangKeun Lee

FairFlow: Mitigating Dataset Biases through Undecided Learning

Language models are prone to dataset biases, known as shortcuts and spurious correlations in data, which often result in performance drop on new data. We present a new debiasing framework called "FairFlow" that mitigates dataset biases by…

Machine Learning · Computer Science 2025-03-25 Jiali Cheng, Hadi Amiri

REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning

With the introduction of (large) language models, there has been significant concern about the unintended bias such models may inherit from their training data. A number of studies have shown that such models propagate gender stereotypes,…

Computation and Language · Computer Science 2024-08-20 Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro, Zied Bouraoui

Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning

Language model debiasing has emerged as an important field of study in the NLP community. Numerous debiasing techniques were proposed, but bias ablation remains an unaddressed issue. We demonstrate a novel framework for inspecting bias in…

Computation and Language · Computer Science 2022-07-07 Przemyslaw Joniak, Akiko Aizawa

Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization

Language models frequently inherit societal biases from their training data. Numerous techniques have been proposed to mitigate these biases during both the pre-training and fine-tuning stages. However, fine-tuning a pre-trained debiased…

Computation and Language · Computer Science 2024-10-03 Shahed Masoudian, Markus Frohmann, Navid Rekabsaz, Markus Schedl