Related papers: SHIELD: Thwarting Code Authorship Attribution

Misleading Authorship Attribution of Source Code using Adversarial Learning

In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs…

Machine Learning · Computer Science · 2019-06-03 · Erwin Quiring, Alwin Maier, Konrad Rieck

RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Source code authorship attribution is an important problem often encountered in applications such as software forensics, bug fixing, and software quality analysis. Recent studies show that current source code authorship attribution methods…

Cryptography and Security · Computer Science · 2022-02-15 · Zhen Li, Guenevere Chen, Chen Chen, Yayi Zou, Shouhuai Xu

SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher

Even though several methods have been proposed to defend textual neural network (NN) models against black-box adversarial attacks, they often defend against a specific text perturbation strategy and/or require re-training the models from…

Machine Learning · Computer Science · 2022-03-17 · Thai Le, Noseong Park, Dongwon Lee

Adversarial Binaries for Authorship Identification

Binary code authorship identification determines the authors of a binary program. Existing techniques have used supervised machine learning for this task. In this paper, we look at this problem from an attacker's perspective. We aim to modify a…

Cryptography and Security · Computer Science · 2018-11-08 · Xiaozhu Meng, Barton P. Miller, Somesh Jha

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries

The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. While recent work found that source code can be attributed to authors with high accuracy,…

Cryptography and Security · Computer Science · 2017-12-19 · Aylin Caliskan, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, Arvind Narayanan

SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation

Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is…

Computation and Language · Computer Science · 2024-08-22 · Xiaoze Liu, Ting Sun, Tianyang Xu, Feijie Wu, Cunxiang Wang, Xiaoqian Wang, Jing Gao

Robust and Accurate Authorship Attribution via Program Normalization

Source code attribution approaches have achieved remarkable accuracy thanks to the rapid advances in deep learning. However, recent studies shed light on their vulnerability to adversarial attacks. In particular, they can be easily deceived…

Machine Learning · Computer Science · 2022-03-01 · Yizhen Wang, Mohannad Alhanahnah, Ke Wang, Mihai Christodorescu, Somesh Jha

I still know it's you! On Challenges in Anonymizing Source Code

The source code of a program not only defines its semantics but also contains subtle clues that can identify its author. Several studies have shown that these clues can be automatically extracted using machine learning and allow for…

Cryptography and Security · Computer Science · 2024-04-11 · Micha Horlboge, Erwin Quiring, Roland Meyer, Konrad Rieck

Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge

The widespread adoption of code language models in software engineering tasks has exposed vulnerabilities to adversarial attacks, especially the identifier substitution attacks. Although existing identifier substitution attackers…

Software Engineering · Computer Science · 2025-04-29 · Wenhan Mu, Ling Xu, Shuren Pei, Le Mi, Huichi Zhou

A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation

Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation…

Computation and Language · Computer Science · 2022-03-23 · Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan

Authorship Attribution of Source Code: A Language-Agnostic Approach and Applicability in Software Engineering

Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field,…

Software Engineering · Computer Science · 2021-06-22 · Egor Bogomolov, Vladimir Kovalenko, Yurii Rebryk, Alberto Bacchelli, Timofey Bryksin

I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution

Source code authorship attribution is important in software forensics, plagiarism detection, and protecting software patch integrity. Existing techniques often rely on supervised machine learning, which struggles with generalization across…

Software Engineering · Computer Science · 2025-01-15 · Soohyeon Choi, Yong Kiam Tan, Mark Huasong Meng, Mohamed Ragab, Soumik Mondal, David Mohaisen, Khin Mi Mi Aung

Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey…

Cryptography and Security · Computer Science · 2021-01-19 · Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro

A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations

Code Language Models (CLMs) have achieved tremendous progress in source code understanding and generation, leading to a significant increase in research interests focused on applying CLMs to real-world software engineering tasks in recent…

Cryptography and Security · Computer Science · 2024-11-13 · Yulong Yang, Haoran Fan, Chenhao Lin, Qian Li, Zhengyu Zhao, Chao Shen, Xiaohong Guan

Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments

Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that complete files that are individually authored can be attributed, we show here for the…

Machine Learning · Computer Science · 2019-07-29 · Edwin Dauber, Aylin Caliskan, Richard Harang, Gregory Shearer, Michael Weisman, Frederica Nelson, Rachel Greenstadt

Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs) has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs…

Computation and Language · Computer Science · 2025-03-26 · Kenneth Alperin, Rohan Leekha, Adaku Uchendu, Trang Nguyen, Srilakshmi Medarametla, Carlos Levya Capote, Seth Aycock, Charlie Dagli

Adversarial attacks on Copyright Detection Systems

It is well-known that many machine learning models are susceptible to adversarial attacks, in which an attacker evades a classifier by making small perturbations to inputs. This paper discusses how industrial copyright detection tools,…

Machine Learning · Computer Science · 2019-06-21 · Parsa Saadatpanah, Ali Shafahi, Tom Goldstein

A Girl Has A Name: Detecting Authorship Obfuscation

Authorship attribution aims to identify the author of a text based on the stylometric analysis. Authorship obfuscation, on the other hand, aims to protect against authorship attribution by modifying a text's style. In this paper, we…

Computation and Language · Computer Science · 2020-05-05 · Asad Mahmood, Zubair Shafiq, Padmini Srinivasan

Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution

Programmer attribution seeks to identify or verify the author of a source code artifact using stylistic, structural, or behavioural characteristics. This problem has been studied across software engineering, security, and digital forensics,…

Software Engineering · Computer Science · 2026-03-13 · Marek Horvath, Emilia Pietrikova, Diomidis Spinellis

ML-LOO: Detecting Adversarial Examples with Feature Attribution

Deep neural networks obtain state-of-the-art performance on a series of tasks. However, they are easily fooled by adding a small adversarial perturbation to input. The perturbation is often human imperceptible on image data. We observe a…

Machine Learning · Computer Science · 2019-06-11 · Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan