Related papers: Textual Fingerprinting with Texts from Parkin, Bas…

Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics

We test whether authors have characteristic "fingerprints" in the information-theoretic novelty curves of their published works. Working with two corpora -- Books3 (52,796 books, 759 qualifying authors) and PG-19 (28,439 books, 1,821…

Computation and Language · Computer Science 2026-04-02 Fred Zimmerman, Hilmar AI

Features Based Text Similarity Detection

As the Internet helps us cross cultural borders by providing different information, plagiarism issues are bound to arise. As a result, plagiarism detection becomes more demanding in overcoming this issue. Different plagiarism detection tools…

Computer Vision and Pattern Recognition · Computer Science 2010-03-25 Chow Kok Kent, Naomie Salim

Explainability of machine learning approaches in forensic linguistics: a case study in geolinguistic authorship profiling

Forensic authorship profiling uses linguistic markers to infer characteristics about an author of a text. This task is paralleled in dialect classification, where a prediction is made about the linguistic variety of a text based on the text…

Computation and Language · Computer Science 2024-07-02 Dana Roemling, Yves Scherrer, Aleksandra Miletic

TraSE: Towards Tackling Authorial Style from a Cognitive Science Perspective

Stylistic analysis of text is a key task in research areas ranging from authorship attribution to forensic analysis and personality profiling. The existing approaches for stylistic analysis are plagued by issues like topic influence, lack…

Computation and Language · Computer Science 2023-12-07 Ronald Wilson, Avanti Bhandarkar, Damon Woodard

Enhancing Representation Generalization in Authorship Identification

Authorship identification ascertains the authorship of texts whose origins remain undisclosed. That authorship identification techniques work as reliably as they do has been attributed to the fact that authorial style is properly captured…

Computation and Language · Computer Science 2023-10-03 Haining Wang

Profiling German Text Simplification with Interpretable Model-Fingerprints

While Large Language Models (LLMs) produce highly nuanced text simplifications, developers currently lack tools for a holistic, efficient, and reproducible diagnosis of their behavior. This paper introduces the Simplification Profiler, a…

Computation and Language · Computer Science 2026-01-21 Lars Klöser, Mika Beele, Bodo Kraft

Fingerprinting Fine-tuned Language Models in the Wild

There are concerns that the ability of language models (LMs) to generate high quality synthetic text can be misused to launch spam, disinformation, or propaganda. Therefore, the research community is actively working on developing…

Computation and Language · Computer Science 2021-06-04 Nirav Diwan, Tanmoy Chakravorty, Zubair Shafiq

Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

Copyright protection for large language models is of critical importance, given their substantial development costs, proprietary value, and potential for misuse. Existing surveys have predominantly focused on techniques for tracing…

Cryptography and Security · Computer Science 2026-04-08 Zhenhua Xu, Xubin Yue, Zhebo Wang, Haobo Zhang, Qichen Liu, Xixiang Zhao, Jingxuan Zhang, Wenjun Zeng, Wengpeng Xing, Dezhang Kong, Changting Lin, Meng Han

ChatGPT-generated texts show authorship traits that identify them as non-human

Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While…

Computation and Language · Computer Science 2025-08-25 Vittoria Dentella, Weihang Huang, Silvia Angela Mansi, Jack Grieve, Evelina Leivada

Document Author Classification Using Parsed Language Structure

Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used, for…

Computation and Language · Computer Science 2024-03-21 Todd K Moon, Jacob H. Gunther

Detecting Stylistic Fingerprints of Large Language Models

Large language models (LLMs) have distinct and consistent stylistic fingerprints, even when prompted to write in different writing styles. Detecting these fingerprints is important for many reasons, among them protecting intellectual…

Computation and Language · Computer Science 2025-03-04 Yehonatan Bitton, Elad Bitton, Shai Nisan

Letter-level Online Writer Identification

Writer identification (writer-id), an important field in biometrics, aims to identify a writer by their handwriting. Identification in existing writer-id studies requires a complete document or text, limiting the scalability and flexibility…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Zelin Chen, Hong-Xing Yu, Ancong Wu, Wei-Shi Zheng

Few-Shot Detection of Machine-Generated Text using Style Representations

The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language…

Computation and Language · Computer Science 2024-05-09 Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews

A Survey of Relevant Text Mining Technology

Recent advances in text mining and natural language processing technology have enabled researchers to detect an author's identity or demographic characteristics, such as age and gender, in several text genres by automatically analysing the…

Cryptography and Security · Computer Science 2022-11-30 Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

Smells like Teen Spirit: An Exploration of Sensorial Style in Literary Genres

It is well recognized that sensory perceptions and language have interconnections through numerous studies in psychology, neuroscience, and sensorial linguistics. Set in this rich context we ask whether the use of sensorial language in…

Computation and Language · Computer Science 2022-09-27 Osama Khalid, Padmini Srinivasan

Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues

Speaker identification using voice recordings leverages unique acoustic features, but this approach fails when only textual data is available. Few approaches have attempted to tackle the problem of identifying speakers solely from text, and…

Computation and Language · Computer Science 2025-04-22 Rui Ribeiro, Luísa Coheur, Joao P. Carvalho

Behavioral Fingerprinting of Large Language Models

Current benchmarks for Large Language Models (LLMs) primarily focus on performance metrics, often failing to capture the nuanced behavioral characteristics that differentiate them. This paper introduces a novel "Behavioral Fingerprinting"…

Computation and Language · Computer Science 2025-09-08 Zehua Pei, Hui-Ling Zhen, Ying Zhang, Zhiyuan Yang, Xing Li, Xianzhi Yu, Mingxuan Yuan, Bei Yu

Authorship recognition via fluctuation analysis of network topology and word intermittency

Statistical methods have been widely employed in many practical natural language processing applications. More specifically, complex networks concepts and methods from dynamical systems theory have been successfully applied to recognize…

Computation and Language · Computer Science 2015-03-04 Diego R. Amancio

Towards Improved Model Design for Authorship Identification: A Survey on Writing Style Understanding

Authorship identification tasks, which rely heavily on linguistic styles, have always been an important part of Natural Language Understanding (NLU) research. While other tasks based on linguistic style understanding benefit from deep…

Computation and Language · Computer Science 2020-10-01 Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

Social Media Writing Style Fingerprint

We present our approach for computer-aided social media text authorship attribution based on recent advances in short text authorship verification. We use various natural language techniques to create word-level and character-level models…

Computation and Language · Computer Science 2017-12-27 Himank Yadav, Juliang Li