Related papers: Same or Different? Diff-Vectors for Authorship Ana…

200 papers

A central problem that has been researched for many years in the field of digital text forensics is the question of whether two documents were written by the same author. Authorship verification (AV) is a research branch in this field that…

Computation and Language · Computer Science 2020-07-09 Oren Halvani, Lukas Graner, Roey Regev

Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy. Inspired by this finding, in this paper, we extend the idea to the…

Computation and Language · Computer Science 2019-07-19 Jingyuan Zhang, Timothy Baldwin

Text mining and information retrieval techniques have been developed to assist us with analyzing, organizing and retrieving documents with the help of computers. In many cases, it is desirable that the authors of such documents remain…

Cryptography and Security · Computer Science 2018-05-03 Benjamin Weggenmann, Florian Kerschbaum

We propose two models for a special case of the authorship verification problem. The task is to investigate whether the two documents of a given pair are written by the same author. We consider the authorship verification problem for both small…

Computation and Language · Computer Science 2018-03-20 Marjan Hosseinia, Arjun Mukherjee

In this paper we perform a comparative analysis of three models for feature representation of text documents in the context of document classification. In particular, we consider the most often used family of models, bag-of-words, recently…

Computation and Language · Computer Science 2017-07-06 Sanda Martinčić-Ipšić, Tanja Miličić, Ljupčo Todorovski

In this article we study the problem of document image representation based on visual features. We propose a comprehensive experimental study that compares three types of visual document image representations: (1) traditional so-called…

Computer Vision and Pattern Recognition · Computer Science 2016-12-05 Gabriela Csurka, Diane Larlus, Albert Gordo, Jon Almazan

In this paper, we investigate the problem of classifying feature vectors with mutually independent but non-identically distributed elements. First, we show the importance of this problem. Next, we propose a classifier and derive an…

Machine Learning · Computer Science 2021-09-01 Farzad Shahrivari, Nikola Zlatanov

Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it became possible to train them with much more data, which also resulted in better quality of learned…

Computation and Language · Computer Science 2022-11-29 Marius Sajgalik, Michal Barla, Maria Bielikova

We present Gram2Vec, a grammatical style embedding system that embeds documents into a higher dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches,…

Computation and Language · Computer Science 2025-11-27 Peter Zeng, Hannah Stortz, Eric Sclafani, Alina Shabaeva, Maria Elizabeth Garza, Daniel Greeson, Owen Rambow

The task of deciding whether two documents are written by the same author is challenging for both machines and humans. This task is even more challenging when the two documents are written about different topics (e.g. baseball vs. politics)…

Computation and Language · Computer Science 2024-08-12 Steven Fincke, Elizabeth Boschee

Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The…

Computation and Language · Computer Science 2022-04-12 Anna Wegmann, Marijn Schraagen, Dong Nguyen

Identifying what is at the center of the meaning of a word and what discriminates it from other words is a fundamental natural language inference task. This paper describes an explicit word vector representation model (WVM) to support the…

Computation and Language · Computer Science 2019-09-13 Armins Stepanjans, André Freitas

We propose an unsupervised solution to the Authorship Verification task that utilizes pre-trained deep language models to compute a new metric called DV-Distance. The proposed metric is a measure of the difference between the two authors…

Computation and Language · Computer Science 2021-03-15 Yifan Zhang, Dainis Boumber, Marjan Hosseinia, Fan Yang, Arjun Mukherjee

A major computational burden in document clustering is calculating the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science 2012-08-20 Muhammad Rafi, Sundus Hassan, Mohammad Shahid Shaikh

Writing can be used as an important biometric modality that allows an individual to be unequivocally identified. This is because the writing of two different persons presents differences that can be explored both in terms of graphometric…

Computer Vision and Pattern Recognition · Computer Science 2020-05-19 Fabio Pinhelli, Alceu S. Britto, Luiz S. Oliveira, Yandre M. G. Costa, Diego Bertolini

We adapt the Higher Criticism (HC) goodness-of-fit test to measure the closeness between word-frequency tables. We apply this measure to authorship attribution challenges, where the goal is to identify the author of a document using other…

Computation and Language · Computer Science 2023-10-03 Alon Kipnis

Authorship attribution mainly deals with undecided authorship of literary texts. It is useful for resolving issues such as uncertain authorship, recognizing the authorship of unknown texts, spotting plagiarism, and so on. Statistical…

Digital Libraries · Computer Science 2013-10-21 M. Sudheep Elayidom, Chinchu Jose, Anitta Puthussery, Neenu K Sasi

Handwritten document analysis is an area of forensic science, with the goal of establishing authorship of documents through examination of inherent characteristics. Law enforcement agencies use standard protocols based on manual processing…

Computer Vision and Pattern Recognition · Computer Science 2024-01-17 Eleonora Breci, Luca Guarnera, Sebastiano Battiato

Recognizing elementary underlying concepts from observations (disentanglement) and generating novel combinations of these concepts (compositional generalization) are fundamental abilities for humans to support rapid knowledge learning and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public. Existing approaches protect the privacy of labels by flipping them randomly, and then train a…

Machine Learning · Computer Science 2024-05-27 Puning Zhao, Rongfei Fan, Huiwen Wu, Qingming Li, Jiafei Wu, Zhe Liu