Author2Vec: A Framework for Generating User Embedding
Abstract
Online forums and social media platforms generate noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embeddings that encode useful user-intrinsic properties. The system was pre-trained on posts from 10k Reddit users and was analyzed and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We show that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
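The abstract names authorship classification as the pre-training objective but does not spell out the architecture. The Python sketch below shows one minimal reading of the idea, under assumptions not stated in the abstract: each post is already a fixed vector (synthetic stand-ins here, in place of BERT sentence representations), a linear projection `W` maps posts into a small hidden space, a softmax layer `U` predicts each post's author, and a user's embedding is taken as the mean hidden vector of that user's posts. It is a toy illustration, not the Author2Vec implementation; `train`, `user_embeddings`, and `accuracy` are hypothetical helpers.

```python
# Toy sketch of authorship classification as a pre-training objective.
# ASSUMPTIONS (not taken from the paper): post vectors are synthetic stand-ins
# for BERT sentence representations; a linear projection W maps each post into
# a small hidden space; a softmax layer U predicts the post's author; a user's
# embedding is the mean hidden vector of that user's posts.
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def train(posts, authors, n_users, hid=4, lr=0.02, epochs=150, seed=0):
    rng = random.Random(seed)
    dim = len(posts[0])
    W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hid)]
    U = [[rng.gauss(0, 0.1) for _ in range(hid)] for _ in range(n_users)]
    for _ in range(epochs):
        for x, a in zip(posts, authors):
            h = matvec(W, x)
            p = softmax(matvec(U, h))
            # Cross-entropy gradient w.r.t. the logits: p - onehot(author).
            g = [pk - (1.0 if k == a else 0.0) for k, pk in enumerate(p)]
            # Backpropagate to the hidden layer before U is updated.
            gh = [sum(g[k] * U[k][j] for k in range(n_users)) for j in range(hid)]
            for k in range(n_users):
                for j in range(hid):
                    U[k][j] -= lr * g[k] * h[j]
            for j in range(hid):
                for i in range(dim):
                    W[j][i] -= lr * gh[j] * x[i]
    return W, U


def user_embeddings(W, posts, authors, n_users):
    # Average the projected post vectors of each user.
    sums = [[0.0] * len(W) for _ in range(n_users)]
    counts = [0] * n_users
    for x, a in zip(posts, authors):
        sums[a] = [s + v for s, v in zip(sums[a], matvec(W, x))]
        counts[a] += 1
    return [[s / counts[u] for s in sums[u]] for u in range(n_users)]


def accuracy(W, U, posts, authors):
    hits = 0
    for x, a in zip(posts, authors):
        p = softmax(matvec(U, matvec(W, x)))
        hits += p.index(max(p)) == a
    return hits / len(posts)


# Usage on synthetic data: 3 hypothetical users, 10 posts each, 6-dim vectors.
rng = random.Random(1)
n_users, dim = 3, 6
posts, authors = [], []
for u in range(n_users):
    for _ in range(10):
        # Each user's posts cluster around a distinct axis (a stand-in "style").
        posts.append([(3.0 if i == u else 0.0) + rng.gauss(0, 0.1) for i in range(dim)])
        authors.append(u)

W, U = train(posts, authors, n_users)
emb = user_embeddings(W, posts, authors, n_users)
print("train accuracy:", accuracy(W, U, posts, authors))
print("embedding shape:", len(emb), "x", len(emb[0]))
```

The author labels in such a setup come directly from the forum data, so no human annotation is needed; the learned representation could then, presumably, be reused as input features for downstream tasks such as depression detection or personality classification.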
Cite
@article{arxiv.2003.11627,
title = {Author2Vec: A Framework for Generating User Embedding},
author = {Xiaodong Wu and Weizhe Lin and Zhilin Wang and Elena Rastorgueva},
journal = {arXiv preprint arXiv:2003.11627},
year = {2020}
}