Human Language Modeling

Computation and Language 2023-11-10 v1 Machine Learning

Abstract

Natural language is generated by people, yet traditional language modeling treats words or documents as if they were generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension of the language modeling problem in which a human level connects sequences of documents (e.g., social media messages) and captures the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 social media users, and demonstrate its effectiveness both in language modeling (perplexity) for social media and in fine-tuning for four downstream tasks spanning the document and user levels: stance detection, sentiment classification, age estimation, and personality assessment. Results on all tasks meet or surpass the current state of the art.
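
For readers unfamiliar with the perplexity metric mentioned above, the following is a minimal sketch of how perplexity can be computed from per-token log-probabilities pooled over one user's messages. It is not the authors' evaluation code: the function name `perplexity`, the `user_messages` structure, and the probability values are illustrative assumptions only.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities for two messages by one user.
# In practice these would come from a language model; the values are made up.
user_messages = [
    [math.log(0.25), math.log(0.5)],
    [math.log(0.125), math.log(0.25)],
]

# Pool the tokens of all messages before averaging.
all_tokens = [lp for message in user_messages for lp in message]
ppl = perplexity(all_tokens)
print(round(ppl, 6))
```

Lower perplexity means the model assigns higher probability to the observed tokens. Whether tokens are pooled per user, per message, or over the whole corpus is a design choice that this sketch does not settle.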

Cite

@article{arxiv.2205.05128,
  title  = {Human Language Modeling},
  author = {Nikita Soni and Matthew Matero and Niranjan Balasubramanian and H. Andrew Schwartz},
  journal= {arXiv preprint arXiv:2205.05128},
  year   = {2023}
}