Dependency-based Mixture Language Models

Computation and Language 2022-03-22 v1

Abstract

Various models have been proposed to incorporate knowledge of syntactic structures into neural language models. However, previous works have relied heavily on elaborate components designed for a specific language model, usually a recurrent neural network (RNN), which makes them unwieldy to fit into other neural language models such as Transformer and GPT-2. In this paper, we introduce the Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective to learn the probability distribution of future dependent tokens given the context. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention. Extensive experiments and human evaluations show that our method can be easily and effectively applied to different neural language models while improving neural text generation on various tasks.
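
The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes that each previous position i supplies a dependency modeling distribution over the vocabulary, and that self-attention supplies one unnormalized score per position, which a softmax turns into mixture weights. The function names, the argument layout, and the use of a plain softmax are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_next_token_distribution(dep_dists, attn_scores):
    """Mix per-position dependency distributions into one next-token distribution.

    dep_dists[i][v]: assumed probability that position i assigns to
                     vocabulary item v as a future dependent token.
    attn_scores[i]:  assumed unnormalized attention score for position i.
    """
    if len(dep_dists) != len(attn_scores):
        raise ValueError("need exactly one attention score per position")
    weights = softmax(attn_scores)
    vocab_size = len(dep_dists[0])
    mixed = [0.0] * vocab_size
    # Weighted sum over positions: p(v) = sum_i weight_i * dep_dists[i][v]
    for w, dist in zip(weights, dep_dists):
        for v, p in enumerate(dist):
            mixed[v] += w * p
    return mixed


# Toy example: two context positions, vocabulary of three tokens.
dep_dists = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
attn_scores = [0.0, 0.0]  # equal scores give equal weights of 0.5
mixed = mix_next_token_distribution(dep_dists, attn_scores)
```

Because the weights sum to 1 and each row of `dep_dists` is a probability distribution, the mixture is itself a valid distribution over the vocabulary.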

Cite

@article{arxiv.2203.10256,
  title  = {Dependency-based Mixture Language Models},
  author = {Zhixian Yang and Xiaojun Wan},
  journal= {arXiv preprint arXiv:2203.10256},
  year   = {2022}
}

Comments

Accepted to ACL 2022 Main Conference