Automatic Question-Answer Generation for Long-Tail Knowledge

Computation and Language 2024-03-05 v1

Abstract

Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions related to common knowledge, LLMs encounter difficulties in learning about uncommon long-tail knowledge (tail entities). Because manually constructing QA datasets demands substantial human resources, the types of existing QA datasets are limited, and few datasets are available for studying the performance of LLMs on tail entities. In this paper, we propose an automatic approach to generate specialized QA datasets for tail entities and present the associated research challenges. We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets, comparing their performance with and without external resources, including Wikipedia and Wikidata knowledge graphs.

Cite

@article{arxiv.2403.01382,
  title  = {Automatic Question-Answer Generation for Long-Tail Knowledge},
  author = {Rohan Kumar and Youngmin Kim and Sunitha Ravi and Haitian Sun and Christos Faloutsos and Ruslan Salakhutdinov and Minji Yoon},
  journal= {arXiv preprint arXiv:2403.01382},
  year   = {2024}
}

Comments

Accepted at KDD 2023 KnowledgeNLP
