Related papers: Automatic Question-Answer Generation for Long-Tail…

Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks

A question-answering (QA) system is to search suitable answers within a knowledge base. Current QA systems struggle with queries requiring complex reasoning or real-time knowledge integration. They are often supplemented with retrieval…

Computation and Language · Computer Science 2025-05-21 Sizhe Yuen, Ting Su, Ziyang Wang, Yali Du, Adam J. Sobey

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

Large language models (LLMs) have demonstrated remarkable performance on question-answering (QA) tasks because of their superior capabilities in natural language understanding and generation. However, LLM-based QA struggles with complex QA…

Computation and Language · Computer Science 2025-09-23 Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, Haofen Wang

TailNLG: A Multilingual Benchmark Addressing Verbalization of Long-Tail Entities

The automatic verbalization of structured knowledge is a key task for making knowledge graphs accessible to non-expert users and supporting retrieval-augmented generation systems. Although recent advances in Data-to-Text generation have…

Computation and Language · Computer Science 2026-03-31 Lia Draetta, Michael Oliverio, Virginia Ramón-Ferrer, Pier Felice Balestrucci, Flaviana Corallo, Carlos Badenes-Olmedo, Alessandro Mazzei, Marco Antonio Stranisci, Rossana Damiano

Large Language Models Struggle to Learn Long-Tail Knowledge

The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web,…

Computation and Language · Computer Science 2023-07-28 Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known…

Computation and Language · Computer Science 2023-11-01 Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz

Knowledge Base Completion for Long-Tail Entities

Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities…

Computation and Language · Computer Science 2023-07-03 Lihu Chen, Simon Razniewski, Gerhard Weikum

CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge

Knowledge graph question answering (KGQA) is a well-established field that seeks to provide factual answers to natural language (NL) questions by leveraging knowledge graphs (KGs). However, existing KGQA datasets suffer from two significant…

Computation and Language · Computer Science 2024-03-05 Willis Guo, Armin Toroghi, Scott Sanner

Context Generation Improves Open Domain Question Answering

Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to…

Computation and Language · Computer Science 2023-04-28 Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of…

Computation and Language · Computer Science 2024-06-04 Bernd Bohnet, Kevin Swersky, Rosanne Liu, Pranjal Awasthi, Azade Nova, Javier Snaider, Hanie Sedghi, Aaron T Parisi, Michael Collins, Angeliki Lazaridou, Orhan Firat, Noah Fiedel

In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

To effectively use large language models (LLMs) for real-world queries, it is imperative that they generalize to the long-tail distribution, i.e. rare examples where models exhibit low confidence. In this work, we take the first step…

Computation and Language · Computer Science 2024-10-07 Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren

Long-Tailed Question Answering in an Open World

Real-world data often have an open long-tailed distribution, and building a unified QA model supporting various tasks is vital for practical QA applications. However, it is non-trivial to extend previous QA approaches since they either…

Computation and Language · Computer Science 2023-05-12 Yi Dai, Hao Lang, Yinhe Zheng, Fei Huang, Yongbin Li

iTRI-QA: a Toolset for Customized Question-Answer Dataset Generation Using Language Models for Enhanced Scientific Research

The exponential growth of AI in science necessitates efficient and scalable solutions for retrieving and preserving research information. Here, we present a tool for the development of a customized question-answer (QA) dataset, called…

Information Retrieval · Computer Science 2025-02-25 Qiming Liu, Zhongzheng Niu, Siting Liu, Mao Tian

Investigating Answerability of LLMs for Long-Form Question Answering

As we embark on a new era of LLMs, it becomes increasingly crucial to understand their capabilities, limitations, and differences. Toward making further progress in this direction, we strive to build a deeper understanding of the gaps…

Computation and Language · Computer Science 2023-09-18 Meghana Moorthy Bhat, Rui Meng, Ye Liu, Yingbo Zhou, Semih Yavuz

All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing

Due to their capacity to acquire world knowledge from large corpora, pre-trained language models (PLMs) are extensively used in ultra-fine entity typing tasks where the space of labels is extremely large. In this work, we explore the…

Computation and Language · Computer Science 2026-04-28 Advait Deshmukh, Ashwin Umadi, Dananjay Srinivas, Maria Leonor Pacheco

Automated Generation of Massive Reasonable Empirical Theorems by Forward Reasoning Based on Strong Relevant Logics -- A Solution to the Problem of LLM Pre-training Data Exhaustion

Recently, it is often said that the data used for the pre-training of large language models (LLMs) have been exhausted. This paper proposes a solution to the problem: Automated generation of massive reasonable empirical theorems by forward…

Artificial Intelligence · Computer Science 2024-12-18 Jingde Cheng

A Survey of Large Language Model Agents for Question Answering

This paper surveys the development of large language model (LLM)-based agents for question answering (QA). Traditional agents face significant limitations, including substantial data requirements and difficulty in generalizing to new…

Computation and Language · Computer Science 2025-03-26 Murong Yue

For those who don't know (how) to ask: Building a dataset of technology questions for digital newcomers

While the rise of large language models (LLMs) has created rich new opportunities to learn about digital technology, many on the margins of this technology struggle to gain and maintain competency due to lexical or conceptual barriers that…

Computation and Language · Computer Science 2024-03-28 Evan Lucas, Kelly S. Steelman, Leo C. Ureel, Charles Wallace

Open Knowledge Enrichment for Long-tail Entities

Knowledge bases (KBs) have gradually become a valuable asset for many AI applications. While many current KBs are quite large, they are widely acknowledged as incomplete, especially lacking facts of long-tail entities, e.g., less famous…

Information Retrieval · Computer Science 2020-02-20 Ermei Cao, Difeng Wang, Jiacheng Huang, Wei Hu

CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge

The rise of Large Language Models (LLMs) has redefined the AI landscape, particularly due to their ability to encode factual and commonsense knowledge, and their outstanding performance in tasks requiring reasoning. Despite these advances,…

Computation and Language · Computer Science 2025-04-22 Armin Toroghi, Willis Guo, Scott Sanner

On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

Retrieval augmented generation (RAG) exhibits outstanding performance in promoting the knowledge capabilities of large language models (LLMs) with retrieved documents related to user queries. However, RAG only focuses on improving the…

Information Retrieval · Computer Science 2024-06-25 Dongyang Li, Junbing Yan, Taolin Zhang, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang