Related papers: CodeQA: A Question Answering Dataset for Source Co…

CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python, and 17,698…

Computation and Language · Computer Science 2022-10-27 Changyoon Lee, Yeon Seonwoo, Alice Oh

CoQA: A Conversational Question Answering Challenge

Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions.…

Computation and Language · Computer Science 2019-04-02 Siva Reddy, Danqi Chen, Christopher D. Manning

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect…

Computation and Language · Computer Science 2017-06-13 Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes

Understanding and reasoning about cooking recipes is a fruitful research direction towards enabling machines to interpret procedural text. In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. It…

Computation and Language · Computer Science 2018-09-05 Semih Yagcioglu, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis

NewsQA: A Machine Comprehension Dataset

We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of…

Computation and Language · Computer Science 2017-02-08 Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman

MovieQA: Understanding Stories in Movies through Question-Answering

We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who"…

Computer Vision and Pattern Recognition · Computer Science 2016-09-22 Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and…

Computation and Language · Computer Science 2019-04-11 Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum

PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most…

Computation and Language · Computer Science 2020-10-07 Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, Kai-Wei Chang

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key…

Computation and Language · Computer Science 2018-09-26 Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence…

Computation and Language · Computer Science 2017-05-16 Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets…

Computer Vision and Pattern Recognition · Computer Science 2021-01-06 Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base

Complex question answering over knowledge base (Complex KBQA) is challenging because it requires various compositional reasoning capabilities, such as multi-hop inference, attribute comparison, set operation. Existing benchmarks have some…

Computation and Language · Computer Science 2022-06-24 Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, Hanwang Zhang

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific…

Computation and Language · Computer Science 2025-02-20 Tim Baumgärtner, Ted Briscoe, Iryna Gurevych

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Question Answering (QA) is a task in which a machine understands a given document and a question to find an answer. Despite impressive progress in the NLP area, QA is still a challenging problem, especially for non-English languages due to…

Computation and Language · Computer Science 2022-02-04 ByungHoon So, Kyuhong Byun, Kyungwon Kang, Seongjin Cho

FeTaQA: Free-form Table Question Answering

Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and…

Computation and Language · Computer Science 2021-04-02 Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation

This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG). The dataset was created using a semi-automated framework for generating diverse…

Computation and Language · Computer Science 2021-03-16 Endri Kacupaj, Barshana Banerjee, Kuldeep Singh, Jens Lehmann

MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

Large language models are playing an increasingly significant role in molecular research, yet existing models often generate erroneous information, posing challenges to accurate molecular comprehension. Traditional evaluation metrics for…

Computation and Language · Computer Science 2024-03-14 Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, Yu Li

ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers

We describe a Question Answering (QA) dataset that contains complex questions with conditional answers, i.e. the answers are only applicable when certain conditions apply. We call this dataset ConditionalQA. In addition to conditional…

Computation and Language · Computer Science 2021-10-14 Haitian Sun, William W. Cohen, Ruslan Salakhutdinov

DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding

Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate…

Artificial Intelligence · Computer Science 2023-10-31 Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong Zhuang, Nian Xie, Cheng Jin, Liang He

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new dataset for reading comprehension that challenges LLMs for a deep understanding…

Computation and Language · Computer Science 2024-11-11 Shruti Singh, Nandan Sarkar, Arman Cohan