Related papers: Exploring and Analyzing Machine Commonsense Benchm…

200 papers

More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed and many aspects of common sense…

Artificial Intelligence · Computer Science 2023-02-24 Ernest Davis

Commonsense datasets have been well developed in Natural Language Processing, mainly through crowdsourced human annotation. However, there are debates on the genuineness of commonsense reasoning benchmarks. Specifically, a significant portion…

Computation and Language · Computer Science 2024-11-07 Quyet V. Do, Junze Li, Tung-Duong Vuong, Zhaowei Wang, Yangqiu Song, Xiaojuan Ma

A fundamental ability of humans is to utilize commonsense knowledge in language understanding and question answering. In recent years, many knowledge-enhanced Commonsense Question Answering (CQA) approaches have been proposed. However, it…

Computation and Language · Computer Science 2021-01-06 Ning Bian, Xianpei Han, Bo Chen, Le Sun

Programming machines with commonsense reasoning (CSR) abilities is a longstanding challenge in the Artificial Intelligence community. Current CSR benchmarks use multiple-choice (and in relatively fewer cases, generative) question-answering…

Computation and Language · Computer Science 2022-07-18 Henrique Santos, Ke Shen, Alice M. Mulvehill, Yasaman Razeghi, Deborah L. McGuinness, Mayank Kejriwal

Non-extractive commonsense QA remains a challenging AI task, as it requires systems to reason about, synthesize, and gather disparate pieces of information, in order to generate responses to queries. Recent approaches on such tasks show…

Computation and Language · Computer Science 2019-11-01 Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, Alessandro Oltramari

Large language models (LLMs) demonstrate remarkable performance across various tasks, prompting researchers to develop diverse evaluation benchmarks. However, most benchmarks typically measure the ability of LLMs to respond to individual…

Computation and Language · Computer Science 2026-01-30 Yutao Hou, Yajing Luo, Zhiwen Ruan, Hongru Wang, Weifeng Ge, Yun Chen, Guanhua Chen

As Large Language Models (LLMs) advance, their potential for widespread societal impact grows simultaneously. Hence, rigorous LLM evaluations are both a technical necessity and social imperative. While numerous evaluation benchmarks have…

Computation and Language · Computer Science 2025-04-22 Jaime Raldua Veuthey, Zainab Ali Majid, Suhas Hariharan, Jacob Haimes

It is very challenging to curate a dataset for language-specific knowledge and common sense in order to evaluate natural language understanding capabilities of language models. Due to the limitation in the availability of annotators, most…

Computation and Language · Computer Science 2024-06-07 Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their…

Software Engineering · Computer Science 2024-07-31 Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra

Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets.…

Computation and Language · Computer Science 2021-06-03 Shikhar Singh, Nuan Wen, Yu Hou, Pegah Alipoormolabashi, Te-Lin Wu, Xuezhe Ma, Nanyun Peng

As quantum computing (QC) continues to evolve in hardware and software, measuring progress in this complex and diverse field remains a challenge. To track progress, uncover bottlenecks, and evaluate community efforts, benchmarks play a…

Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. This makes novel and transformative applications of machine learning to critical…

When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little…

Computation and Language · Computer Science 2019-03-19 Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant

Large pre-trained language models (PLMs) have led to great success on various commonsense question answering (QA) tasks in an end-to-end fashion. However, little attention has been paid to what commonsense knowledge is needed to deeply…

Computation and Language · Computer Science 2021-09-14 Gengyu Wang, Xiaochen Hou, Diyi Yang, Kathleen McKeown, Jing Huang

This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing…

Recently, there has been an increase in the number of knowledge graphs that can be only queried by experts. However, describing questions using structured queries is not straightforward for non-expert users who need to have sufficient…

Computation and Language · Computer Science 2021-05-04 Abdelghny Orogat, Isabelle Liu, Ahmed El-Roby

Commonsense question answering (QA) requires a model to grasp commonsense and factual knowledge to answer questions about world events. Many prior methods couple language modeling with knowledge graphs (KG). However, although a KG contains…

Computation and Language · Computer Science 2021-08-04 Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, Xuedong Huang

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark…

Machine Learning · Computer Science 2017-03-03 Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore

Recent advancements in reasoning-reinforced Large Language Models (LLMs) have shown remarkable capabilities in complex reasoning tasks. However, the mechanism underlying their utilization of different human reasoning skills remains poorly…

Computation and Language · Computer Science 2025-08-15 Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen

Recently, the community has achieved substantial progress on many commonsense reasoning benchmarks. However, it is still unclear what is learned from the training process: the knowledge, inference capability, or both? We argue that due to…

Computation and Language · Computer Science 2022-10-13 Hongming Zhang, Yintong Huo, Yanai Elazar, Yangqiu Song, Yoav Goldberg, Dan Roth