Related papers: Framework and Resources for Natural Language Parse…
New models for natural language understanding have recently made an unparalleled amount of progress, which has led some researchers to suggest that the models induce universal text representations. However, current benchmarks are…
The emergent phenomena of large foundation models have revolutionized natural language processing. However, evaluating these models presents significant challenges due to their size, capabilities, and deployment across diverse applications.…
Both in the domains of Feature Selection and Interpretable AI, there exists a desire to `rank' features based on their importance. Such feature importance rankings can then be used to either: (1) reduce the dataset size or (2) interpret the…
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of…
Despite the significant advancements in keyphrase extraction and keyphrase generation methods, the predominant approach for evaluation mainly relies on exact matching with human references. This scheme fails to recognize systems that…
This report explores how (potentially constrained) natural language can be used to enable non-experts to develop process models by simply describing scenarios in plain text. To this end, a framework, called BeePath, is proposed. It allows…
The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown…
Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and…
Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is…
We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure from text that is less than fully annotated. In this paper, we discuss some of the weaknesses of our current methodology. We present a new…
Software developers often rely on natural language text that appears in software engineering artifacts to access critical information as they build and work on software systems. For example, developers access requirements documents to…
The rapid evolution of Large Language Models' has underscored the need for evaluation frameworks that are globally applicable, flexible, and modular, and that support a wide range of tasks, model types, and linguistic settings. We introduce…
Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation…
Natural-language user profiles have recently attracted attention not only for improved interpretability, but also for their potential to make recommender systems more steerable. By enabling direct editing, natural-language profiles allow…
Machine translation (MT) was developed as one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is that how to evaluate the MT system reasonably and tell us whether the translation…
Recent advances in large language models (LLMs) have prompted a growing body of work that questions the methodology of prevailing evaluation practices. However, many such critiques have already been extensively debated in natural language…
The automatic extraction of structure from text can be difficult for machines. Yet, the elicitation of this information can provide many benefits and opportunities for various applications. Benefits have also been identified for the area of…
Syntactic parsing is a necessary task which is required for NLP applications including machine translation. It is a challenging task to develop a qualitative parser for morphological rich and agglutinative languages. Syntactic analysis is…
Reasoning has emerged as the next major frontier for language models (LMs), with rapid advances from both academic and industrial labs. However, this progress often outpaces methodological rigor, with many evaluations relying on…
Reo is a formal coordination language. In order to assess and evaluate its capabilities, we need a multi-perspective Language Evaluation Framework. Langar (Language Analysis for Reo) is a framework aimed to provide such an evaluation…