Related papers: Framework and Resources for Natural Language Parse…

A Pragmatics-Centered Evaluation Framework for Natural Language Understanding

New models for natural language understanding have recently made an unparalleled amount of progress, which has led some researchers to suggest that the models induce universal text representations. However, current benchmarks are…

Computation and Language · Computer Science 2022-04-05 Damien Sileo, Tim Van-de-Cruys, Camille Pradel, Philippe Muller

The Science of Evaluating Foundation Models

The emergent phenomena of large foundation models have revolutionized natural language processing. However, evaluating these models presents significant challenges due to their size, capabilities, and deployment across diverse applications.…

Computation and Language · Computer Science 2025-02-17 Jiayi Yuan, Jiamu Zhang, Andrew Wen, Xia Hu

A novel evaluation methodology for supervised Feature Ranking algorithms

Both in the domains of Feature Selection and Interpretable AI, there exists a desire to 'rank' features based on their importance. Such feature importance rankings can then be used to either: (1) reduce the dataset size or (2) interpret the…

Machine Learning · Computer Science 2022-07-12 Jeroen G. S. Overschie

A Pragmatic Guide to Geoparsing Evaluation

Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of…

Computation and Language · Computer Science 2019-09-17 Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier

KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation

Despite the significant advancements in keyphrase extraction and keyphrase generation methods, the predominant approach for evaluation mainly relies on exact matching with human references. This scheme fails to recognize systems that…

Computation and Language · Computer Science 2024-06-05 Di Wu, Da Yin, Kai-Wei Chang

A Framework for Processing Textual Descriptions of Business Processes using a Constrained Language -- Technical Report

This report explores how (potentially constrained) natural language can be used to enable non-experts to develop process models by simply describing scenarios in plain text. To this end, a framework, called BeePath, is proposed. It allows…

Computation and Language · Computer Science 2025-08-25 Andrea Burattin, Antonio Grama, Ana-Maria Sima, Andrey Rivkin, Barbara Weber

Multilingual Self-Taught Faithfulness Evaluators

The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown…

Computation and Language · Computer Science 2025-07-29 Carlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico

A Survey on Recognizing Textual Entailment as an NLP Evaluation

Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and…

Computation and Language · Computer Science 2020-10-08 Adam Poliak

A Principled Framework for Evaluating on Typologically Diverse Languages

Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is…

Computation and Language · Computer Science 2025-09-09 Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, Anders Schlichtkrull, Miryam de Lhoneux, Johannes Bjerva

Adversarial Evaluation for Models of Natural Language

We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure from text that is less than fully annotated. In this paper, we discuss some of the weaknesses of our current methodology. We present a new…

Computation and Language · Computer Science 2012-07-17 Noah A. Smith

Assessing Semantic Frames to Support Program Comprehension Activities

Software developers often rely on natural language text that appears in software engineering artifacts to access critical information as they build and work on software systems. For example, developers access requirements documents to…

Software Engineering · Computer Science 2021-05-14 Arthur Marques, Giovanni Viviani, Gail C. Murphy

Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models

The rapid evolution of Large Language Models has underscored the need for evaluation frameworks that are globally applicable, flexible, and modular, and that support a wide range of tasks, model types, and linguistic settings. We introduce…

Computation and Language · Computer Science 2026-03-06 Samridhi Raj Sinha, Rajvee Sheth, Abhishek Upperwal, Mayank Singh

Hierarchical Evaluation Framework: Best Practices for Human Evaluation

Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation…

Computation and Language · Computer Science 2023-10-13 Iva Bojic, Jessica Chen, Si Yuan Chang, Qi Chwen Ong, Shafiq Joty, Josip Car

SteerEval: A Framework for Evaluating Steerability with Natural Language Profiles for Recommendation

Natural-language user profiles have recently attracted attention not only for improved interpretability, but also for their potential to make recommender systems more steerable. By enabling direct editing, natural-language profiles allow…

Information Retrieval · Computer Science 2026-01-30 Joyce Zhou, Weijie Zhou, Doug Turnbull, Thorsten Joachims

LEPOR: An Augmented Machine Translation Evaluation Metric

Machine translation (MT) has become one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation…

Computation and Language · Computer Science 2022-01-25 Lifeng Han

Evaluation Revisited: A Taxonomy of Evaluation Concerns in Natural Language Processing

Recent advances in large language models (LLMs) have prompted a growing body of work that questions the methodology of prevailing evaluation practices. However, many such critiques have already been extensively debated in natural language…

Computation and Language · Computer Science 2026-04-30 Ruchira Dhar, Anders Søgaard

Natural Language in Requirements Engineering for Structure Inference -- An Integrative Review

The automatic extraction of structure from text can be difficult for machines. Yet, the elicitation of this information can provide many benefits and opportunities for various applications. Benefits have also been identified for the area of…

Computation and Language · Computer Science 2022-02-11 Maximilian Vierlboeck, Carlo Lipizzi, Roshanak Nilchiani

Survey: Natural Language Parsing For Indian Languages

Syntactic parsing is a necessary task for NLP applications, including machine translation. Developing a high-quality parser for morphologically rich and agglutinative languages is challenging. Syntactic analysis is…

Computation and Language · Computer Science 2015-01-29 Monika T. Makwana, Deepak C. Vegda

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Reasoning has emerged as the next major frontier for language models (LMs), with rapid advances from both academic and industrial labs. However, this progress often outpaces methodological rigor, with many evaluations relying on…

Machine Learning · Computer Science 2025-10-08 Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao, Samuel Albanie, Ameya Prabhu, Matthias Bethge

Langar: An Approach to Evaluate Reo Programming Language

Reo is a formal coordination language. In order to assess and evaluate its capabilities, we need a multi-perspective Language Evaluation Framework. Langar (Language Analysis for Reo) is a framework aimed to provide such an evaluation…

Software Engineering · Computer Science 2021-03-09 Mohammad Reza Besharati, Mohammad Izadi