Related papers: Quantifying French Document Complexity
An evaluation metric is an absolute necessity for measuring the performance of any system and complexity of any data. In this paper, we have discussed how to determine the level of complexity of code-mixed social media texts that are…
We leverage generative large language models for language learning applications, focusing on estimating the difficulty of foreign language texts and simplifying them to lower difficulty levels. We frame both tasks as prediction problems and…
The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners. To present language learners with texts that are suitable to their level…
We present an analysis of eight measures used for quantifying morphological complexity of natural languages. The measures we study are corpus-based measures of morphological complexity with varying requirements for corpus annotation. We…
Complexity is a multi-faceted phenomenon, involving a variety of features including disorder, nonlinearity, and self-organisation. We use a recently developed rigorous framework for complexity to understand measures of complexity. We…
Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we aim at addressing this issue by…
Text similarity detection aims at measuring the degree of similarity between a pair of texts. Corpora available for text similarity detection are designed to evaluate the algorithms to assess the paraphrase level among documents. In this…
Software implements a significant proportion of functionality in factory automation. Thus, efficient development and the reuse of software parts, so-called units, enhance competitiveness. Thereby, complex control software units are more…
We use large language models to aid learners enhance proficiency in a foreign language. This is accomplished by identifying content on topics that the user is interested in, and that closely align with the learner's proficiency level in…
This paper analyses the contribution of language metrics and, potentially, of linguistic structures, to classify French learners of English according to levels of the Common European Framework of Reference for Languages (CEFRL). The purpose…
Written language is complex. A written text can be considered an attempt to convey a meaningful message which ends up being constrained by language rules, context dependence and highly redundant in its use of resources. Despite all these…
A multitude of factors are responsible for the overall quality of scientific papers, including readability, linguistic quality, fluency,semantic complexity, and of course domain-specific technical factors. These factors vary from one field…
We propose a new similarity measure between texts which, contrary to the current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted…
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these metrics have yet to be established. We propose a set…
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text…
Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable for shorter pieces of text, such as health assessment tools. These typically consist of lists of…
Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Besides, these resources are just limited to a few languages,…
This article presents an automatic frame analysis system evaluated on a corpus of French encyclopedic history texts annotated according to the FrameNet formalism. The chosen approach relies on an integrated sequence labeling model which…
Spreadsheets are widely used in industry, even for critical business processes. This implies the need for proper risk assessment in spreadsheets to evaluate the reliability and validity of the spreadsheet's outcome. As related research has…
The diversity across outputs generated by LLMs shapes perception of their quality and utility. High lexical diversity is often desirable, but there is no standard method to measure this property. Templated answer structures and ``canned''…