Related papers: Phonotactic Complexity across Dialects

200 papers

We present methods for calculating a measure of phonotactic complexity---bits per phoneme---that permits a straightforward cross-linguistic comparison. When given a word, represented as a sequence of phonemic segments such as symbols in the…

Computation and Language · Computer Science 2020-05-11 Tiago Pimentel, Brian Roark, Ryan Cotterell

It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex. This inverse correlation has been…

Computation and Language · Computer Science 2024-06-11 Amanda Doucette, Ryan Cotterell, Morgan Sonderegger, Timothy J. O'Donnell

While language is a complex adaptive system, most work on syntactic variation observes a few individual constructions in isolation from the rest of the grammar. This means that the grammar, a network which connects thousands of structures…

Computation and Language · Computer Science 2023-09-22 Jonathan Dunn

We revisit the phenomenon of syntactic complexity convergence in conversational interaction, originally found for English dialogue, which has theoretical implications for dialogical concepts such as mutual understanding. We use a modified…

Computation and Language · Computer Science 2024-08-23 Yu Wang, Hendrik Buschmeier

Deep acoustic models represent linguistic information based on massive amounts of data. Unfortunately, for regional languages and dialects such resources are mostly not available. However, deep acoustic models might have learned linguistic…

Computation and Language · Computer Science 2022-05-26 Martijn Bartelds, Martijn Wieling

The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within…

Computation and Language · Computer Science 2021-04-06 Jonathan Dunn

This thesis is concerned with type-logical grammars and their practical applicability as tools of reasoning about sentence syntax and semantics. The focal point is narrowed to Dutch, a language exhibiting a large degree of word order…

Computation and Language · Computer Science 2019-09-11 Konstantinos Kogkalidis

This article reports ongoing investigations into phonetic change of dialect groups in the northern Netherlandic language area, particularly the Frisian and Low Saxon dialect groups, which are known to differ in vitality. To achieve this, we…

Computation and Language · Computer Science 2021-10-18 Raoul Buurke, Hedwig Sekeres, Wilbert Heeringa, Remco Knooihuizen, Martijn Wieling

Historically, researchers and consumers have noticed a decrease in quality when applying NLP tools to minority variants of languages (e.g., Puerto Rican Spanish or Swiss German), but studies exploring this have been limited to a select few…

Computation and Language · Computer Science 2023-10-24 Anjali Kantharuban, Ivan Vulić, Anna Korhonen

To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the…

Computation and Language · Computer Science 2024-04-12 Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs. Furthermore, although using LLMs as evaluators ("LLM-as-a-judge") is a growing research area, their sensitivity to dialectal…

Computation and Language · Computer Science 2024-11-19 Fahim Faisal, Md Mushfiqur Rahman, Antonios Anastasopoulos

This work examines the possibility of using syllable embeddings, instead of the often used $n$-gram embeddings, as subword embeddings. We investigate this for two languages: English and Dutch. To this end, we also translated two standard…

Computation and Language · Computer Science 2022-01-14 Laurent Mertens, Joost Vennekens

In this paper, we attempt to explain the emergence of the linguistic diversity that exists across the consonant inventories of some of the major language families of the world through a complex network based growth model. There is only a…

Computation and Language · Computer Science 2009-04-09 Monojit Choudhury, Animesh Mukherjee, Anupam Basu, Niloy Ganguly, Ashish Garg, Vaibhav Jalan

For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles? In this work, we develop an evaluation framework for fair…

Computation and Language · Computer Science 2020-02-26 Ryan Cotterell, Sabrina J. Mielke, Jason Eisner, Brian Roark

While state-of-the-art neural network models continue to achieve lower perplexity scores on language modeling benchmarks, it remains unknown whether optimizing for broad-coverage predictive performance leads to human-like syntactic…

Computation and Language · Computer Science 2020-05-26 Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, Roger P. Levy

Cross-linguistically, native words and loanwords follow different phonological rules. In English, for example, words of Germanic and Latinate origin exhibit different stress patterns, and a certain syntactic structure, double-object…

Computation and Language · Computer Science 2026-02-09 Takashi Morita, Timothy J. O'Donnell

This article describes the design of a common syntactic description for the core grammar of a group of related dialects. The common description does not rely on an abstract sub-linguistic structure like a metagrammar: it consists in a…

Computation and Language · Computer Science 2008-10-08 Pascal Vaillant

Several computational models have been developed to detect and analyze dialect variation in recent years. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However,…

Computation and Language · Computer Science 2019-10-17 Hang Jiang, Haoshen Hong, Yuxing Chen, Vivek Kulkarni

For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation,…

A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structure (WALS). Doing this manually is prohibitively time-consuming, which is…

Computation and Language · Computer Science 2018-02-27 Johannes Bjerva, Isabelle Augenstein