Related papers: Phonotactic Complexity across Dialects

Phonotactic Complexity and its Trade-offs

We present methods for calculating a measure of phonotactic complexity (bits per phoneme) that permits a straightforward cross-linguistic comparison. When given a word, represented as a sequence of phonemic segments such as symbols in the…

Computation and Language · Computer Science 2020-05-11 Tiago Pimentel, Brian Roark, Ryan Cotterell

Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon

It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex. This inverse correlation has been…

Computation and Language · Computer Science 2024-06-11 Amanda Doucette, Ryan Cotterell, Morgan Sonderegger, Timothy J. O'Donnell

Syntactic Variation Across the Grammar: Modelling a Complex Adaptive System

While language is a complex adaptive system, most work on syntactic variation observes a few individual constructions in isolation from the rest of the grammar. This means that the grammar, a network which connects thousands of structures…

Computation and Language · Computer Science 2023-09-22 Jonathan Dunn

Revisiting the Phenomenon of Syntactic Complexity Convergence on German Dialogue Data

We revisit the phenomenon of syntactic complexity convergence in conversational interaction, originally found for English dialogue, which has theoretical implication for dialogical concepts such as mutual understanding. We use a modified…

Computation and Language · Computer Science 2024-08-23 Yu Wang, Hendrik Buschmeier

Quantifying Language Variation Acoustically with Few Resources

Deep acoustic models represent linguistic information based on massive amounts of data. Unfortunately, for regional languages and dialects such resources are mostly not available. However, deep acoustic models might have learned linguistic…

Computation and Language · Computer Science 2022-05-26 Martijn Bartelds, Martijn Wieling

Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within…

Computation and Language · Computer Science 2021-04-06 Jonathan Dunn

Extracting and Learning a Dependency-Enhanced Type Lexicon for Dutch

This thesis is concerned with type-logical grammars and their practical applicability as tools of reasoning about sentence syntax and semantics. The focal point is narrowed to Dutch, a language exhibiting a large degree of word order…

Computation and Language · Computer Science 2019-09-11 Konstantinos Kogkalidis

Estimating the Level and Direction of Phonetic Dialect Change in the Northern Netherlands

This article reports ongoing investigations into phonetic change of dialect groups in the northern Netherlandic language area, particularly the Frisian and Low Saxon dialect groups, which are known to differ in vitality. To achieve this, we…

Computation and Language · Computer Science 2021-10-18 Raoul Buurke, Hedwig Sekeres, Wilbert Heeringa, Remco Knooihuizen, Martijn Wieling

Quantifying the Dialect Gap and its Correlates Across Languages

Historically, researchers and consumers have noticed a decrease in quality when applying NLP tools to minority variants of languages (e.g. Puerto Rican Spanish or Swiss German), but studies exploring this have been limited to a select few…

Computation and Language · Computer Science 2023-10-24 Anjali Kantharuban, Ivan Vulić, Anna Korhonen

The Impact of Depth on Compositional Generalization in Transformer Language Models

To process novel sentences, language models (LMs) must generalize compositionally: combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the…

Computation and Language · Computer Science 2024-04-12 Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

Dialectal Toxicity Detection: Evaluating LLM-as-a-Judge Consistency Across Language Varieties

There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs. Furthermore, although using LLMs as evaluators ("LLM-as-a-judge") is a growing research area, their sensitivity to dialectal…

Computation and Language · Computer Science 2024-11-19 Fahim Faisal, Md Mushfiqur Rahman, Antonios Anastasopoulos

Compressing Word Embeddings Using Syllables

This work examines the possibility of using syllable embeddings, instead of the often used n-gram embeddings, as subword embeddings. We investigate this for two languages: English and Dutch. To this end, we also translated two standard…

Computation and Language · Computer Science 2022-01-14 Laurent Mertens, Joost Vennekens

Language Diversity across the Consonant Inventories: A Study in the Framework of Complex Networks

In this paper, we attempt to explain the emergence of the linguistic diversity that exists across the consonant inventories of some of the major language families of the world through a complex network based growth model. There is only a…

Computation and Language · Computer Science 2009-04-09 Monojit Choudhury, Animesh Mukherjee, Anupam Basu, Niloy Ganguly, Ashish Garg, Vaibhav Jalan

Are All Languages Equally Hard to Language-Model?

For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles? In this work, we develop an evaluation framework for fair…

Computation and Language · Computer Science 2020-02-26 Ryan Cotterell, Sabrina J. Mielke, Jason Eisner, Brian Roark

A Systematic Assessment of Syntactic Generalization in Neural Language Models

While state-of-the-art neural network models continue to achieve lower perplexity scores on language modeling benchmarks, it remains unknown whether optimizing for broad-coverage predictive performance leads to human-like syntactic…

Computation and Language · Computer Science 2020-05-26 Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, Roger P. Levy

Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters

Cross-linguistically, native words and loanwords follow different phonological rules. In English, for example, words of Germanic and Latinate origin exhibit different stress patterns, and a certain syntactic structure, double-object…

Computation and Language · Computer Science 2026-02-09 Takashi Morita, Timothy J. O'Donnell

A Layered Grammar Model: Using Tree-Adjoining Grammars to Build a Common Syntactic Kernel for Related Dialects

This article describes the design of a common syntactic description for the core grammar of a group of related dialects. The common description does not rely on an abstract sub-linguistic structure like a metagrammar: it consists in a…

Computation and Language · Computer Science 2008-10-08 Pascal Vaillant

DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions

Several computational models have been developed to detect and analyze dialect variation in recent years. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However,…

Computation and Language · Computer Science 2019-10-17 Hang Jiang, Haoshen Hong, Yuxing Chen, Vivek Kulkarni

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation,…

Computation and Language · Computer Science 2017-12-07 Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu

From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings

A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structures (WALS). Doing this manually is prohibitively time-consuming, which is…

Computation and Language · Computer Science 2018-02-27 Johannes Bjerva, Isabelle Augenstein