Related papers: The Paradigm Discovery Problem

Paradigm Completion for Derivational Morphology

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of…

Computation and Language · Computer Science 2025-02-18 Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

Minimal Supervision for Morphological Inflection

Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain. In this work we aim to overcome this annotation bottleneck…

Computation and Language · Computer Science 2021-10-13 Omer Goldman, Reut Tsarfaty

Unsupervised Morphological Paradigm Completion

We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language…

Computation and Language · Computer Science 2020-05-22 Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann

Discovery of Paradigm Dependencies

Missing and incorrect values often cause serious consequences. To deal with these data quality problems, a class of common employed tools are dependency rules, such as Functional Dependencies (FDs), Conditional Functional Dependencies…

Databases · Computer Science 2017-10-10 Jizhou Sun, Jianzhong Li, Hong Gao

Mining Discourse Markers for Unsupervised Sentence Representation Learning

Current state of the art systems in NLP heavily rely on manually annotated datasets, which are expensive to construct. Very little work adequately exploits unannotated data -- such as discourse markers between sentences -- mainly because of…

Computation and Language · Computer Science 2019-03-29 Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller

Discovering topics with neural topic models built from PLSA assumptions

In this paper we present a model for unsupervised topic discovery in texts corpora. The proposed model uses documents, words, and topics lookup table embedding as neural network model parameters to build probabilities of words given topics,…

Computation and Language · Computer Science 2019-11-26 Sileye O. Ba

On the Complexity and Typology of Inflectional Morphological Systems

We quantify the linguistic complexity of different languages' morphological systems. We verify that there is an empirical trade-off between paradigm size and irregularity: a language's inflectional paradigms may be either large in size or…

Computation and Language · Computer Science 2018-07-10 Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner

Contextualization of Morphological Inflection

Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological…

Computation and Language · Computer Science 2019-05-07 Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn, Jason Eisner

Research on Annotation Rules and Recognition Algorithm Based on Phrase Window

At present, most Natural Language Processing technology is based on the results of Word Segmentation for Dependency Parsing, which mainly uses an end-to-end method based on supervised learning. There are two main problems with this method:…

Computation and Language · Computer Science 2020-07-08 Guang Liu, Gang Tu, Zheng Li, Yi-Jian Liu

A Process for Topic Modelling Via Word Embeddings

This work combines algorithms based on word embeddings, dimensionality reduction, and clustering. The objective is to obtain topics from a set of unclassified texts. The algorithm to obtain the word embeddings is the BERT model, a neural…

Computation and Language · Computer Science 2023-12-08 Diego Saldaña Ulloa

DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon

Finding word boundaries in continuous speech is challenging as there is little or no equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models for text segmentation use a Dirichlet process to jointly segment…

Computation and Language · Computer Science 2022-06-24 Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Salah Zaiem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux

Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models

Word discovery is the task of extracting words from unsegmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited…

Computation and Language · Computer Science 2017-09-20 Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier

PDP: A General Neural Framework for Learning Constraint Satisfaction Solvers

There have been recent efforts for incorporating Graph Neural Network models for learning full-stack solvers for constraint satisfaction problems (CSP) and particularly Boolean satisfiability (SAT). Despite the unique representational power…

Machine Learning · Computer Science 2019-03-06 Saeed Amizadeh, Sergiy Matusevych, Markus Weimer

Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge

At present, the deep end-to-end method based on supervised learning is used in entity recognition and dependency analysis. There are two problems in this method: firstly, background knowledge cannot be introduced; secondly, multi…

Computation and Language · Computer Science 2020-07-09 Zheng Li, Gang Tu, Guang Liu, Zhi-Qiang Zhan, Yi-Jian Liu

An efficient framework for learning sentence representations

In this work we propose a simple and efficient framework for learning sentence representations from unlabelled data. Drawing inspiration from the distributional hypothesis and recent work on learning sentence representations, we reformulate…

Computation and Language · Computer Science 2018-03-09 Lajanugen Logeswaran, Honglak Lee

IntenDD: A Unified Contrastive Learning Approach for Intent Detection and Discovery

Identifying intents from dialogue utterances forms an integral component of task-oriented dialogue systems. Intent-related tasks are typically formulated either as a classification task, where the utterances are classified into predefined…

Computation and Language · Computer Science 2023-10-26 Bhavuk Singhal, Ashim Gupta, Shivasankaran V P, Amrith Krishna

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Despite an ever growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique to provide insights into what is captured by these models. Such insights would help the…

Computation and Language · Computer Science 2019-12-12 Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future

Common language models typically predict the next word given the context. In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase. The model does not require any…

Computation and Language · Computer Science 2019-06-06 Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass

Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications

Most unsupervised NLP models represent each word with a single point or single region in semantic space, while the existing multi-sense word embeddings cannot represent longer word sequences like phrases or sentences. We propose a novel…

Computation and Language · Computer Science 2021-12-30 Haw-Shiuan Chang, Amol Agrawal, Andrew McCallum

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic…

Computation and Language · Computer Science 2023-03-31 Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken