Related papers: Morpho-syntactic Lexicon Generation Using Graph-ba…

Unsupervised Morphological Paradigm Completion

We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language…

Computation and Language · Computer Science 2020-05-22 Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

The availability of parallel texts is crucial to the performance of machine translation models. However, most of the world's languages face the predominant challenge of data scarcity. In this paper, we propose strategies to synthesize…

Computation and Language · Computer Science 2024-02-06 Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

A Straightforward Approach to Morphological Analysis and Synthesis

In this paper we present a lexicon-based approach to the problem of morphological processing. Full-form words, lemmas and grammatical tags are interconnected in a DAWG. Thus, the process of analysis/synthesis is reduced to a search in the…

Computation and Language · Computer Science 2007-05-23 Kyriakos N. Sgarbas, Nikos D. Fakotakis, George K. Kokkinakis

Unsupervised Morphological Expansion of Small Datasets for Improving Word Embeddings

We present a language independent, unsupervised method for building word embeddings using morphological expansion of text. Our model handles the problem of data sparsity and yields improved word embeddings by relying on training word…

Computation and Language · Computer Science 2017-11-16 Syed Sarfaraz Akhtar, Arihant Gupta, Avijit Vajpayee, Arjit Srivastava, Manish Shrivastava

Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks

Word embeddings have been widely adopted across several NLP applications. Most existing word embedding methods utilize sequential context of a word to learn its embedding. While there have been some attempts at utilizing syntactic context…

Computation and Language · Computer Science 2019-07-23 Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, Partha Talukdar

External Lexical Information for Multilingual Part-of-Speech Tagging

Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performances of four systems on datasets covering 16 languages, two of…

Computation and Language · Computer Science 2016-08-10 Benoît Sagot

Morphological Inflection Generation Using Character Sequence to Sequence Learning

Morphological inflection generation is the task of generating the inflected form of a given lemma corresponding to a particular linguistic transformation. We model the problem of inflection generation as a character sequence to sequence…

Computation and Language · Computer Science 2016-03-23 Manaal Faruqui, Yulia Tsvetkov, Graham Neubig, Chris Dyer

A Corpus-Based Approach for Building Semantic Lexicons

Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc,…

cmp-lg · Computer Science 2008-02-03 Ellen Riloff, Jessica Shepherd

Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages even in the absence of any explicit supervision. However, it remains unclear how these models learn to…

Computation and Language · Computer Science 2022-05-10 Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein

Multi-Scale Feature and Metric Learning for Relation Extraction

Existing methods in relation extraction have leveraged the lexical features in the word sequence and the syntactic features in the parse tree. Though effective, the lexical features extracted from the successive word sequence may introduce…

Computation and Language · Computer Science 2021-07-29 Mi Zhang, Tieyun Qian

Semi-Supervised Affective Meaning Lexicon Expansion Using Semantic and Distributed Word Representations

In this paper, we propose an extension to graph-based sentiment lexicon induction methods by incorporating distributed and semantic word representations in building the similarity graph to expand a three-dimensional sentiment lexicon. We…

Computation and Language · Computer Science 2017-03-30 Areej Alhothali, Jesse Hoey

Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models

Large Language Models (LLMs) have achieved remarkable success but remain data-inefficient, especially when learning from small, specialized corpora with limited and proprietary data. Existing synthetic data generation methods for continue…

Computation and Language · Computer Science 2025-09-16 Shengjie Ma, Xuhui Jiang, Chengjin Xu, Cehao Yang, Liyu Zhang, Jian Guo

Morphology Without Borders: Clause-Level Morphology

Morphological tasks use large multi-lingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound…

Computation and Language · Computer Science 2022-10-20 Omer Goldman, Reut Tsarfaty

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of…

Computation and Language · Computer Science 2016-12-15 Radu Soricut, Nan Ding

A Flexible Generative Framework for Graph-based Semi-supervised Learning

We consider a family of problems concerned with making predictions for the majority of unlabeled, graph-structured data samples based on a small proportion of labeled samples. Relational information among the data samples, often…

Machine Learning · Computer Science 2019-11-05 Jiaqi Ma, Weijing Tang, Ji Zhu, Qiaozhu Mei

Planning with Logical Graph-based Language Model for Instruction Generation

Despite the superior performance of large language models to generate natural language texts, it is hard to generate texts with correct logic according to a given task, due to the difficulties for neural models to capture implied rules from…

Computation and Language · Computer Science 2024-07-08 Fan Zhang, Kebing Jin, Hankz Hankui Zhuo

Learning Language from a Large (Unannotated) Corpus

A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work…

Computation and Language · Computer Science 2014-01-16 Linas Vepstas, Ben Goertzel

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

Despite recent advances in training and prompting strategies for Large Language Models (LLMs), these models continue to face challenges with complex logical reasoning tasks that involve long reasoning chains. In this work, we explore the…

Computation and Language · Computer Science 2024-12-18 Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Mark Coates, Bin Wang, Yingxue Zhang, Jianye Hao

Joint Semantic Synthesis and Morphological Analysis of the Derived Word

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly…

Computation and Language · Computer Science 2018-11-13 Ryan Cotterell, Hinrich Schütze

Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction…

Computation and Language · Computer Science 2022-10-27 Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn