Related papers: Bootstrapping Lexical Choice via Multiple-Sequence…
This paper describes an approach to the automatic identification of lexical information in on-line dictionaries. This approach uses bootstrapping techniques, specifically so that ambiguity in the dictionary text can be treated properly.…
We address the text-to-text generation problem of sentence-level paraphrasing -- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered…
The paper describes a system that uses large language model (LLM) technology to support the automatic learning of new entries in an intelligent agent's semantic lexicon. The process is bootstrapped by an existing non-toy lexicon and a…
Parallel texts are a relatively rare language resource, however, they constitute a very useful research material with a wide range of applications. This study presents and analyses new methodologies we developed for obtaining such data from…
Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning…
In what ways might statistical signals in linguistic input assist with the acquisition of syntax? Here we hypothesize a mechanism called collocational bootstrapping, in which regularities in word co-occurrence patterns can provide cues to…
It is introduced the using of generation lexicographical procedure for multicriteria decision-making problems.
Most existing text generation models follow the sequence-to-sequence paradigm. Generative Grammar suggests that humans generate natural language texts by learning language grammar. We propose a syntax-guided generation schema, which…
Decomposing models into multiple components is critically important in many applications such as language modeling (LM) as it enables adapting individual components separately and biasing of some components to the user's personal…
Recent advances in retrieval-augmented generation (RAG) furnish large language models (LLMs) with iterative retrievals of relevant information to handle complex multi-hop questions. These methods typically alternate between LLM reasoning…
This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language…
This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus---a multilingual terminology dictionary enriched with relationships between the terms themselves to form a thesaurus. The…
We propose a range of deep lexical acquisition methods which make use of morphological, syntactic and ontological language resources to model word similarity and bootstrap from a seed lexicon. The different methods are deployed in learning…
A growing body of work studies how to answer a question or verify a claim by generating a natural language "proof": a chain of deductive inferences yielding the answer based on a set of premises. However, these methods can only make sound…
This paper explores the automatic construction of a multilingual Lexical Knowledge Base from pre-existing lexical resources. We present a new and robust approach for linking already existing lexical/semantic hierarchies. We used a…
This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability. Instances of the framework can be applied to an…
Natural Language Generation systems typically have two parts - strategic ('what to say') and tactical ('how to say'). We present our experiments in building an unsupervised corpus-driven template based tactical NLG system. We consider…
It is very costly to build up lexical resources and domain ontologies. Especially when confronted with a new application domain lexical gaps and a poor coverage of domain concepts are a problem for the successful exploitation of natural…
Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not…
Paraphrasing is the task of re-writing an input text using other words, without altering the meaning of the original content. Conversational systems can exploit automatic paraphrasing to make the conversation more natural, e.g., talking…