Related papers: Very strict selectional restrictions
Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical generalization and the amount of evidence to which the model is exposed during…
Various concepts of grammatical compositionality arise in many theories of both natural and artificial languages, and often play a key role in accounts of the syntax-semantics interface. We propose that many instances of compositionality…
Selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous statistical models to class-to-class preferences, and presents…
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models which can later be fine-tuned for a variety of Natural Language Understanding tasks. These methods have…
The notion of appropriate sequence as introduced by Z. Harris provides a powerful syntactic way of analysing the detailed meaning of various sentences, including ambiguous ones. In an adjectival sentence like 'The leather was yellow', the…
We present a factorized compositional distributional semantics model for the representation of transitive verb constructions. Our model first produces (subject, verb) and (verb, object) vector representations based on the similarity of the…
Automatic email categorization is an important application of text classification. We study automatic replies to business email messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and…
This article describes the design of a common syntactic description for the core grammar of a group of related dialects. The common description does not rely on an abstract sub-linguistic structure like a metagrammar: it consists in a…
In this article, we describe some discursive segmentation methods as well as a preliminary evaluation of the segmentation quality. Although our experiments were carried out on documents in French, we have developed three discursive segmentation…
In this paper, we investigate the use of selectional restriction -- the constraints a predicate imposes on its arguments -- in a language model for speech recognition. We use an un-tagged corpus, followed by a public domain tagger and a…
Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models…
Existing syntactic grammars of natural languages, even with a far from complete coverage, are complex objects. Assessments of the quality of parts of such grammars are useful for the validation of their construction. We evaluated the…
Humans can learn a new word and infer its grammatical properties from very few examples. They have an abstract notion of linguistic properties like grammatical gender and agreement rules that can be applied to novel syntactic contexts and…
Semantic differentiation of nominal pluralization is grammaticalized in many languages. For example, plural markers may only be relevant for human nouns. English does not appear to make such distinctions. Using distributional semantics, we…
Sadrzadeh et al. (2013) present a compositional distributional analysis of relative clauses in English in terms of the Frobenius algebraic structure of finite dimensional vector spaces. The analysis relies on distinct type assignments and…
We describe a new sense-tagged corpus for word sense disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in…
With the increased interest in machine learning, and deep learning in particular, the use of automatic differentiation has become more wide-spread in computation. There have been two recent developments to provide the theoretical support…
In this paper we compare two competing approaches to part-of-speech tagging, statistical and constraint-based disambiguation, using French as our test language. We imposed a time limit on our experiment: the amount of time spent on the…
Rhetoric, both spoken and written, involves not only content but also style. One common stylistic tool is parallelism: the juxtaposition of phrases which have the same sequence of linguistic (e.g., phonological,…
Word embeddings are numerical vectors which can represent words or concepts in a low-dimensional continuous space. These vectors are able to capture useful syntactic and semantic information. The traditional approaches like Word2Vec, GloVe…