Related papers: Identifying missing dictionary entries with freque…
The eventual goal of a language model is to accurately predict the value of a missing word given its context. We present an approach to word prediction that is based on learning a representation for each word as a function of words and…
Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…
Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics, however there is currently no general semantic formalism for representing meaning in terms of vectors. We present a…
The search of information in large text repositories has been plagued by the so-called document-query vocabulary gap, i.e. the semantic discordance between the contents in the stored document entities on the one hand and the human query on…
Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics, however there is currently no general semantic formalism for representing meaning in terms of vectors. We present a…
This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often…
Defining words in a textual context is a useful task both for practical purposes and for gaining insight into distributed word representations. Building on the distributional hypothesis, we argue here that the most natural formalization of…
When reading a text, it is common to become stuck on unfamiliar words and phrases, such as polysemous words with novel senses, rarely used idioms, internet slang, or emerging entities. If we humans cannot figure out the meaning of those…
Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of…
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of…
Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of…
Sequential modelling entails making sense of sequential data, which naturally occurs in a wide array of domains. One example is systems that interact with users, log user actions and behaviour, and make recommendations of items of potential…
Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. Text---a document or a query---is represented by a bag of its words that ignores grammar and word order, but retains word frequency…
The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very…
Most representation learning algorithms for language and image processing are local, in that they identify features for a data point based on surrounding points. Yet in language processing, the correct meaning of a word often depends on its…
In this introductory article we present the basics of an approach to implementing computational interpreting of natural language aiming to model the meanings of words and phrases. Unlike other approaches, we attempt to define the meanings…
To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often…
This research introduces a novel psychometric method for analyzing textual data using large language models. By leveraging contextual embeddings to create contextual scores, we transform textual data into response data suitable for…
Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations. In this work, we focus on segmenting the object in an image specified by a referring…
Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…