Related papers: Language Recognition using Random Indexing

Language Identification with a Reciprocal Rank Classifier

Language identification is a critical component of language processing pipelines (Jauhiainen et al., 2019) and is not a solved problem in real-world settings. We present a lightweight and effective language identifier that is robust to…

Computation and Language · Computer Science 2021-09-22 Dominic Widdows, Chris Brew

Neural Random Projections for Language Modelling

Neural network-based language models deal with data sparsity problems by mapping the large discrete space of words into a smaller continuous space of real-valued vectors. By learning distributed vector representations for words, each…

Computation and Language · Computer Science 2018-09-27 Davide Nunes, Luis Antunes

Text segmentation with character-level text embeddings

Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a…

Computation and Language · Computer Science 2013-09-19 Grzegorz Chrupała

Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality…

Computation and Language · Computer Science 2021-06-16 Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models

Rare words remain a critical bottleneck for speech-to-text systems. While direct fine-tuning improves recognition of target words, it often incurs high cost, catastrophic forgetting, and limited scalability. To address these challenges, we…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-29 Ruihao Jing, Cheng Gong, Yu Jiang, Boyu Zhu, Shansong Liu, Chi Zhang, Xiao-Lei Zhang, Xuelong Li

Spoken Language Identification using ConvNets

Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying…

Computation and Language · Computer Science 2019-10-11 Sarthak, Shikhar Shukla, Govind Mittal

A reproduction of Apple's bi-directional LSTM models for language identification in short strings

Language Identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we…

Computation and Language · Computer Science 2021-02-15 Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent

Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval

Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending the search to leverage contextual representations for documents and…

Information Retrieval · Computer Science 2023-04-04 Daniel Campos, ChengXiang Zhai

LanideNN: Multilingual Language Identification on Character Window

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text. Monolingual language identification assumes that the given document is written in one…

Computation and Language · Computer Science 2017-08-01 Tom Kocmi, Ondřej Bojar

Learning Word Vectors for 157 Languages

Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is…

Computation and Language · Computer Science 2018-03-30 Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov

Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models

Word discovery is the task of extracting words from unsegmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited…

Computation and Language · Computer Science 2017-09-20 Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier

A Simple and Efficient Probabilistic Language model for Code-Mixed Text

The conventional natural language processing approaches are not accustomed to the social media text due to colloquial discourse and non-homogeneous characteristics. Significantly, the language identification in a multilingual document is…

Computation and Language · Computer Science 2021-06-30 M Zeeshan Ansari, Tanvir Ahmad, M M Sufyan Beg, Asma Ikram

Efficient Estimation of Word Representations in Vector Space

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the…

Computation and Language · Computer Science 2013-09-10 Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

Using Images to Find Context-Independent Word Representations in Vector Space

Many methods have been proposed to find vector representation for words, but most rely on capturing context from the text to find semantic relationships between these vectors. We propose a novel method of using dictionary meanings and image…

Computation and Language · Computer Science 2024-12-06 Harsh Kumar

Object Recognition as Next Token Prediction

We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels. To ground this prediction process in…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for…

Computation and Language · Computer Science 2018-01-22 Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition

Speech recognition has of late become a practical technology for real world applications. Aiming at speech-driven text retrieval, which facilitates retrieving information with spoken queries, we propose a method to integrate speech…

Computation and Language · Computer Science 2007-05-23 Atsushi Fujii, Katunobu Itou, Tetsuya Ishikawa

Probabilistic Random Indexing for Continuous Event Detection

The present paper explores a novel variant of Random Indexing (RI) based representations for encoding language data with a view to using them in a dynamic scenario where events are happening in a continuous fashion. As the size of the…

Machine Learning · Computer Science 2021-12-10 Yashank Singh, Niladri Chatterjee

Open-Set Language Identification

We present the first open-set language identification experiments using one-class classification. We first highlight the shortcomings of traditional feature extraction methods and propose a hashing-based feature vectorization approach as a…

Computation and Language · Computer Science 2017-07-18 Shervin Malmasi

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, it is becoming increasingly difficult as language model's capabilities of generating human-like texts keep evolving. This…

Computation and Language · Computer Science 2024-10-10 Yang Xu, Yu Wang, Hao An, Zhichen Liu, Yongyuan Li