Related papers: Open-Set Language Identification

An Open Dataset and Model for Language Identification

Language identification (LID) is a fundamental step in many natural language processing pipelines. However, current LID systems are far from perfect, particularly on lower-resource languages. We present a LID model which achieves a…

Computation and Language · Computer Science 2023-08-31 Laurie Burchell, Alexandra Birch, Nikolay Bogoychev, Kenneth Heafield

Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset

Most state-of-the-art spoken language identification models are closed-set; in other words, they can only output a language label from the set of classes they were trained on. Open-set spoken language identification systems, however, gain…

Computation and Language · Computer Science 2023-08-30 Mustafa Eyceoz, Justin Lee, Siddharth Pittie, Homayoon Beigi

Modernizing Open-Set Speech Language Identification

While most modern speech Language Identification methods are closed-set, we want to see if they can be modified and adapted for the open-set problem. When switching to the open-set problem, the solution gains the ability to reject an audio…

Computation and Language · Computer Science 2022-05-24 Mustafa Eyceoz, Justin Lee, Homayoon Beigi

Language Recognition using Random Indexing

Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we use it for identifying the language of…

Computation and Language · Computer Science 2015-03-02 Aditya Joshi, Johan Halseth, Pentti Kanerva

MOSLD-Bench: Multilingual Open-Set Learning and Discovery Benchmark for Text Categorization

Open-set learning and discovery (OSLD) is a challenging machine learning task in which samples from new (unknown) classes can appear at test time. It can be seen as a generalization of zero-shot learning, where the new classes are not known…

Computation and Language · Computer Science 2026-01-21 Adriana-Valentina Costache, Daria-Nicoleta Dragomir, Silviu-Florin Gheorghe, Eduard Poesina, Paul Irofti, Radu Tudor Ionescu

A Comparison of Methods for OOV-word Recognition on a New Public Dataset

A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose…

Computation and Language · Computer Science 2021-07-20 Rudolf A. Braun, Srikanth Madikeri, Petr Motlicek

Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets

In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on effectively capturing language characteristics over extended periods using a specialized…

Sound · Computer Science 2025-01-22 Or Haim Anidjar, Roi Yozevitch

Localized Vision-Language Matching for Open-vocabulary Object Detection

In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a…

Computer Vision and Pattern Recognition · Computer Science 2022-07-29 Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

Highly Generalizable Models for Multilingual Hate Speech Detection

Hate speech detection has become an important research topic within the past decade. More private corporations are needing to regulate user generated content on different platforms across the globe. In this paper, we introduce a study of…

Computation and Language · Computer Science 2022-01-28 Neha Deshpande, Nicholas Farris, Vidhur Kumar

Open-Set Recognition Using Intra-Class Splitting

This paper proposes a method to use deep neural networks as end-to-end open-set classifiers. It is based on intra-class data splitting. In open-set recognition, only samples from a limited number of known classes are available for training.…

Machine Learning · Computer Science 2019-11-21 Patrick Schlachter, Yiwen Liao, Bin Yang

Deep Learning Models for Multilingual Hate Speech Detection

Hate speech detection is a challenging problem with most of the datasets available in only one language: English. In this paper, we conduct a large scale analysis of multilingual hate speech in 9 languages from 16 different sources. We…

Social and Information Networks · Computer Science 2020-12-10 Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Spoken Language Identification using ConvNets

Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying…

Computation and Language · Computer Science 2019-10-11 Sarthak, Shikhar Shukla, Govind Mittal

Few-Shot Keyword Spotting in Any Language

We introduce a few-shot transfer learning method for keyword spotting in any language. Leveraging open speech corpora in nine languages, we automate the extraction of a large multilingual keyword bank and use it to train an embedding model.…

Computation and Language · Computer Science 2021-09-13 Mark Mazumder, Colby Banbury, Josh Meyer, Pete Warden, Vijay Janapa Reddi

Native Language Identification using i-vector

The task of determining a speaker's native language based only on his speeches in a second language is known as Native Language Identification or NLI. Due to its increasing applications in various domains of speech signal processing, this…

Computation and Language · Computer Science 2018-11-15 Ahmed Nazim Uddin, Md Ashequr Rahman, Md. Rafidul Islam, Mohammad Ariful Haque

Know Yourself Better: Diverse Object-Related Features Improve Open Set Recognition

Open set recognition (OSR) is a critical aspect of machine learning, addressing the challenge of detecting novel classes during inference. Within the realm of deep learning, neural classifiers trained on a closed set of data typically…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Jiawen Xu, Margret Keuper

Improving Spoken Language Identification with Map-Mix

The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages. However, the performance significantly degrades when the languages are not very distinct from each other, for…

Machine Learning · Computer Science 2023-02-17 Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng

Meta Learning for Few-Shot One-class Classification

We propose a method that can perform one-class classification given only a small number of examples from the target class and none from the others. We formulate the learning of meaningful features for one-class classification as a…

Computer Vision and Pattern Recognition · Computer Science 2021-04-26 Gabriel Dahia, Maurício Pamplona Segundo

One-Class Feature Learning Using Intra-Class Splitting

This paper proposes a novel generic one-class feature learning method based on intra-class splitting. In one-class classification, feature learning is challenging, because only samples of one class are available during training. Hence,…

Machine Learning · Computer Science 2019-11-21 Patrick Schlachter, Yiwen Liao, Bin Yang

One-Class Meta-Learning: Towards Generalizable Few-Shot Open-Set Classification

Real-world classification tasks are frequently required to work in an open-set setting. This is especially challenging for few-shot learning problems due to the small sample size for each known category, which prevents existing open-set…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Jedrzej Kozerawski, Matthew Turk

Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for…

Computation and Language · Computer Science 2017-07-24 Ivan Vulić, Nikola Mrkšić, Anna Korhonen