Related papers: German Dialect Identification Using Classifier Ens…

Native Language Identification on Text and Speech

This paper presents an ensemble system combining the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track, which featured student essays and spoken…

Computation and Language · Computer Science 2017-07-25 Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu

Classifier Ensembles for Dialect and Language Variety Identification

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018. We present a system developed to discriminate…

Computation and Language · Computer Science 2018-08-15 Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning

In this work, we describe our approach addressing the Social Media Variety Geolocation task featured in the 2021 VarDial Evaluation Campaign. We focus on the second subtask, which is based on a data set formed of approximately 30 thousand…

Computation and Language · Computer Science 2021-03-02 Mihaela Gaman, Sebastian Cojocariu, Radu Tudor Ionescu

Language Model Adaptation for Language and Dialect Identification of Text

This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification…

Computation and Language · Computer Science 2019-03-27 Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen

Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021

This paper describes the winning approach in the Shared Task 3 at SwissText 2021 on Swiss German Speech to Standard German Text, a public competition on dialect recognition and translation. Swiss German refers to the multitude of Alemannic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-02 Yuriy Arabskyy, Aashish Agarwal, Subhadeep Dey, Oscar Koller

Combining Deep Learning and String Kernels for the Localization of Swiss German Tweets

In this work, we introduce the methods proposed by the UnibucKernel team in solving the Social Media Variety Geolocation task featured in the 2020 VarDial Evaluation Campaign. We address only the second subtask, which targets a data set…

Computation and Language · Computer Science 2020-10-09 Mihaela Gaman, Radu Tudor Ionescu

Spoken dialect identification in Twitter using a multi-filter architecture

This paper presents our approach for SwissText & KONVENS 2020 shared task 2, which is a multi-stage neural model for Swiss German (GSW) identification on Twitter. Our model outputs either GSW or non-GSW and is not meant to be used as a…

Computation and Language · Computer Science 2020-06-08 Mohammadreza Banaei, Rémi Lebret, Karl Aberer

Comparing Approaches to Dravidian Language Identification

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing…

Computation and Language · Computer Science 2021-03-10 Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri

Probabilistic SVM/GMM Classifier for Speaker-Independent Vowel Recognition in Continues Speech

In this paper, we discuss the issues in automatic recognition of vowels in the Persian language. The present work focuses on a new statistical method of recognition of vowels as a basic unit of syllables. First we describe a vowel detection…

Multimedia · Computer Science 2008-12-15 Mohammad Nazari, Abolghasem Sayadiyan, SeyedMajid Valiollahzadeh

Discriminating between Indo-Aryan Languages Using SVM Ensembles

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. We investigate the…

Computation and Language · Computer Science 2018-07-10 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu

Two-stage Pipeline for Multilingual Dialect Detection

Dialect Identification is a crucial task for localizing various Large Language Models. This paper outlines our approach to the VarDial 2023 shared task. Here we have to identify three or two dialects from three languages each which results…

Computation and Language · Computer Science 2023-03-29 Ankit Vaidya, Aditya Kane

2nd Swiss German Speech to Standard German Text Shared Task at SwissText 2022

We present the results and findings of the 2nd Swiss German speech to Standard German text shared task at SwissText 2022. Participants were asked to build a sentence-level Swiss German speech to Standard German text system specialized on…

Computation and Language · Computer Science 2023-01-18 Michel Plüss, Yanick Schraner, Christian Scheller, Manfred Vogel

Native Language Identification using Stacked Generalization

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods…

Computation and Language · Computer Science 2017-03-21 Shervin Malmasi, Mark Dras

Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have been rarely addressed in the SLI literature. Observing that in these…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-14 Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino

SDS-200: A Swiss German Speech to Standard German Text Corpus

We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for training speech translation, dialect…

Computation and Language · Computer Science 2022-05-20 Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel

SenseFitting: Sense Level Semantic Specialization of Word Embeddings for Word Sense Disambiguation

We introduce a neural network-based system of Word Sense Disambiguation (WSD) for German that is based on SenseFitting, a novel method for optimizing WSD. We outperform knowledge-based WSD methods by up to 25% F1-score and produce a new…

Computation and Language · Computer Science 2019-08-01 Manuel Stoeckel, Sajawel Ahmed, Alexander Mehler

UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row

We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Tasks of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple…

Computation and Language · Computer Science 2018-07-31 Andrei M. Butnaru, Radu Tudor Ionescu

Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language…

Computation and Language · Computer Science 2020-10-08 Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun

dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the…

Computation and Language · Computer Science 2024-07-19 Mohamed Lichouri, Khaled Lounnas, Boualem Nadjib Zahaf, Mehdi Ayoub Rabiai

Findings of the VarDial Evaluation Campaign 2023

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects…

Computation and Language · Computer Science 2023-06-01 Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri