Related papers: German Dialect Identification Using Classifier Ens…

200 papers

This paper presents an ensemble system combining the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track, which featured student essays and spoken…

Computation and Language · Computer Science 2017-07-25 Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018. We present a system developed to discriminate…

Computation and Language · Computer Science 2018-08-15 Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

In this work, we describe our approach addressing the Social Media Variety Geolocation task featured in the 2021 VarDial Evaluation Campaign. We focus on the second subtask, which is based on a data set formed of approximately 30 thousand…

Computation and Language · Computer Science 2021-03-02 Mihaela Gaman, Sebastian Cojocariu, Radu Tudor Ionescu

This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification…

Computation and Language · Computer Science 2019-03-27 Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen

This paper describes the winning approach in the Shared Task 3 at SwissText 2021 on Swiss German Speech to Standard German Text, a public competition on dialect recognition and translation. Swiss German refers to the multitude of Alemannic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-02 Yuriy Arabskyy, Aashish Agarwal, Subhadeep Dey, Oscar Koller

In this work, we introduce the methods proposed by the UnibucKernel team in solving the Social Media Variety Geolocation task featured in the 2020 VarDial Evaluation Campaign. We address only the second subtask, which targets a data set…

Computation and Language · Computer Science 2020-10-09 Mihaela Gaman, Radu Tudor Ionescu

This paper presents our approach for SwissText & KONVENS 2020 shared task 2, which is a multi-stage neural model for Swiss German (GSW) identification on Twitter. Our model outputs either GSW or non-GSW and is not meant to be used as a…

Computation and Language · Computer Science 2020-06-08 Mohammadreza Banaei, Rémi Lebret, Karl Aberer

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing…

Computation and Language · Computer Science 2021-03-10 Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri

In this paper, we discuss the issues in automatic recognition of vowels in the Persian language. The present work focuses on a new statistical method for recognizing vowels as a basic unit of syllables. First we describe a vowel detection…

Multimedia · Computer Science 2008-12-15 Mohammad Nazari, Abolghasem Sayadiyan, SeyedMajid Valiollahzadeh

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. We investigate the…

Computation and Language · Computer Science 2018-07-10 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu

Dialect identification is a crucial task for localizing various Large Language Models. This paper outlines our approach to the VarDial 2023 shared task. Here we have to identify two or three dialects from each of three languages, which results…

Computation and Language · Computer Science 2023-03-29 Ankit Vaidya, Aditya Kane

We present the results and findings of the 2nd Swiss German speech to Standard German text shared task at SwissText 2022. Participants were asked to build a sentence-level Swiss German speech to Standard German text system specialized on…

Computation and Language · Computer Science 2023-01-18 Michel Plüss, Yanick Schraner, Christian Scheller, Manfred Vogel

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods…

Computation and Language · Computer Science 2017-03-21 Shervin Malmasi, Mark Dras

This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have been rarely addressed in the SLI literature. Observing that in these…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-14 Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino

We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for training speech translation, dialect…

We introduce a neural network-based system of Word Sense Disambiguation (WSD) for German that is based on SenseFitting, a novel method for optimizing WSD. We outperform knowledge-based WSD methods by up to 25% F1-score and produce a new…

Computation and Language · Computer Science 2019-08-01 Manuel Stoeckel, Sajawel Ahmed, Alexander Mehler

We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Tasks of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple…

Computation and Language · Computer Science 2018-07-31 Andrei M. Butnaru, Radu Tudor Ionescu

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language…

Computation and Language · Computer Science 2020-10-08 Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun

This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the…

Computation and Language · Computer Science 2024-07-19 Mohamed Lichouri, Khaled Lounnas, Boualem Nadjib Zahaf, Mehdi Ayoub Rabiai

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects…
