
Related papers: Romanized to Native Malayalam Script Transliterati…

200 papers

The development of robust transliteration techniques to enhance the effectiveness of transforming Romanized scripts into native scripts is crucial for Natural Language Processing tasks, including sentiment analysis, speech recognition,…

Computation and Language · Computer Science 2025-12-01 Kanchon Gharami, Quazi Sarwar Muhtaseem, Deepti Gupta, Lavanya Elluri, Shafika Showkat Moni

End-to-end Automatic Speech Recognition (ASR) systems are rapidly becoming state of the art, overtaking other modeling methods. Several techniques have been introduced to improve their ability to handle multiple languages. However, due to…

Computation and Language · Computer Science 2024-10-22 Rohit Kumar

The paper overviews the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. It focuses on the reverse transliteration of low-resourced languages in the Indo-Aryan family to their native scripts. Typing…

Computation and Language · Computer Science 2025-02-25 Deshan Sumanathilaka, Isuri Anuradha, Ruvan Weerasinghe, Nicholas Micallef, Julian Hough

Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech…

Computation and Language · Computer Science 2021-09-09 Krishna D N

We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages. We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models…

Computation and Language · Computer Science 2022-06-16 Ankur Dhuriya, Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

Due to reasons of convenience and lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is eminently prevalent in the context of low-resource languages such as Sinhala, which have their…

Computation and Language · Computer Science 2025-03-05 Yomal De Mel, Kasun Wickramasinghe, Nisansa de Silva, Surangika Ranathunga

Transliteration is a task in the domain of NLP where the output word is a similar-sounding word written using the letters of any foreign language. Today this system has been developed for several language pairs that involve English as…

Computation and Language · Computer Science 2022-08-24 Yash Raj, Bhavesh Laddagiri

We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. By adopting romanization encoding alongside a balanced concatenated tokenizer within a…

Computation and Language · Computer Science 2024-12-18 Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg

Large language models recall knowledge reliably in English but often fail on the same query posed in a lower-resourced language -- a crosslingual consistency gap that remains underexplored for Indian languages and their code-mixed…

Computation and Language · Computer Science 2026-05-29 Debajyoti Mazumder, Divyansh Pathak, Prashant Kodali, Aditya Joshi, Akshay Agarwal, Jasabanta Patro

Exposing latent lexical overlap, script romanization has emerged as an effective strategy for improving cross-lingual transfer (XLT) in multilingual language models (mLMs). Most prior work, however, focused on setups that favor romanization…

Computation and Language · Computer Science 2026-01-12 Benedikt Ebing, Lennart Keller, Goran Glavaš

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. Owing to the lack of resources, our approach uses OCR models trained for other languages written in Roman script. Currently, there exists no dataset…

Computation and Language · Computer Science 2018-09-10 Amrith Krishna, Bodhisattwa Prasad Majumder, Rajesh Shreedhar Bhat, Pawan Goyal

In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, the online acquisition approach has more advantages than its offline…

Computer Vision and Pattern Recognition · Computer Science 2018-06-05 Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal

Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. Speakers of Indian languages frequently communicate using romanized text rather than native scripts, yet existing research rarely…

Computation and Language · Computer Science 2026-04-01 Manurag Khullar, Utkarsh Desai, Poorva Malviya, Aman Dalmia, Zheyuan Ryan Shi

The success rate of Optical Character Recognition (OCR) systems for printed Malayalam documents is quite impressive, with state-of-the-art accuracy levels in the range of 85-95% for various. However, for real applications, further…

Computation and Language · Computer Science 2012-05-09 Sajilal Divakaran

Machine transliteration is the process of automatically transforming the script of a word from a source language to a target language, while preserving pronunciation. Sequence to sequence learning has recently emerged as a new paradigm in…

Computation and Language · Computer Science 2016-09-15 Amir H. Jadidinejad

As an Indo-Aryan language with limited available data, Chakma remains largely underrepresented in language models. In this work, we introduce a novel corpus of contextually coherent Bangla-transliterated Chakma, curated from Chakma…

Computation and Language · Computer Science 2025-11-27 Adity Khisa, Nusrat Jahan Lia, Tasnim Mahfuz Nafis, Zarif Masud, Tanzir Pial, Shebuti Rayana, Ahmedul Kabir

In medieval India, the Marathi language was written using the Modi script. The texts written in Modi script include extensive knowledge about medieval sciences, medicines, land records and authentic evidence about Indian history. Around 40…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Harshal Kausadikar, Tanvi Kale, Onkar Susladkar, Sparsh Mittal

In a multilingual country like India, multilingual Automatic Speech Recognition (ASR) systems have much scope. Multilingual ASR systems exhibit many advantages like scalability, maintainability, and improved performance over the monolingual…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-01 Arunkumar A, Mudit Batra, Umesh S

India is a multilingual country where Roman script is often used alongside different Indic scripts in a text document. To develop a script-specific handwritten Optical Character Recognition (OCR) system, it is therefore necessary to…

Machine Learning · Computer Science 2010-03-25 Ram Sarkar, Nibaran Das, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri, Dipak Kumar Basu

Transformers have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. In this work, we propose a multi-task learning-based transformer model for low-resource multilingual…

Computation and Language · Computer Science 2021-09-13 Krishna D N