Related papers: Summarizing Indian Languages using Multilingual Tr…

Low Resource Summarization using Pre-trained Language Models

With the advent of Deep Learning based Artificial Neural Networks models, Natural Language Processing (NLP) has witnessed significant improvements in textual data processing in terms of its efficiency and accuracy. However, the research is…

Computation and Language · Computer Science 2023-10-05 Mubashir Munaf, Hammad Afzal, Naima Iltaf, Khawir Mahmood

L3Cube-MahaSum: A Comprehensive Dataset and BART Models for Abstractive Text Summarization in Marathi

We present the MahaSUM dataset, a large-scale collection of diverse news articles in Marathi, designed to facilitate the training and evaluation of models for abstractive summarization tasks in Indic languages. The dataset, containing 25k…

Computation and Language · Computer Science 2024-10-15 Pranita Deshmukh, Nikita Kulkarni, Sanhita Kulkarni, Kareena Manghani, Raviraj Joshi

IndicBART: A Pre-trained Model for Indic Natural Language Generation

In this paper, we study pre-trained sequence-to-sequence models for a group of related languages, with a focus on Indic languages. We present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages…

Computation and Language · Computer Science 2022-10-28 Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

The research on text summarization for low-resource Indian languages has been limited due to the availability of relevant datasets. This paper presents a summary of various deep-learning approaches used for the ILSUM 2022 Indic language…

Computation and Language · Computer Science 2022-12-13 Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi

Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets

Text summarization plays a crucial role in natural language processing by condensing large volumes of text into concise and coherent summaries. As digital content continues to grow rapidly and the demand for effective information retrieval…

Computation and Language · Computer Science 2025-03-14 Tohida Rehman, Soumabha Ghosh, Kuntal Das, Souvik Bhattacharjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the…

Computation and Language · Computer Science 2023-07-04 Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART

Recently, with the rapid development in the fields of technology and the increasing amount of text available on the internet, it has become urgent to develop effective tools for processing and understanding texts in a way that summaries…

Computation and Language · Computer Science 2024-06-13 Sari Masri, Yaqeen Raddad, Fidaa Khandaqji, Huthaifa I. Ashqar, Mohammed Elhenawy

Indowordnets help in Indian Language Machine Translation

Being less-resourced languages, Indian-Indian and English-Indian language MT system developments face difficulty in translating various lexical phenomena. In this paper, we present our work on a comparative study of 440 phrase-based…

Computation and Language · Computer Science 2017-10-09 Sreelekha S, Pushpak Bhattacharyya

Comparing Approaches to Automatic Summarization in Less-Resourced Languages

Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced languages. This work compares a variety of different…

Computation and Language · Computer Science 2026-01-01 Chester Palen-Michel, Constantine Lignos

Machine Translation Approaches and Survey for Indian Languages

In this study, we present an analysis regarding the performance of the state-of-art Phrase-based Statistical Machine Translation (SMT) on multiple Indian languages. We report baseline systems on several language pairs. The motivation of…

Computation and Language · Computer Science 2017-01-17 Nadeem Jadoon Khan, Waqas Anwar, Nadir Durrani

Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?

What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages? We conduct a thorough empirical experiment in 10 languages to ascertain this, considering five factors: (1) the…

Computation and Language · Computer Science 2022-05-03 En-Shiun Annie Lee, Sarubi Thillainathan, Shravan Nayak, Surangika Ranathunga, David Ifeoluwa Adelani, Ruisi Su, Arya D. McCarthy

Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers

Automatic text summarization in Nepali language is an unexplored area in natural language processing (NLP). Although considerable research has been dedicated to extractive summarization, the area of abstractive summarization, especially for…

Computation and Language · Computer Science 2024-10-01 Prakash Dhakal, Daya Sagar Baral

Multitask Finetuning for Improving Neural Machine Translation in Indian Languages

Transformer based language models have led to impressive results across all domains in Natural Language Processing. Pretraining these models on language modeling tasks and finetuning them on downstream tasks such as Text Classification,…

Computation and Language · Computer Science 2021-12-06 Shaily Desai, Atharva Kshirsagar, Manisha Marathe

IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages

In this paper, we introduce Neural Information Retrieval resources for 11 widely spoken Indian Languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu) from two major Indian language…

Information Retrieval · Computer Science 2023-12-18 Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya

Statistical Machine Translation for Indic Languages

Machine Translation (MT) system generally aims at automatic representation of source language into target language retaining the originality of context using various Natural Language Processing (NLP) techniques. Among various NLP methods,…

Computation and Language · Computer Science 2026-03-04 Sudhansu Bala Das, Divyajoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra

Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages

Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification. However, this performance is usually…

Computation and Language · Computer Science 2020-11-05 Kushal Jain, Adwait Deshpande, Kumar Shridhar, Felix Laumann, Ayushman Dash

Unraveling the Capabilities of Language Models in News Summarization

Given the recent introduction of multiple language models and the ongoing demand for improved Natural Language Processing tasks, particularly summarization, this work provides a comprehensive benchmarking of 20 recent language models,…

Computation and Language · Computer Science 2025-01-31 Abdurrahman Odabaşı, Göksel Biricik

IndicGEC: Powerful Models, or a Measurement Mirage?

In this paper, we report the results of the TeamNRC's participation in the BHASHA-Task 1 Grammatical Error Correction shared task https://github.com/BHASHA-Workshop/IndicGEC2025/ for 5 Indian languages. Our approach, focusing on…

Computation and Language · Computer Science 2025-11-20 Sowmya Vajjala

mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences

We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets…

Computation and Language · Computer Science 2023-10-30 David Uthus, Santiago Ontañón, Joshua Ainslie, Mandy Guo

ViTA: Visual-Linguistic Translation by Aligning Object Tags

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation. It has gained popularity in recent years, and several pipelines have been proposed in the same direction. Yet, the task lacks quality…

Computation and Language · Computer Science 2021-06-29 Kshitij Gupta, Devansh Gautam, Radhika Mamidi