Ruvan Weerasinghe

Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala

The performance of Language Models (LMs) on low-resource, morphologically rich languages like Sinhala remains largely unexplored, particularly regarding script variation in digital communication. Sinhala exhibits script duality, with…

Computation and Language · Computer Science 2026-05-11 Minuri Rajapakse, Ruvan Weerasinghe

Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources

The Swa-bhasha Resource Hub provides a comprehensive collection of data resources and algorithms developed for Romanized Sinhala to Sinhala transliteration between 2020 and 2025. These resources have played a significant role in advancing…

Computation and Language · Computer Science 2026-04-28 Deshan Sumanathilaka, Sameera Perera, Sachithya Dharmasiri, Maneesha Athukorala, Anuja Dilrukshi Herath, Rukshan Dias, Pasindu Gamage, Ruvan Weerasinghe, Y. H. P. P. Priyadarshana

SinFoS: A Parallel Dataset for Translating Sinhala Figures of Speech

Figures of Speech (FoS) consist of multi-word phrases that are deeply intertwined with culture. While Neural Machine Translation (NMT) performs relatively well with the figurative expressions of high-resource languages, it often faces…

Computation and Language · Computer Science 2026-02-11 Johan Sofalas, Dilushri Pavithra, Nevidu Jayatilleke, Ruvan Weerasinghe

SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala

Large Language Models (LLMs) demonstrate impressive general knowledge and reasoning abilities, yet their evaluation has predominantly focused on global or anglocentric subjects, often neglecting low-resource languages and culturally…

Computation and Language · Computer Science 2025-09-04 Ashmari Pramodya, Nirasha Nelki, Heshan Shalinda, Chamila Liyanage, Yusuke Sakai, Randil Pushpananda, Ruvan Weerasinghe, Hidetaka Kamigaito, Taro Watanabe

A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization

Automatic patent summarization approaches that help in the patent analysis and comprehension procedure are in high demand due to the colossal growth of innovations. The development of natural language processing (NLP), text mining, and deep…

Computation and Language · Computer Science 2025-06-17 Nevidu Jayatilleke, Ruvan Weerasinghe

Advancements in Natural Language Processing for Automatic Text Summarization

The substantial growth of textual content in diverse domains and platforms has led to a considerable need for Automatic Text Summarization (ATS) techniques that aid in the process of text analysis. The effectiveness of text summarization…

Computation and Language · Computer Science 2025-03-03 Nevidu Jayatilleke, Ruvan Weerasinghe, Nipuna Senanayake

IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages

The paper overviews the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. It focuses on the reverse transliteration of low-resourced languages in the Indo-Aryan family to their native scripts. Typing…

Computation and Language · Computer Science 2025-02-25 Deshan Sumanathilaka, Isuri Anuradha, Ruvan Weerasinghe, Nicholas Micallef, Julian Hough

CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context to input. This is beneficial for knowledge-intensive and expert-reliant tasks, including legal question-answering, which…

Computation and Language · Computer Science 2024-04-09 Nirmalie Wiratunga, Ramitha Abeyratne, Lasal Jayawardena, Kyle Martin, Stewart Massie, Ikechukwu Nkisi-Orji, Ruvan Weerasinghe, Anne Liret, Bruno Fleisch