Related papers: PyThaiNLP: Thai Natural Language Processing in Pyt…

mahaNLP: A Marathi Natural Language Processing Library

We present mahaNLP, an open-source natural language processing (NLP) library specifically built for the Marathi language. It aims to enhance the support for the low-resource Indian language Marathi in the field of NLP. It is an easy-to-use,…

Computation and Language · Computer Science · 2023-11-07 · Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing

The success of Pre-Trained Models (PTMs) has reshaped the development of Natural Language Processing (NLP). Yet, it is not easy to obtain high-performing models and deploy them online for industrial practitioners. To bridge this gap,…

Computation and Language · Computer Science · 2023-03-14 · Chengyu Wang, Minghui Qiu, Chen Shi, Taolin Zhang, Tingting Liu, Lei Li, Jianing Wang, Ming Wang, Jun Huang, Wei Lin

TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)

The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces…

Computation and Language · Computer Science · 2026-05-29 · Mullosharaf K. Arabov

PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

Online data streams make training machine learning models hard because of distribution shift and new patterns emerging over time. For natural language processing (NLP) tasks that utilize a collection of features based on lexicons and rules,…

Computation and Language · Computer Science · 2022-11-28 · Shubhanshu Mishra, Jana Diesner

PyPLN: a Distributed Platform for Natural Language Processing

This paper presents a distributed platform for Natural Language Processing called PyPLN. PyPLN leverages a vast array of NLP and text processing open source tools, managing the distribution of the workload on a variety of configurations:…

Computation and Language · Computer Science · 2013-02-20 · Flávio Codeço Coelho, Renato Rocha Souza, Álvaro Justen, Flávio Amieiro, Heliana Mello

Typhoon: Thai Large Language Models

Typhoon is a series of Thai large language models (LLMs) developed specifically for the Thai language. This technical report presents challenges and insights in developing Thai LLMs, including data preparation, pretraining,…

Computation and Language · Computer Science · 2023-12-22 · Kunat Pipatanakul, Phatrasek Jirabovonvisut, Potsawee Manakul, Sittipong Sripaisarnmongkol, Ruangsak Patomwong, Pathomporn Chokchainant, Kasima Tharnpipitchai

textless-lib: a Library for Textless Spoken Language Processing

Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library…

Computation and Language · Computer Science · 2022-02-16 · Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

VietNormalizer: An Open-Source, Dependency-Free Python Library for Vietnamese Text Normalization in TTS and NLP Applications

We present VietNormalizer, an open-source, zero-dependency Python library for Vietnamese text normalization targeting Text-to-Speech (TTS) and Natural Language Processing (NLP) applications. Vietnamese text normalization is a critical yet…

Computation and Language · Computer Science · 2026-03-05 · Hung Vu Nguyen, Loan Do, Thanh Ngoc Nguyen, Ushik Shrestha Khwakhali, Thanh Pham, Vinh Do, Charlotte Nguyen, Hien Nguyen

Automated Python Translation

Python is one of the most commonly used programming languages in industry and education. Its English keywords and built-in functions/modules allow it to come close to pseudo-code in terms of its readability and ease of writing. However,…

Computation and Language · Computer Science · 2025-04-17 · Joshua Otten, Antonios Anastasopoulos, Kevin Moran

HugNLP: A Unified and Comprehensive Library for Natural Language Processing

In this paper, we introduce HugNLP, a unified and comprehensive library for natural language processing (NLP) with the prevalent backend of HuggingFace Transformers, which is designed for NLP researchers to easily utilize off-the-shelf…

Computation and Language · Computer Science · 2023-03-01 · Jianing Wang, Nuo Chen, Qiushi Sun, Wenkang Huang, Chengyu Wang, Ming Gao

FairLangProc: A Python package for fairness in NLP

The rise in usage of Large Language Models to near ubiquity in recent years has raised societal concern about their applications in decision-making contexts, such as organizational justice or healthcare. This, in turn, poses questions…

Computation and Language · Computer Science · 2025-08-06 · Arturo Pérez-Peralta, Sandra Benítez-Peña, Rosa E. Lillo

TweetNLP: Cutting-Edge Natural Language Processing for Social Media

In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity…

Computation and Language · Computer Science · 2022-10-26 · Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, Francesco Barbieri

AllenNLP: A Deep Semantic Natural Language Processing Platform

This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is…

Computation and Language · Computer Science · 2018-06-01 · Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer

BNLP: Natural language processing toolkit for Bengali language

BNLP is an open-source language processing toolkit for the Bengali language, consisting of tokenization, word embedding, POS tagging, and NER tagging facilities. BNLP provides pre-trained models with high accuracy to perform model-based tokenization,…

Computation and Language · Computer Science · 2021-12-02 · Sagor Sarker

Parsing Thai Social Data: A New Challenge for Thai NLP

Dependency parsing (DP) is a task that analyzes text for syntactic structure and the relationships between words. DP is widely used to improve natural language processing (NLP) applications in many languages, such as English. Previous works on DP…

Computation and Language · Computer Science · 2020-05-05 · Sattaya Singkul, Borirat Khampingyot, Nattasit Maharattamalai, Supawat Taerungruang, Tawunrat Chalothorn

PyTouch: A Machine Learning Library for Touch Processing

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can…

Robotics · Computer Science · 2021-05-28 · Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks

In recent years, the extraction of opinions and information from user-generated text has attracted a lot of interest, largely due to the unprecedented volume of content in Social Media. However, social researchers face some issues in…

Computation and Language · Computer Science · 2024-07-16 · Juan Manuel Pérez, Mariela Rajngewerc, Juan Carlos Giudici, Damián A. Furman, Franco Luque, Laura Alonso Alemany, María Vanina Martínez

TurkicNLP: An NLP Toolkit for Turkic Languages

Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most languages lacking unified tooling and resources. We present TurkicNLP, an open-source Python library…

Computation and Language · Computer Science · 2026-05-25 · Sherzod Hakimov

Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python

Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based…

Computation and Language · Computer Science · 2021-06-16 · Hannah Eyre, Alec B Chapman, Kelly S Peterson, Jianlin Shi, Patrick R Alba, Makoto M Jones, Tamara L Box, Scott L DuVall, Olga V Patterson

An Interactive Tool for Natural Language Processing on Clinical Text

Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from…

Human-Computer Interaction · Computer Science · 2017-07-10 · Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser