PyThaiNLP: Thai Natural Language Processing in Python

Computation and Language 2025-04-25 v1

Abstract

We present PyThaiNLP, a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python. It provides a wide range of software, models, and datasets for Thai. We first provide a brief historical context of tools for the Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provides, as well as its datasets and pre-trained language models. Next, we summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.

Cite

@article{arxiv.2312.04649,
  title   = {PyThaiNLP: Thai Natural Language Processing in Python},
  author  = {Wannaphong Phatthiyaphaibun and Korakot Chaovavanich and Charin Polpanumas and Arthit Suriyawongkul and Lalita Lowphansirikul and Pattarawat Chormai and Peerat Limkonchotiwat and Thanathip Suntorntip and Can Udomcharoenchaikit},
  journal = {arXiv preprint arXiv:2312.04649},
  year    = {2025}
}

Comments

12 pages, 2 figures, LaTeX; typos corrected, timeline clarified for Section 2. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25-36, Singapore, Singapore. Co-located with the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).