Related papers: Stanza: A Python Natural Language Processing Toolk…

200 papers

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages.…

Computation and Language · Computer Science 2021-10-18 Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen

We introduce biomedical and clinical English model packages for the Stanza Python NLP library. These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text, by combining…

Computation and Language · Computer Science 2020-07-30 Yuhao Zhang, Yuhui Zhang, Peng Qi, Christopher D. Manning, Curtis P. Langlotz

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to…

Computation and Language · Computer Science 2023-08-14 Luka Terčon, Nikola Ljubešić

We introduce BlaBla, an open-source Python library for extracting linguistic features with proven clinical relevance to neurological and psychiatric diseases across many languages. BlaBla is a unifying framework for accelerating and…

Computation and Language · Computer Science 2020-05-21 Abhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed

In this paper, we present Lupa - a framework for large-scale analysis of the programming language usage. Lupa is a command line tool that uses the power of the IntelliJ Platform under the hood, which gives it access to powerful static…

Programming Languages · Computer Science 2022-03-30 Anna Vlasova, Maria Tigina, Ilya Vlasov, Anastasiia Birillo, Yaroslav Golubev, Timofey Bryksin

The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation…

Computation and Language · Computer Science 2018-05-04 Taylor Arnold

Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most languages lacking unified tooling and resources. We present TurkicNLP, an open-source Python library…

Computation and Language · Computer Science 2026-05-25 Sherzod Hakimov

We present GR-NLP-TOOLKIT, an open-source natural language processing (NLP) toolkit developed specifically for modern Greek. The toolkit provides state-of-the-art performance in five core NLP tasks, namely part-of-speech tagging,…

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the…

Stan is a popular probabilistic programming language with a self-contained syntax and semantics that is close to graphical models. Unfortunately, existing embeddings of Stan in Python use multi-line strings. That approach forces users to…

Programming Languages · Computer Science 2018-12-12 Guillaume Baudart, Martin Hirzel, Kiran Kate, Louis Mandel, Avraham Shinnar

Natural Language Processing (NLP) is increasingly used as a key ingredient in critical decision-making systems such as resume parsers used in sorting a list of job candidates. NLP systems often ingest large corpora of human text, attempting…

Computation and Language · Computer Science 2020-07-14 Esma Wali, Yan Chen, Christopher Mahoney, Thomas Middleton, Marzieh Babaeianjelodar, Mariama Njie, Jeanna Neefe Matthews

Training state-of-the-art large language models requires vast amounts of clean and diverse textual data. However, building suitable multilingual datasets remains a challenge. In this work, we present HPLT v2, a collection of high-quality…

We introduce an NLP toolkit based on object-oriented knowledge base and multi-level grammar base. This toolkit focuses on semantic parsing, it also has abilities to discover new knowledge and grammar automatically, new discovered knowledge…

Computation and Language · Computer Science 2021-06-09 Yu Guo

Princeton WordNet is one of the most important resources for natural language processing, but is only available for English. While it has been translated using the expand approach to many other languages, this is an expensive manual…

Computation and Language · Computer Science 2019-03-05 Mihael Arcan, John McCrae, Paul Buitelaar

Python is one of the most commonly used programming languages in industry and education. Its English keywords and built-in functions/modules allow it to come close to pseudo-code in terms of its readability and ease of writing. However,…

Computation and Language · Computer Science 2025-04-17 Joshua Otten, Antonios Anastasopoulos, Kevin Moran

We introduce COMBO - a fully neural NLP system for accurate part-of-speech tagging, morphological analysis, lemmatisation, and (enhanced) dependency parsing. It predicts categorical morphosyntactic features whilst also exposes their vector…

Computation and Language · Computer Science 2021-09-14 Mateusz Klimaszewski, Alina Wróblewska

This paper presents a distributed platform for Natural Language Processing called PyPLN. PyPLN leverages a vast array of NLP and text processing open source tools, managing the distribution of the workload on a variety of configurations:…

Computation and Language · Computer Science 2013-02-20 Flávio Codeço Coelho, Renato Rocha Souza, Álvaro Justen, Flávio Amieiro, Heliana Mello

One central mystery of neural NLP is what neural models "know" about their subject matter. When a neural machine translation system learns to translate from one language to another, does it learn the syntax or semantics of the languages?…

Computation and Language · Computer Science 2017-08-01 Chaitanya Malaviya, Graham Neubig, Patrick Littell

The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces…

Computation and Language · Computer Science 2026-05-29 Mullosharaf K. Arabov

Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of…
