Related papers: Typesafe Modeling in Text Mining

Annotations for Rule-Based Models

The chapter reviews the syntax to store machine-readable annotations and describes the mapping between rule-based modelling entities (e.g., agents and rules) and these annotations. In particular, we review an annotation framework and the…

Molecular Networks · Quantitative Biology 2020-06-24 Matteo Cavaliere, Vincent Danos, Ricardo Honorato-Zimmer, William Waites

Scalable Text Mining with Sparse Generative Models

The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the…

Information Retrieval · Computer Science 2016-02-09 Antti Puurula

Text Annotation Handbook: A Practical Guide for Machine Learning Projects

This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but…

Computation and Language · Computer Science 2023-10-19 Felix Stollenwerk, Joey Öhman, Danila Petrelli, Emma Wallerö, Fredrik Olsson, Camilla Bengtsson, Andreas Horndahl, Gabriela Zarzar Gandler

Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation

Text-to-image (T2I) generation has advanced rapidly, making reliable evaluation critical as performance differences between models narrow. Existing evaluation practices typically apply uniform annotation mechanisms, such as Likert-scale or…

Computer Vision and Pattern Recognition · Computer Science 2026-05-14 Abdelrahman Eldesokey, Merey Ramazanova, Ahmad Sait, Ansar Khangeldin, Karen Sanchez, Tong Zhang, Bernard Ghanem

Type Annotation for Adaptive Systems

We introduce type annotations as a flexible typing mechanism for graph systems and discuss their advantages with respect to classical typing based on graph morphisms. In this approach the type system is incorporated with the graph and…

Software Engineering · Computer Science 2016-12-07 Paolo Bottoni, Andrew Fish, Francesco Parisi Presicce

Topic Modelling: Going Beyond Token Outputs

Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often…

Computation and Language · Computer Science 2024-04-26 Lowri Williams, Eirini Anthi, Laura Arman, Pete Burnap

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching,…

Programming Languages · Computer Science 2020-05-06 Jiayi Wei, Maruth Goyal, Greg Durrett, Isil Dillig

Navigating the Risks of Using Large Language Models for Text Annotation in Social Science Research

Large language models (LLMs) have the potential to revolutionize computational social science, particularly in automated textual analysis. In this paper, we conduct a systematic evaluation of the promises and risks associated with using…

Computation and Language · Computer Science 2025-07-29 Hao Lin, Yongjun Zhang

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

TASSY -- A Text Annotation Survey System

We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality but not both. Combining the two input types is particularly…

Computation and Language · Computer Science 2021-12-20 Timo Spinde, Kanishka Sinha, Norman Meuschke, Bela Gipp

Towards Agile Text Classifiers for Everyone

Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies…

Computation and Language · Computer Science 2023-10-24 Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

Multi-label and Multi-target Sampling of Machine Annotation for Computational Stance Detection

Data collection from manual labeling provides domain-specific and task-aligned supervision for data-driven approaches, and a critical mass of well-annotated resources is required to achieve reasonable performance in natural language…

Computation and Language · Computer Science 2023-11-09 Zhengyuan Liu, Hai Leong Chieu, Nancy F. Chen

A knowledge-based approach to semi-automatic annotation of multimedia documents via user adaptation

Current approaches to the annotation process focus on annotation schemas, languages for annotation, or are very application driven. In this paper it is proposed that a more flexible architecture for annotation requires a knowledge component…

Digital Libraries · Computer Science 2007-05-23 Afzal Ballim, Nastaran Fatemi, Hatem Ghorbel, Vincenzo Pallotta

Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

Optional type annotations allow for enriching dynamic programming languages with static typing features like better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of…

Software Engineering · Computer Science 2023-07-31 Bernd Gruner, Tim Sonnekalb, Thomas S. Heinze, Clemens-Alexander Brust

Using compression to identify acronyms in text

Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key…

Digital Libraries · Computer Science 2007-05-23 Stuart Yeates, David Bainbridge, Ian H. Witten

Effectiveness of Annotation-Based Static Type Inference

Benefits of static type systems are well-known: they offer guarantees that no type error will occur during runtime and, inherently, inferred types serve as documentation on how functions are called. On the other hand, many type systems have…

Programming Languages · Computer Science 2020-08-31 Isabel Wingen, Philipp Körner

A modelling language for the effective design of Java annotations

This paper describes a new modelling language for the effective design of Java annotations. Since their inclusion in the 5th edition of Java, annotations have grown from a useful tool for the addition of meta-data to play a central role in…

Programming Languages · Computer Science 2019-10-02 Irene Córdoba, Juan de Lara

Specifying Genericity through Inclusiveness and Abstractness Continuous Scales

This paper introduces a novel annotation framework for the fine-grained modeling of Noun Phrases' (NPs) genericity in natural language. The framework is designed to be simple and intuitive, making it accessible to non-expert annotators and…

Computation and Language · Computer Science 2024-04-02 Claudia Collacciani, Andrea Amelio Ravelli, Marianna Marcella Bolognesi

Text Mining for Processing Interview Data in Computational Social Science

We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment provide for convenient exploration and…

Computation and Language · Computer Science 2020-12-01 Jussi Karlgren, Renee Li, Eva M Meyersson Milgrom

A Comprehensive Survey of Text Classification Techniques and Their Research Applications: Observational and Experimental Insights

The exponential growth of textual data presents substantial challenges in management and analysis, notably due to high storage and processing costs. Text classification, a vital aspect of text mining, provides robust solutions by enabling…

Computation and Language · Computer Science 2025-01-22 Kamal Taha, Paul D. Yoo, Chan Yeun, Aya Taha