Related papers: Annotating Columns with Pre-trained Language Model…

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the…

Computation and Language · Computer Science 2018-11-15 Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Column Type Annotation using ChatGPT

Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column. Column type annotation is an important pre-processing step for data search and data…

Computation and Language · Computer Science 2023-08-01 Keti Korini, Christian Bizer

Semantic Annotation for Tabular Data

Detecting semantic concept of columns in tabular data is of particular interest to many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Recently, several works have…

Artificial Intelligence · Computer Science 2020-12-17 Udayan Khurana, Sainyam Galhotra

Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation

Understanding the semantics of columns in relational tables is an important pre-processing step for indexing data lakes in order to provide rich data search. An approach to establishing such understanding is column type annotation (CTA)…

Computation and Language · Computer Science 2025-03-05 Keti Korini, Christian Bizer

Corpus Considerations for Annotator Modeling and Scaling

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios…

Computation and Language · Computer Science 2024-04-18 Olufunke O. Sarumi, Béla Neuendorf, Joan Plepi, Lucie Flek, Jörg Schlötterer, Charles Welch

Learning from Imperfect Annotations

Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…

Machine Learning · Computer Science 2020-04-08 Emmanouil Antonios Platanios, Maruan Al-Shedivat, Eric Xing, Tom Mitchell

Learning Semantic Annotations for Tabular Data

The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any meta data. Unlike traditional lexical matching-based methods, we…

Databases · Computer Science 2019-06-04 Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort

Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality,…

Digital Libraries · Computer Science 2021-12-23 Franziska Weeber, Felix Hamborg, Karsten Donnay, Bela Gipp

Sato: Contextual Semantic Type Detection in Tables

Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing…

Databases · Computer Science 2020-06-04 Dan Zhang, Yoshihiko Suhara, Jinfeng Li, Madelon Hulsebos, Çağatay Demiralp, Wang-Chiew Tan

AdaTyper: Adaptive Semantic Column Type Detection

Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned…

Databases · Computer Science 2023-11-27 Madelon Hulsebos, Paul Groth, Çağatay Demiralp

Auto-Annotation Quality Prediction for Semi-Supervised Learning with Ensembles

Auto-annotation by ensemble of models is an efficient method of learning on unlabeled data. Wrong or inaccurate annotations generated by the ensemble may lead to performance degradation of the trained model. To deal with this problem we…

Computer Vision and Pattern Recognition · Computer Science 2024-03-14 Dror Simon, Miriam Farber, Roman Goldenberg

Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Robust LLM-based Column Type Annotation via Prompt Augmentation with LoRA Tuning

Column Type Annotation (CTA) is a fundamental step towards enabling schema alignment and semantic understanding of tabular data. Existing encoder-only language models achieve high accuracy when fine-tuned on labeled columns, but their…

Databases · Computer Science 2025-12-30 Hanze Meng, Jianhao Cao, Rachel Pottinger

DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future…

Computation and Language · Computer Science 2024-10-30 Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations

Tables are a prevalent format for structured data, yet their metadata, such as semantic types and column relationships, is often incomplete or ambiguous. Column annotation tasks, including Column Type Annotation (CTA) and Column Property…

Databases · Computer Science 2025-08-26 Zhihao Ding, Yongkang Sun, Jieming Shi

Multi-annotator Deep Learning: A Probabilistic Framework for Classification

Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowdworkers. Training standard…

Machine Learning · Computer Science 2023-10-25 Marek Herde, Denis Huseljic, Bernhard Sick

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science 2022-09-27 Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks

Existing discourse corpora are annotated based on different frameworks, which show significant dissimilarities in definitions of arguments and relations and structural constraints. Despite surface differences, these frameworks share basic…

Computation and Language · Computer Science 2024-04-09 Yingxue Fu

Evaluating Large Language Models as Expert Annotators

Textual data annotation, the process of labeling or tagging text with relevant information, is typically costly, time-consuming, and labor-intensive. While large language models (LLMs) have demonstrated their potential as direct…

Computation and Language · Computer Science 2025-08-12 Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen, Hsin-Hsi Chen

Learning from others' mistakes: Finetuning machine translation models with span-level error annotations

Despite growing interest in incorporating feedback to improve language models, most efforts focus only on sequence-level annotations. In this work, we explore the potential of utilizing fine-grained span-level annotations from offline…

Computation and Language · Computer Science 2024-10-23 Lily H. Zhang, Hamid Dadkhahi, Mara Finkelstein, Firas Trabelsi, Jiaming Luo, Markus Freitag