Related papers: Column Type Annotation using ChatGPT

Using ChatGPT for Entity Matching

Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of…

Computation and Language · Computer Science 2023-06-23 Ralph Peeters, Christian Bizer

Annotating Columns with Pre-trained Language Models

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information. In this paper, we study the…

Databases · Computer Science 2022-03-02 Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, Wang-Chiew Tan

Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations

Large pre-trained language models have exhibited unprecedented capabilities in producing high-quality text via prompting techniques. This fact introduces new possibilities for data collection and annotation, particularly in situations where…

Computation and Language · Computer Science 2023-05-25 Tiziano Labruna, Sofia Brenna, Andrea Zaninello, Bernardo Magnini

Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation

Understanding the semantics of columns in relational tables is an important pre-processing step for indexing data lakes in order to provide rich data search. An approach to establishing such understanding is column type annotation (CTA)…

Computation and Language · Computer Science 2025-03-05 Keti Korini, Christian Bizer

LakeHopper: Cross Data Lakes Column Type Annotation through Model Adaptation

Column type annotation is vital for tasks like data cleaning, integration, and visualization. Recent solutions rely on resource-intensive language models fine-tuned on well-annotated columns from a particular set of tables, i.e., a source…

Computation and Language · Computer Science 2026-02-10 Yushi Sun, Xujia Li, Nan Tang, Quanqing Xu, Chuanhui Yang, Lei Chen

An evaluation of GPT models for phenotype concept recognition

Objective: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes…

Computation and Language · Computer Science 2023-11-27 Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall, Justin T Reese

Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version)

Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for…

Artificial Intelligence · Computer Science 2024-07-10 Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson

Testing the Reliability of ChatGPT for Text Annotation and Classification: A Cautionary Remark

Recent studies have demonstrated promising potential of ChatGPT for various text annotation and classification tasks. However, ChatGPT is non-deterministic which means that, as with human coders, identical input can lead to different…

Computation and Language · Computer Science 2023-04-24 Michael V. Reiss

ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models

Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types which are fixed at training time; require a large number of training samples per type and incur large…

Computation and Language · Computer Science 2024-08-20 Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the…

Computation and Language · Computer Science 2018-11-15 Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

KGLink: A column type annotation method that combines knowledge graph and pre-trained language model

The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods…

Machine Learning · Computer Science 2024-06-04 Yubo Wang, Hao Xin, Lei Chen

Robust LLM-based Column Type Annotation via Prompt Augmentation with LoRA Tuning

Column Type Annotation (CTA) is a fundamental step towards enabling schema alignment and semantic understanding of tabular data. Existing encoder-only language models achieve high accuracy when fine-tuned on labeled columns, but their…

Databases · Computer Science 2025-12-30 Hanze Meng, Jianhao Cao, Rachel Pottinger

Can LLMs facilitate interpretation of pre-trained language models?

Work done to uncover the knowledge encoded within pre-trained language models relies on annotated corpora or human-in-the-loop methods. However, these approaches are limited in terms of scalability and the scope of interpretation. We propose…

Computation and Language · Computer Science 2023-10-23 Basel Mousi, Nadir Durrani, Fahim Dalvi

APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT

Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To…

Computation and Language · Computer Science 2024-02-21 Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

Semantic Annotation for Tabular Data

Detecting semantic concept of columns in tabular data is of particular interest to many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Recently, several works have…

Artificial Intelligence · Computer Science 2020-12-17 Udayan Khurana, Sainyam Galhotra

ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks

Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by…

Computation and Language · Computer Science 2023-07-20 Fabrizio Gilardi, Meysam Alizadeh, Maël Kubli

Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT

This study investigates the capabilities of large language models (LLMs), specifically ChatGPT, in annotating MT outputs based on an error typology. In contrast to previous work focusing mainly on general language, we explore ChatGPT's…

Computation and Language · Computer Science 2025-04-22 Joachim Minder, Guillaume Wisniewski, Natalie Kübler

RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm

ChatGPT has achieved remarkable success in natural language understanding. Considering that recommendation is indeed a conversation between users and the system with items as words, which has similar underlying pattern with ChatGPT, we…

Information Retrieval · Computer Science 2024-04-16 Yabin Zhang, Wenhui Yu, Erhan Zhang, Xu Chen, Lantao Hu, Peng Jiang, Kun Gai

Revolutionizing Single Cell Analysis: The Power of Large Language Models for Cell Type Annotation

In recent years, single cell RNA sequencing has become a widely used technique to study cellular diversity and function. However, accurately annotating cell types from single cell data has been a challenging task, as it requires extensive…

Genomics · Quantitative Biology 2023-04-07 Zehua Zeng, Hongwu Du

Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations

Tables are a prevalent format for structured data, yet their metadata, such as semantic types and column relationships, is often incomplete or ambiguous. Column annotation tasks, including Column Type Annotation (CTA) and Column Property…

Databases · Computer Science 2025-08-26 Zhihao Ding, Yongkang Sun, Jieming Shi