Related papers: Column Type Annotation using ChatGPT

200 papers

Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of…

Computation and Language · Computer Science 2023-06-23 Ralph Peeters, Christian Bizer

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information. In this paper, we study the…

Databases · Computer Science 2022-03-02 Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, Wang-Chiew Tan

Large pre-trained language models have exhibited unprecedented capabilities in producing high-quality text via prompting techniques. This fact introduces new possibilities for data collection and annotation, particularly in situations where…

Computation and Language · Computer Science 2023-05-25 Tiziano Labruna, Sofia Brenna, Andrea Zaninello, Bernardo Magnini

Understanding the semantics of columns in relational tables is an important pre-processing step for indexing data lakes in order to provide rich data search. An approach to establishing such understanding is column type annotation (CTA)…

Computation and Language · Computer Science 2025-03-05 Keti Korini, Christian Bizer

Column type annotation is vital for tasks like data cleaning, integration, and visualization. Recent solutions rely on resource-intensive language models fine-tuned on well-annotated columns from a particular set of tables, i.e., a source…

Computation and Language · Computer Science 2026-02-10 Yushi Sun, Xujia Li, Nan Tang, Quanqing Xu, Chuanhui Yang, Lei Chen

Objective: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes…

Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for…

Artificial Intelligence · Computer Science 2024-07-10 Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson

Recent studies have demonstrated promising potential of ChatGPT for various text annotation and classification tasks. However, ChatGPT is non-deterministic which means that, as with human coders, identical input can lead to different…

Computation and Language · Computer Science 2023-04-24 Michael V. Reiss

Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types which are fixed at training time; require a large number of training samples per type and incur large…

Computation and Language · Computer Science 2024-08-20 Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the…

Computation and Language · Computer Science 2018-11-15 Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods…

Machine Learning · Computer Science 2024-06-04 Yubo Wang, Hao Xin, Lei Chen

Column Type Annotation (CTA) is a fundamental step towards enabling schema alignment and semantic understanding of tabular data. Existing encoder-only language models achieve high accuracy when fine-tuned on labeled columns, but their…

Databases · Computer Science 2025-12-30 Hanze Meng, Jianhao Cao, Rachel Pottinger

Work done to uncover the knowledge encoded within pre-trained language models relies on annotated corpora or human-in-the-loop methods. However, these approaches are limited in terms of scalability and the scope of interpretation. We propose…

Computation and Language · Computer Science 2023-10-23 Basel Mousi, Nadir Durrani, Fahim Dalvi

Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To…

Computation and Language · Computer Science 2024-02-21 Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

Detecting the semantic concept of columns in tabular data is of particular interest to many applications, ranging from data integration, cleaning, and search to feature engineering and model building in machine learning. Recently, several works have…

Artificial Intelligence · Computer Science 2020-12-17 Udayan Khurana, Sainyam Galhotra

Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by…

Computation and Language · Computer Science 2023-07-20 Fabrizio Gilardi, Meysam Alizadeh, Maël Kubli

This study investigates the capabilities of large language models (LLMs), specifically ChatGPT, in annotating MT outputs based on an error typology. In contrast to previous work focusing mainly on general language, we explore ChatGPT's…

Computation and Language · Computer Science 2025-04-22 Joachim Minder, Guillaume Wisniewski, Natalie Kübler

ChatGPT has achieved remarkable success in natural language understanding. Considering that recommendation is indeed a conversation between users and the system with items as words, which has a similar underlying pattern to ChatGPT, we…

Information Retrieval · Computer Science 2024-04-16 Yabin Zhang, Wenhui Yu, Erhan Zhang, Xu Chen, Lantao Hu, Peng Jiang, Kun Gai

In recent years, single cell RNA sequencing has become a widely used technique to study cellular diversity and function. However, accurately annotating cell types from single cell data has been a challenging task, as it requires extensive…

Genomics · Quantitative Biology 2023-04-07 Zehua Zeng, Hongwu Du

Tables are a prevalent format for structured data, yet their metadata, such as semantic types and column relationships, is often incomplete or ambiguous. Column annotation tasks, including Column Type Annotation (CTA) and Column Property…

Databases · Computer Science 2025-08-26 Zhihao Ding, Yongkang Sun, Jieming Shi