Related papers: Semantic Annotation for Tabular Data

Learning Semantic Annotations for Tabular Data

The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any metadata. Unlike traditional lexical matching-based methods, we…

Databases · Computer Science · 2019-06-04 · Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the…

Computation and Language · Computer Science · 2018-11-15 · Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Conceptual Modeling Applied to Data Semantics

In software system design, one of the purposes of diagrammatic modeling is to explain something (e.g., data tables) to others. Very often, the syntax of diagrams is specified while the intended meaning of diagrammatic constructs remains…

Software Engineering · Computer Science · 2022-10-05 · Sabah Al-Fedaghi

Semantic Labeling Using a Deep Contextualized Language Model

Generating schema labels automatically for column values of data tables has many data science applications, such as schema matching, data discovery, and data linking. For example, automatically extracted tables with missing headers can be…

Machine Learning · Computer Science · 2020-11-02 · Mohamed Trabelsi, Jin Cao, Jeff Heflin

Sato: Contextual Semantic Type Detection in Tables

Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing…

Databases · Computer Science · 2020-06-04 · Dan Zhang, Yoshihiko Suhara, Jinfeng Li, Madelon Hulsebos, Çağatay Demiralp, Wang-Chiew Tan

Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

A character-level convolutional neural network (CNN) motivated by applications in "automated machine learning" (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first…

Computation and Language · Computer Science · 2019-01-25 · Paul Azunre, Craig Corcoran, Numa Dhamani, Jeffrey Gleason, Garrett Honke, David Sullivan, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan

KGLink: A column type annotation method that combines knowledge graph and pre-trained language model

The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods…

Machine Learning · Computer Science · 2024-06-04 · Yubo Wang, Hao Xin, Lei Chen

AdaTyper: Adaptive Semantic Column Type Detection

Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned…

Databases · Computer Science · 2023-11-27 · Madelon Hulsebos, Paul Groth, Çağatay Demiralp

A Concept Annotation System for Clinical Records

Unstructured information comprises a valuable source of data in clinical records. For text mining in clinical records, concept extraction is the first step in finding assertions and relationships. This study presents a system developed for…

Information Retrieval · Computer Science · 2010-12-09 · Ning Kang, Rogier Barendse, Zubair Afzal, Bharat Singh, Martijn J. Schuemie, Erik M. van Mulligen, Jan A. Kors

StraTyper: Automated Semantic Type Discovery and Multi-Type Annotation for Dataset Collections

Understanding dataset semantics is crucial for effective search, discovery, and integration pipelines. To this end, column type annotation (CTA) methods associate columns of tabular datasets with semantic types that accurately describe…

Databases · Computer Science · 2026-02-05 · Christos Koutras, Juliana Freire

Tabular Incremental Inference

Tabular data is a fundamental form of data structure. The evolution of table analysis tools reflects humanity's continuous progress in data acquisition, management, and processing. The dynamic changes in table columns arise from…

Artificial Intelligence · Computer Science · 2026-01-28 · Xinda Chen, Zhen Xing, Hanyu Zhang, Weimin Tan, Bo Yan

Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation

Understanding the semantics of columns in relational tables is an important pre-processing step for indexing data lakes in order to provide rich data search. An approach to establishing such understanding is column type annotation (CTA)…

Computation and Language · Computer Science · 2025-03-05 · Keti Korini, Christian Bizer

Annotating Columns with Pre-trained Language Models

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information. In this paper, we study the…

Databases · Computer Science · 2022-03-02 · Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, Wang-Chiew Tan

Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard

Tabular data, a fundamental data format in machine learning, is predominantly utilized in competitions and real-world applications. The performance of tabular models--such as gradient boosted decision trees and neural networks--can vary…

Machine Learning · Computer Science · 2025-05-20 · Si-Yang Liu, Qile Zhou, Han-Jia Ye

Survey on Semantic Interpretation of Tabular Data: Challenges and Directions

Tabular data plays a pivotal role in various fields, making it a popular format for data manipulation and exchange, particularly on the web. The interpretation, extraction, and processing of tabular information are invaluable for…

Artificial Intelligence · Computer Science · 2024-11-20 · Marco Cremaschi, Blerina Spahiu, Matteo Palmonari, Ernesto Jimenez-Ruiz

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells,…

Information Retrieval · Computer Science · 2019-06-04 · Li Deng, Shuo Zhang, Krisztian Balog

A Concept and Argumentation based Interpretable Model in High Risk Domains

Interpretability has become an essential topic for artificial intelligence in some high-risk domains such as healthcare, banking, and security. For commonly used tabular data, traditional methods trained end-to-end machine learning models with…

Artificial Intelligence · Computer Science · 2022-08-18 · Haixiao Chi, Dawei Wang, Gaojie Cui, Feng Mao, Beishui Liao

Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations

Tables are a prevalent format for structured data, yet their metadata, such as semantic types and column relationships, is often incomplete or ambiguous. Column annotation tasks, including Column Type Annotation (CTA) and Column Property…

Databases · Computer Science · 2025-08-26 · Zhihao Ding, Yongkang Sun, Jieming Shi

TabNet: Attentive Interpretable Tabular Learning

We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and…

Machine Learning · Computer Science · 2020-12-10 · Sercan O. Arik, Tomas Pfister

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting…

Computation and Language · Computer Science · 2023-12-22 · Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang