Related papers: Table Transformers for Imputing Textual Attributes
When working with tabular data, missingness is always one of the most painful problems. Throughout many years, researchers have continuously explored better and better ways to impute missing data. Recently, with the rapid development…
Missing data is a widespread problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method's large variance in performance across real-world domains and…
We present CACTI, a masked autoencoding approach for imputing tabular data that leverages the structure in missingness patterns and contextual information. Our approach employs a novel median truncated copy masking training strategy that…
Data imputation is an effective way to handle missing data, which is common in practical applications. In this study, we propose and test a novel data imputation process that achieve two important goals: (1) preserve the row-wise…
Learning with a limited number of labeled data is a central problem in real-world applications of machine learning, as it is often expensive to obtain annotations. To deal with the scarcity of labeled data, transfer learning is a…
Tables are widely used with various structures to organize and present data. Recent attempts on table understanding mainly focus on relational tables, yet overlook to other common table structures. In this paper, we propose TUTA, a unified…
In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one…
Although data may be abundant, complete data is less so, due to missing columns or rows. This missingness undermines the performance of downstream data products that either omit incomplete cases or create derived completed data for…
Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method"…
Tabular data plays a crucial role in various domains but often suffers from missing values, thereby curtailing its potential utility. Traditional imputation techniques frequently yield suboptimal results and impose substantial computational…
Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. The most commonly used statistical and machine learning methods for missing value imputation may…
Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem attracts many studies to contribute to data-driven solutions. Existing imputation solutions mainly…
Tabular data is a fundamental form of data structure. The evolution of table analysis tools reflects humanity's continuous progress in data acquisition, management, and processing. The dynamic changes in table columns arise from…
Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often raise concerns regarding data quality and the reliability of…
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as textual information, a variable number…
Missing values are pervasive in real-world tabular data and can significantly impair downstream analysis. Imputing them is especially challenging in text-rich tables, where dependencies are implicit, complex, and dispersed across long…
Imputation methods play a critical role in enhancing the quality of practical time-series data, which often suffer from pervasive missing values. Recently, diffusion-based generative imputation methods have demonstrated remarkable success…
Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the…
The capabilities of large language models (LLMs) have been successfully applied in the context of table representation learning. The recently proposed tabular language models have reported state-of-the-art results across various tasks for…
Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…