Related papers: Automatic Metadata Generation using Associative Ne…

New Methods for Metadata Extraction from Scientific Literature

Within the past few decades we have witnessed a digital revolution, which moved scholarly communication to electronic media and also resulted in a substantial increase in its volume. Nowadays keeping track of the latest scientific…

Digital Libraries · Computer Science 2017-10-30 Dominika Tkaczyk

AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets

High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing works rely on heavy human effort to manually label…

Computation and Language · Computer Science 2023-06-19 Yu Lu , Junwei Bao , Zichen Ma , Xiaoguang Han , Youzheng Wu , Shuguang Cui , Xiaodong He

A supervised generative optimization approach for tabular data

Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching…

Machine Learning · Computer Science 2024-05-13 Shinpei Nakamura-Sakai , Fadi Hamad , Saheed Obitayo , Vamsi K. Potluru

A Generative AI-driven Metadata Modelling Approach

For decades, the modelling of metadata has been core to the functioning of any academic library. Its importance has only grown with the increasing pervasiveness of Generative Artificial Intelligence (AI)-driven information activities…

Digital Libraries · Computer Science 2025-03-18 Mayukh Bagchi

GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to…

Machine Learning · Computer Science 2022-12-13 Angeela Acharya , Siddhartha Sikdar , Sanmay Das , Huzefa Rangwala

Generating social networks with static and dynamic utility-maximization approaches

In this paper, we introduce a conceptual framework that models human social networks as an undirected dot-product graph of independent individuals. Their relationships are only determined by a cost-benefit analysis, i.e. by maximizing an…

Probability · Mathematics 2024-11-26 Aldric Labarthe , Yann Kerzreho

Synthetic data generation for system identification: leveraging knowledge transfer from similar systems

This paper addresses the challenge of overfitting in the learning of dynamical systems by introducing a novel approach for the generation of synthetic data, aimed at enhancing model generalization and robustness in scenarios characterized…

Machine Learning · Computer Science 2024-03-11 Dario Piga , Matteo Rufolo , Gabriele Maroni , Manas Mejari , Marco Forgione

Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting

Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only does synthetic data help build robust AI/ML datasets…

Systems and Control · Electrical Eng. & Systems 2026-03-20 José Pulido , Francesc Wilhelmi , Sergio Fortes , Alfonso Fernández-Durán , Lorenzo Galati Giordano , Raquel Barco

Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results

Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health…

Methodology · Statistics 2023-05-16 Lisa Kühnel , Julian Schneider , Ines Perrar , Tim Adams , Fabian Prasser , Ute Nöthlings , Holger Fröhlich , Juliane Fluck

Metadata Improves Segmentation Through Multitasking Elicitation

Metainformation is a common companion to biomedical images. However, this potentially powerful additional source of signal from image acquisition has had limited use in deep learning methods, for semantic segmentation in particular. Here,…

Image and Video Processing · Electrical Eng. & Systems 2023-08-21 Iaroslav Plutenko , Mikhail Papkov , Kaupo Palo , Leopold Parts , Dmytro Fishman

Making Sense of Data in the Wild: Data Analysis Automation at Scale

As the volume of publicly available data continues to grow, researchers face the challenge of limited diversity in benchmarking machine learning tasks. Although thousands of datasets are available in public repositories, the sheer abundance…

Information Retrieval · Computer Science 2025-02-25 Mara Graziani , Malina Molnar , Irina Espejo Morales , Joris Cadow-Gossweiler , Teodoro Laino

Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs

Data catalogs serve as repositories for organizing and accessing diverse collections of data assets, but their effectiveness hinges on the ease with which business users can look up relevant content. Unfortunately, many data catalogs within…

Information Retrieval · Computer Science 2025-03-13 Mayank Singh , Abhijeet Kumar , Sasidhar Donaparthi , Gayatri Karambelkar

Reasoning-Driven Synthetic Data Generation and Evaluation

Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and…

Artificial Intelligence · Computer Science 2026-04-01 Tim R. Davidson , Benoit Seguin , Enrico Bacis , Cesar Ilharco , Hamza Harkous

Generative Expansion of Small Datasets: An Expansive Graph Approach

Limited data availability in machine learning significantly impacts performance and generalization. Traditional augmentation methods enhance moderately sufficient datasets. GANs struggle with convergence when generating diverse samples.…

Machine Learning · Computer Science 2024-10-02 Vahid Jebraeeli , Bo Jiang , Hamid Krim , Derya Cansever

Node metadata can produce predictability transitions in network inference problems

Network inference is the process of learning the properties of complex networks from data. Besides using information about known links in the network, node attributes and other forms of network metadata can help to solve network inference…

Data Analysis, Statistics and Probability · Physics 2021-03-29 Oscar Fajardo-Fontiveros , Marta Sales-Pardo , Roger Guimera

Structure and inference in annotated networks

For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network, geographic location of nodes in the Internet, or…

Social and Information Networks · Computer Science 2016-06-17 M. E. J. Newman , Aaron Clauset

Network structure, metadata and the prediction of missing nodes and annotations

The empirical validation of community detection methods is often based on available annotations on the nodes that serve as putative indicators of the large-scale network structure. Most often, the suitability of the annotations as…

Physics and Society · Physics 2016-09-30 Darko Hric , Tiago P. Peixoto , Santo Fortunato

Diffusion Models with Double Guidance: Generate with aggregated datasets

Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a…

Machine Learning · Statistics 2026-03-31 Yanfeng Yang , Kenji Fukumizu

The Impact of Modern AI in Metadata Management

Metadata management plays a critical role in data governance, resource discovery, and decision-making in the data-driven era. While traditional metadata approaches have primarily focused on organization, classification, and resource reuse,…

Databases · Computer Science 2025-07-17 Wenli Yang , Rui Fu , Muhammad Bilal Amin , Byeong Kang

Machine Learning for Synthetic Data Generation: A Review

Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and…

Machine Learning · Computer Science 2025-04-07 Yingzhou Lu , Lulu Chen , Yuanyuan Zhang , Minjie Shen , Huazheng Wang , Xiao Wang , Capucine van Rechem , Tianfan Fu , Wenqi Wei