Related papers: Dataset Generation Patterns for Evaluating Knowled…

200 papers

When spreadsheets are filled freely by knowledge workers, they can contain rather unstructured content. For humans and especially machines it becomes difficult to interpret such data properly. Therefore, spreadsheets are often converted to…

Databases · Computer Science · 2021-03-08 · Markus Schröder, Christian Jilek, Michael Schulze, Andreas Dengel

Current publicly available knowledge work data collections lack diversity, extensive annotations, and contextual information about the users and their documents. These issues hinder objective and comparable data-driven evaluations and…

Artificial Intelligence · Computer Science · 2024-10-25 · Desiree Heim, Christian Jilek, Adrian Ulges, Andreas Dengel

This paper proposes three different data generators, tailored to transactional datasets, based on existing itemset-based generative models. All these generators are intuitive and easy to implement and show satisfactory performance. The…

Databases · Computer Science · 2020-07-15 · Christian Lezcano, Marta Arias

Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured…

Machine Learning · Computer Science · 2023-10-06 · Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz, Alex Fit-Florea

Reusing published datasets on the Web is of great interest to researchers and developers. Their data needs may be met by submitting queries to a dataset search engine to retrieve relevant datasets. In this ongoing work towards developing a…

Information Retrieval · Computer Science · 2019-08-30 · Jinchi Chen, Xiaxia Wang, Gong Cheng, Evgeny Kharlamov, Yuzhong Qu

Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to…

Information Retrieval · Computer Science · 2019-07-03 · Xiaxia Wang, Jinchi Chen, Shuxin Li, Gong Cheng, Jeff Z. Pan, Evgeny Kharlamov, Yuzhong Qu

Synthetic data generation is gaining increasing popularity in different computer vision applications. Existing state-of-the-art face recognition models are trained using large-scale face datasets, which are crawled from the Internet and…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-01 · Hatef Otroshi Shahreza, Sébastien Marcel

Process data with confidential information cannot be shared directly in public, which hinders research in process data mining and analytics. Data encryption methods have been studied to protect the data, but the data may still be decrypted,…

Machine Learning · Computer Science · 2022-03-16 · Keyi Li, Sen Yang, Travis M. Sullivan, Randall S. Burd, Ivan Marsic

Previous works on knowledge-to-text generation take as input a few RDF triples or key-value pairs conveying the knowledge of some entities to generate a natural language description. Existing datasets, such as WIKIBIO, WebNLG, and E2E,…

Computation and Language · Computer Science · 2020-10-27 · Liying Cheng, Dekun Wu, Lidong Bing, Yan Zhang, Zhanming Jie, Wei Lu, Luo Si

As more tech companies engage in rigorous economic analyses, we are confronted with a data problem: in-house papers cannot be replicated due to use of sensitive, proprietary, or private data. Readers are left to assume that the obscured…

General Economics · Economics · 2020-11-10 · Allison Koenecke, Hal Varian

In general, to draw robust conclusions from a dataset, the entire analyzed population must be represented in that dataset. A dataset that does not fulfill this condition normally leads to selection bias. Additionally, graphs have been…

Machine Learning · Computer Science · 2022-05-30 · Axel Wassington, Sergi Abadal

The use of synthetic graph generators is a common practice among graph-oriented benchmark designers, as it allows obtaining graphs with the required scale and characteristics. However, finding a graph generator that accurately fits the…

Existing dialog datasets contain a sequence of utterances and responses without any explicit background knowledge associated with them. This has resulted in the development of models which treat conversation as a sequence-to-sequence…

Computation and Language · Computer Science · 2018-09-24 · Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra

Synthetic datasets are important for evaluating and testing machine learning models. When evaluating real-life recommender systems, high-dimensional categorical (and sparse) datasets are often considered. Unfortunately, there are not many…

Information Retrieval · Computer Science · 2024-12-11 · Miha Malenšek, Blaž Škrlj, Blaž Mramor, Jure Demšar

We propose a new method for generating realistic datasets with distribution shifts using any decoder-based generative model. Our approach systematically creates datasets with varying intensities of distribution shifts, facilitating a…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-13 · Roy Friedman, Rhea Chowers

Obtaining real-world network datasets is often challenging because of privacy, security, and computational constraints. In the absence of such datasets, graph generative models become essential tools for creating synthetic datasets. In this…

Machine Learning · Computer Science · 2025-05-13 · Arya Grayeli, Vipin Swarup, Steven E. Noel

High-quality power flow datasets are essential for training machine learning models in power systems. However, security and privacy concerns restrict access to real-world data, making statistically accurate and physically consistent…

Machine Learning · Computer Science · 2025-08-26 · Milad Hoseinpour, Vladimir Dvorkin

We consider the problem of graph generation guided by network statistics, i.e., the generation of graphs which have given values of various numerical measures that characterize networks, such as the clustering coefficient and the number of…

Social and Information Networks · Computer Science · 2023-03-02 · Jérôme Kunegis, Jun Sun, Eiko Yoneki

Graph generative models have become increasingly effective for data distribution approximation and data augmentation. While they have aroused public concerns about their malicious misuses or misinformation broadcasts, just as what Deepfake…

Cryptography and Security · Computer Science · 2023-06-14 · Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang

The increasing demand for high-quality datasets in machine learning has raised concerns about the ethical and responsible creation of these datasets. Dataset creators play a crucial role in developing responsible practices, yet their…

Machine Learning · Computer Science · 2024-09-04 · Will Orr, Kate Crawford