Related papers: On Big Data Benchmarking
The great prosperity of big data systems such as Hadoop in recent years makes the benchmarking of these systems become crucial for both research and industry communities. The complexity, diversity, and rapid evolution of big data systems…
Now we live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short big data systems) is a hot topic. In this paper, we…
The development of scalable, representative, and widely adopted benchmarks for graph data systems have been a question for which answers has been sought for decades. We conduct an in-depth study of the existing literature on benchmarks for…
The continuous increase of data generated provides enormous possibilities of both public and private companies. The management of this mass of data or big data will play a crucial role in the society of the future, as it finds applications…
One of the most significant problems of Big Data is to extract knowledge through the huge amount of data. The usefulness of the extracted information depends strongly on data quality. In addition to the importance, data quality has recently…
Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different…
Recently, increasingly large amounts of data are generated from a variety of sources. Existing data processing technologies are not suitable to cope with the huge amounts of generated data. Yet, many research works focus on Big Data, a…
Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture…
Big Data is considered proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open source products have been unleashed for…
The design and construction of high performance computing (HPC) systems relies on exhaustive performance analysis and benchmarking. Traditionally this activity has been geared exclusively towards simulation scientists, who, unsurprisingly,…
As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data…
The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these…
The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage,…
The aim of this article is to present an overview of the major families of state-of-the-art data processing benchmarks, namely transaction processing benchmarks and decision support benchmarks. We also address the newer trends in cloud…
Data lakes have emerged as a flexible and scalable solution for storing and analyzing large volumes of heterogeneous data, including structured, semi-structured, and unstructured formats. Despite their growing adoption in both industry and…
Benchmarking, which involves collecting reference datasets and demonstrating method performances, is a requirement for the development of new computational tools, but also becomes a domain of its own to achieve neutral comparisons of…
Big Data is reforming many industrial domains by providing decision support through analyzing large data volumes. Big Data testing aims to ensure that Big Data systems run smoothly and error-free while maintaining the performance and…
Big data present new opportunities for modern society while posing challenges for data scientists. Recent advancements in sensor networks and the widespread adoption of IoT have led to the collection of physical-sensor data on an enormous…
With the advent of big data applications and the increasing amount of data being produced in these applications, the importance of efficient methods for big data analysis has become highly evident. However, the success of any such method…
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark…