Related papers: Characterizing and Subsetting Big Data Workloads

BigDataBench: a Big Data Benchmark Suite from Internet Services

As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data…

Databases · Computer Science 2016-11-17 Lei Wang , Jianfeng Zhan , Chunjie Luo , Yuqing Zhu , Qiang Yang , Yongqiang He , Wanling Gao , Zhen Jia , Yingjie Shi , Shujie Zhang , Chen Zheng , Gang Lu , Kent Zhan , Xiaona Li , Bizhu Qiu

BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite

Several fundamental changes in technology indicate domain-specific hardware and software co-design is the only path left. In this context, architecture, system, data management, and machine learning communities pay greater attention to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-26 Wanling Gao , Jianfeng Zhan , Lei Wang , Chunjie Luo , Daoyi Zheng , Xu Wen , Rui Ren , Chen Zheng , Xiwen He , Hainan Ye , Haoning Tang , Zheng Cao , Shujie Zhang , Jiahui Dai

Characterization and Architectural Implications of Big Data Workloads

Big data areas are expanding in a fast way in terms of increasing workloads and runtime systems, and this situation imposes a serious challenge to workload characterization, which is the foundation of innovative system and architecture…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-29 Lei Wang , Jianfeng Zhan , Zhen Jia , Rui Han

Characterizing Data Analysis Workloads in Data Centers

As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become…

Performance · Computer Science 2013-07-31 Zhen Jia , Lei Wang , Jianfeng Zhan , Lixin Zhang , Chunjie Luo

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

Now we live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short big data systems) is a hot topic. In this paper, we…

Performance · Computer Science 2013-07-31 Zhen Jia , Runlin Zhou , Chunge Zhu , Lei Wang , Wanling Gao , Yingjie Shi , Jianfeng Zhan , Lixin Zhang

On Big Data Benchmarking

Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research…

Performance · Computer Science 2014-02-24 Rui Han , Xiaoyi Lu

Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

The design and construction of high performance computing (HPC) systems relies on exhaustive performance analysis and benchmarking. Traditionally this activity has been geared exclusively towards simulation scientists, who, unsurprisingly,…

Performance · Computer Science 2018-11-07 Drew Schmidt , Junqi Yin , Michael Matheson , Bronson Messer , Mallikarjun Shankar

PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark…

Machine Learning · Computer Science 2017-03-03 Randal S. Olson , William La Cava , Patryk Orzechowski , Ryan J. Urbanowicz , Jason H. Moore

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads

The complexity and diversity of big data and AI workloads make understanding them difficult and challenging. This paper proposes a new approach to modelling and characterizing big data and AI workloads. We consider each big data and AI…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-28 Wanling Gao , Jianfeng Zhan , Lei Wang , Chunjie Luo , Daoyi Zheng , Fei Tang , Biwei Xie , Chen Zheng , Xu Wen , Xiwen He , Hainan Ye , Rui Ren

Understanding Big Data Analytic Workloads on Modern Processors

Big data analytics applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems, in which…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-04-21 Zhen Jia , Lei Wang , Jianfeng Zhan , Lixin Zhang , Chunjie Luo , Ninghui Sun

PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these…

Performance · Computer Science 2016-12-13 Patrick Dreher , Chansup Byun , Chris Hill , Vijay Gadepally , Bradley Kuszmaul , Jeremy Kepner

WPC: Whole-picture Workload Characterization

This article raises an important and challenging workload characterization issue: can we uncover each critical component across the stacks contributing what percentages to any specific bottleneck? The typical critical components include…

Performance · Computer Science 2023-02-28 Lei Wang , Kaiyong Yang , Chenxi Wang , Wanling Gao , Chunjie Luo , Fan Zhang , Zhongxin Ge , Li Zhang , Guoxin Kang , Jianfeng Zhan

Identifying Dwarfs Workloads in Big Data Analytics

Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking.…

Databases · Computer Science 2015-05-27 Wanling Gao , Chunjie Luo , Jianfeng Zhan , Hainan Ye , Xiwen He , Lei Wang , Yuqing Zhu , Xinhui Tian

Introducing Milabench: Benchmarking Accelerators for AI

AI workloads, particularly those driven by deep learning, are introducing novel usage patterns to high-performance computing (HPC) systems that are not comprehensively captured by standard HPC benchmarks. As one of the largest academic…

Machine Learning · Computer Science 2024-11-26 Pierre Delaunay , Xavier Bouthillier , Olivier Breuleux , Satya Ortiz-Gagné , Olexa Bilaniuk , Fabrice Normandin , Arnaud Bergeron , Bruno Carrez , Guillaume Alain , Soline Blanc , Frédéric Osterrath , Joseph Viviano , Roger Creus-Castanyer , Darshan Patil , Rabiul Awal , Le Zhang

Enriching the Machine Learning Workloads in BigBench

In the era of Big Data and the growing support for Machine Learning, Deep Learning and Artificial Intelligence algorithms in the current software systems, there is an urgent need of standardized application benchmarks that stress test and…

Machine Learning · Computer Science 2024-06-18 Matthias Polag , Todor Ivanov , Timo Eichhorn

A Dwarf-based Scalable Big Data Benchmarking Methodology

Different from the traditional benchmarking methodology that creates a new benchmark or proxy for every possible workload, this paper presents a scalable big data benchmarking methodology. Among a wide variety of big data analytics…

Hardware Architecture · Computer Science 2017-11-10 Wanling Gao , Lei Wang , Jianfeng Zhan , Chunjie Luo , Daoyi Zheng , Zhen Jia , Biwei Xie , Chen Zheng , Qiang Yang , Haibin Wang

BigDataBench: a Big Data Benchmark Suite from Web Search Engines

This paper presents our joint research efforts on big data benchmarking with several industrial partners. Considering the complexity, diversity, workload churns, and rapid evolution of big data systems, we take an incremental approach in…

Information Retrieval · Computer Science 2013-07-02 Wanling Gao , Yuqing Zhu , Zhen Jia , Chunjie Luo , Lei Wang , Zhiguo Li , Jianfeng Zhan , Yong Qi , Yongqiang He , Shiming Gong , Xiaona Li , Shujie Zhang , Bizhu Qiu

Benchmark Framework with Skewed Workloads

In this work, we present a new benchmarking suite with new real-life inspired skewed workloads to test the performance of concurrent index data structures. We started this project to prepare workloads specifically for self-adjusting data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-19 Vitaly Aksenov , Dmitry Ivanov , Ravil Galiev

Create Benchmarks for Data Lakes

Data lakes have emerged as a flexible and scalable solution for storing and analyzing large volumes of heterogeneous data, including structured, semi-structured, and unstructured formats. Despite their growing adoption in both industry and…

Databases · Computer Science 2026-01-28 Yi Lyu , Pei-Chieh Lo , Natan Lidukhover

BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework

Autoscaling has become a baseline expectation for cloud-native big data processing, and the design space has expanded beyond rule-based heuristics to include learned controllers and, most recently, large language model (LLM) agents. Yet…

Information Retrieval · Computer Science 2026-05-13 Venkata Krishna Prasanth Budigi , Siri Chandana Sirigiri