Related papers: Data Engineering for HPC with Python

200 papers

The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide.…

Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-16 Vibhatha Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new…

Data-driven modeling is an approach to energy systems modeling that has been gaining popularity. In data-driven modeling, machine learning methods such as linear regression, neural networks, or decision-tree-based methods are applied.…

Machine Learning · Computer Science 2023-01-05 Sandra Wilfling

While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented…

Machine Learning · Computer Science 2025-07-01 Chen Zhang

The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and the popularization of the R and Python languages have heavily…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Ahmet Uyar, Kaiying Shan, Hasara Maithree, Damitha Lenadora, Thejaka Amila Kanewala, Geoffrey Fox

Machine learning has proved to be a useful tool for extracting knowledge from scientific data in numerous research fields, including astrophysics, genomics, and molecular dynamics. Often, data sets from these research areas need to be…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-21 Javier Álvarez Cid-Fuentes, Pol Álvarez, Salvi Solà, Kuninori Ishii, Rafael K. Morizawa, Rosa M. Badia

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…

Programming Languages · Computer Science 2014-07-17 Marcin Cieslik, Cameron Mura

Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-02 Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

This article describes geometric partitioning software that can be used for quick computation of data partitions on many-core HPC machines. It is best suited to dynamic applications with load distributions that vary with time.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-19 Aparna Sasidharan

Data pre-processing is a fundamental component of any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, was developed to facilitate data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-02 Kaiying Shan, Niranda Perera, Damitha Lenadora, Tianle Zhong, Arup Sarker, Supun Kamburugamuve, Thejaka Amila Kanewala, Chathura Widanage, Geoffrey Fox

Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-24 Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, Dario Torregrossa

The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software, and a key abstraction this provides is the separation of the representation of data from the computation. Many…

Programming Languages · Computer Science 2012-10-04 James Hanlon, Simon J. Hollis, David May

This paper presents a new technique for data slicing of distributed programs running on a hierarchy of machines. Data slicing can be realized as a program transformation that partitions heaps of machines in a hierarchy into independent…

Programming Languages · Computer Science 2014-02-25 Mohamed A. El-Zawawy

Python has become a standard scientific computing language, with fast-growing support for machine learning and data analysis modules as well as increasing use in big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a…

We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-25 Kailai Xu, Weiqiang Zhu, Eric Darve

We introduce D2O, a Python module for cluster-distributed multi-dimensional numerical arrays. It acts as a layer of abstraction between the algorithm code and the data-distribution logic. The main goal is to achieve usability without losing…

Mathematical Software · Computer Science 2016-11-02 T. Steininger, M. Greiner, F. Beaujean, T. Enßlin

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the…

Machine Learning · Computer Science 2020-04-01 Sebastian Raschka, Joshua Patterson, Corey Nolet

Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane application programming interfaces (APIs), which may be leveraged by user-defined software-defined networking…

Networking and Internet Architecture · Computer Science 2023-01-18 Frederik Hauser, Marco Häberle, Daniel Merling, Steffen Lindner, Vladimir Gurevich, Florian Zeiger, Reinhard Frank, Michael Menth

This chapter introduces the state of the art in the emerging area of combining High Performance Computing (HPC) with Big Data Analysis. To understand the new area, the chapter first surveys the existing approaches to integrating HPC with…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-01 Yuankun Fu, Fengguang Song