Related papers: High Performance Data Engineering Everywhere

200 papers

In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new…

Data is found everywhere, from health and human infrastructure to the surge of sensors and the proliferation of internet-connected devices. To meet this challenge, the data engineering field has expanded significantly in recent years in…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-06 Mills Staylor, Arup Kumar Sarker, Gregor von Laszewski, Geoffrey Fox, Yue Cheng, Judy Fox

The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Ahmet Uyar, Kaiying Shan, Hasara Maithree, Damitha Lenadora, Thejaka Amila Kanewala, Geoffrey Fox

Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-02 Kaiying Shan, Niranda Perera, Damitha Lenadora, Tianle Zhong, Arup Sarker, Supun Kamburugamuve, Thejaka Amila Kanewela, Chathura Widanage, Geoffrey Fox

The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-20 Niranda Perera, Kaiying Shan, Supun Kamburugamuwe, Thejaka Amila Kanewela, Chathura Widanage, Arup Sarker, Mills Staylor, Tianle Zhong, Vibhatha Abeykoon, Geoffrey Fox

Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-14 Vibhatha Abeykoon, Niranda Perera, Chathura Widanage, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Geoffrey Fox

Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science, can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-09 Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey Fox

The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more…

Significant obstacles exist in scientific domains including genetics, climate modeling, and astronomy due to the management of, preprocessing of, and training on complicated data for deep learning. Even though several large-scale solutions offer…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-23 Arup Kumar Sarker, Aymen Alsaadi, Alexander James Halpern, Prabhath Tangella, Mikhail Titov, Niranda Perera, Mills Staylor, Gregor von Laszewski, Shantenu Jha, Geoffrey Fox

Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. The C++ language has been continuously enhanced in recent years…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Biagio Cosenza, Lorenzo Carpentieri, Kaijie Fan, Marco D'Antonio, Peter Thoman, Philip Salzmann

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python…

Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-24 Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, Dario Torregrossa

The scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like CuBLAS and NCCL provide optimized primitives, they lack the flexibility required for…

Current trends point to a future where large-scale scientific applications are tightly-coupled HPC/AI hybrids. Hence, we urgently need to invest in creating a seamless, scalable framework where HPC and AI/ML can efficiently work together…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-06 Jens Domke, Mohamed Wahib, Anshu Dubey, Tal Ben-Nun, Erik W. Draeger

This paper introduces Sparklen, a statistical learning toolkit for Hawkes processes in Python, designed to bring together efficiency and ease of use. The purpose of this package is to provide the Python community with a complete suite of…

Methodology · Statistics 2025-03-31 Romain Edmond Lacoste

With the spreading prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decades and have become massively popular,…

Databases · Computer Science 2017-11-28 Anand Gupta, Hardeo Thakur, Ritvik Shrivastava, Pulkit Kumar, Sreyashi Nag

With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows,…

Data-intensive applications impact many domains, and their steadily increasing size and complexity demands high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-02 Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-17 Claudia Misale, Maurizio Drocco, Marco Aldinucci, Guy Tremblay

Data-intensive applications are becoming commonplace in all science disciplines. They are comprised of a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-16 Vibhatha Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox