Related papers: Sector and Sphere: Towards Simplified Storage and …

Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere

We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-08-25 Robert L Grossman, Yunhong Gu

Compute and Storage Clouds Using Wide Area High Performance Networks

We describe a cloud based infrastructure that we have developed that is optimized for wide area, high performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-08-14 Robert L. Grossman, Yunhong Gu, Michael Sabala, Wanzhi Zhang

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…

Databases · Computer Science 2017-07-07 Shlomi Dolev, Patricia Florissi, Ehud Gudes, Shantanu Sharma, Ido Singer

Past, Present and Future of Hadoop: A Survey

In this paper, a technology for massive data storage and computing named Hadoop is surveyed. Hadoop consists of heterogeneous computing devices like regular PCs abstracting away the details of parallel processing and developers can just…

Networking and Internet Architecture · Computer Science 2022-03-01 Ameneh Zarei, Shahla Safari, Mahmood Ahmadi, Farhad Mardukhi

Large-scale Data Modelling in Hive and Distributed Query Processing using MapReduce and Tez

Huge amounts of data being generated continuously by digitally interconnected systems of humans, organizations and machines. Data comes in variety of formats including structured, unstructured and semi-structured, what makes it impossible…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-31 Abzetdin Adamov

Design Architecture-Based on Web Server and Application Cluster in Cloud Environment

Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-03-24 Gita Shah, Annappa, K. C. Shet

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-10 Muralikrishnan Ramane, Sharmila Krishnamoorthy, Sasikala Gowtham

Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Recently, due to rapid development of information and communication technologies, the data are created and consumed in the avalanche way. Distributed computing create preconditions for analyzing and processing such Big Data by distributing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-30 Vladyslav Taran, Oleg Alienin, Sergii Stirenko, A. Rojbi, Yuri Gordienko

Parallel Spectral Clustering Algorithm Based on Hadoop

Spectral clustering and cloud computing is emerging branch of computer science or related discipline. It overcome the shortcomings of some traditional clustering algorithm and guarantee the convergence to the optimal solution, thus have to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Yajun Cui, Yang Zhao, Kafei Xiao, Chenglong Zhang, Lei Wang

Streaming vs. Functions: A Cost Perspective on Cloud Event Processing

In cloud event processing, data generated at the edge is processed in real-time by cloud resources. Both distributed stream processing (DSP) and Function-as-a-Service (FaaS) have been proposed to implement such event processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-15 Tobias Pfandzelter, Sören Henning, Trever Schirmer, Wilhelm Hasselbring, David Bermbach

Management of Data Replication for PC Cluster-based Cloud Storage System

Storage systems are essential building blocks for cloud computing infrastructures. Although high performance storage servers are the ultimate solution for cloud storage, the implementation of inexpensive storage system remains an open…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-30 Julia Myint, Thinn Thu Naing

Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-30 Bilal Akil, Ying Zhou, Uwe Röhm

Development details and computational benchmarking of DEPAM

In the big data era of observational oceanography, passive acoustics datasets are becoming too high volume to be processed on local computers due to their processor and memory limitations. As a result there is a current need for our…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-10 Paul Nguyen Hong Duc, Dorian Cazau

Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-07 Faheem Ullah, Shagun Dhingra, Xiaoyu Xia, M. Ali Babar

Architecture of processing and analysis system for big astronomical data

This work explores the use of big data technologies deployed in the cloud for processing of astronomical data. We have applied Hadoop and Spark to the task of co-adding astronomical images. We compared the overhead and execution time of…

Instrumentation and Methods for Astrophysics · Physics 2017-04-03 Ivan Kolosov, Sergey Gerasimov, Alexander Meshcheryakov

About the Suitability of Clouds in High-Performance Computing

Cloud computing has become the ubiquitous computing and storage paradigm. It is also attractive for scientists, because they do not have to care any more for their own IT infrastructure, but can outsource it to a Cloud Service Provider of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-11 Harald Richter

Performance Issues of Heterogeneous Hadoop Clusters in Cloud Computing

Nowadays most of the cloud applications process large amount of data to provide the desired results. Data volumes to be processed by cloud applications are growing much faster than computing power. This growth demands new strategies for…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-05 B. Thirumala Rao, N. V. Sridevi, V. Krishna Reddy, L. S. S. Reddy

Sparkle: Optimizing Spark for Large Memory Machines and Analytics

Spark is an in-memory analytics platform that targets commodity server environments today. It relies on the Hadoop Distributed File System (HDFS) to persist intermediate checkpoint states and final processing results. In Spark, immutable…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-22 Mijung Kim, Jun Li, Haris Volos, Manish Marwah, Alexander Ulanov, Kimberly Keeton, Joseph Tucek, Lucy Cherkasova, Le Xu, Pradeep Fernando

Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments

Cloud Computing is emerging as a new computational paradigm shift. Hadoop-MapReduce has become a powerful Computation Model for processing large data on distributed commodity hardware clusters such as Clouds. In all Hadoop implementations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-04 B. Thirumala Rao, L. S. S. Reddy

Cloud Computing -- Everything As A Service

Compute infrastructure hosted by a cloud provider allows an application to scale without limit. The application developer no longer needs to worry about the up-front investment in a server farm provisioned for a worst-case load scenario.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-16 Michael Howard