Related papers: Data Mining Using High Performance Data Clouds: Ex…
We describe a cloud-based infrastructure we have developed that is optimized for wide-area, high-performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a…
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector…
Recently, due to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like pace. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing…
The cloud has become a computational and storage solution for many data-centric organizations. The problem these organizations now face is searching their data in the cloud efficiently. A framework is required to distribute the…
Cloud computing has become the ubiquitous computing and storage paradigm. It is also attractive for scientists, because they no longer have to maintain their own IT infrastructure but can outsource it to a Cloud Service Provider of…
Huge amounts of data are being generated continuously by digitally interconnected systems of humans, organizations, and machines. Data come in a variety of formats, including structured, unstructured, and semi-structured, which makes it impossible…
Cloud computing has recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is now a growing need to…
Can cloud computing infrastructures provide HPC-competitive performance for scientific applications broadly? Despite prolific related literature, this question remains open. Answers are crucial for designing future systems and democratizing…
The number of mobile devices (e.g., smartphones, wearable technologies) is rapidly growing. In line with this trend, a massive amount of spatial data is being collected since these devices allow users to geo-tag user-generated content.…
Nowadays, most cloud applications process large amounts of data to provide the desired results. Data volumes to be processed by cloud applications are growing much faster than computing power. This growth demands new strategies for…
This work explores the use of big data technologies deployed in the cloud for processing of astronomical data. We have applied Hadoop and Spark to the task of co-adding astronomical images. We compared the overhead and execution time of…
This paper surveys Hadoop, a technology for massive data storage and computing. A Hadoop cluster consists of heterogeneous computing devices such as regular PCs, and Hadoop abstracts away the details of parallel processing so that developers can just…
Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…
This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…
Distributed computing uses many systems to solve a large-scale problem. The growth of high-speed broadband networks in developed and developing countries, the continual increase in computing power, and the rapid growth of the…
Compute infrastructure hosted by a cloud provider allows an application to scale without limit. The application developer no longer needs to worry about the up-front investment in a server farm provisioned for a worst-case load scenario.…
Various performance characteristics of distributed file systems have been well studied. However, the performance efficiency of distributed file systems on small-file problems in complex machine learning scenarios is not well…
The continuous increase in the availability of data of any kind, coupled with the development of high-speed communication networks, the popularization of cloud computing, the growth of data centers, and the emergence of…
In recent years, with the rise of cloud computing, many companies providing services in the cloud have been adding a new series of services to their catalogues, such as data mining and data processing, taking advantage of the vast computing…
Spectral clustering and cloud computing are emerging branches of computer science and related disciplines. Spectral clustering overcomes the shortcomings of some traditional clustering algorithms and guarantees convergence to the optimal solution, and thus has to…