Related papers: Streaming supercomputing needs workflow-enabled pr…
Progress in science is deeply bound to the effective use of high-performance computing infrastructures and to the efficient extraction of knowledge from vast amounts of data. Such data comes from different sources that follow a cycle…
This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other…
The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…
Big data processing applications are becoming more and more complex. They are no more monolithic in nature but instead they are composed of decoupled analytical processes in the form of a workflow. One type of such workflow applications is…
Interactive urgent computing is a small but growing user of supercomputing resources. However there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which…
The excessive amounts of data generated by devices and Internet-based sources at a regular basis constitute, big data. This data can be processed and analyzed to develop useful applications for specific domains. Several mathematical and…
An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Thing (IoT) sensors. The development of streaming applications…
Massive data sets have radically changed our understanding of how to design efficient algorithms; the streaming paradigm, whether it in terms of number of passes of an external memory algorithm, or the single pass and limited memory of a…
Emerging Big Data analytics and machine learning applications require a significant amount of computational power. While there exists a plethora of large-scale data processing frameworks which thrive in handling the various complexities of…
[Background] Nowadays, there is a massive growth of data volume and speed in many types of systems. It introduces new needs for infrastructure and applications that have to handle streams of data with low latency and high throughput.…
Workflow is a common term used to describe a systematic breakdown of tasks that need to be performed to solve a problem. This concept has found best use in scientific and business applications for streamlining and improving the performance…
Multicore parallel programming has some very difficult problems such as deadlocks during synchronizations and race conditions brought by concurrency. Added to the difficulty is the lack of a simple, well-accepted computing model for…
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly…
Apart from forming the backbone of compiler optimization, static dataflow analysis has been widely applied in a vast variety of applications, such as bug detection, privacy analysis, program comprehension, etc. Despite its importance,…
The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage,…
The role of scalable high-performance workflows and flexible workflow management systems that can support multiple simulations will continue to increase in importance. For example, with the end of Dennard scaling, there is a need to…
A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such…
Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing…
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally…
With the explosive growth of big data, workloads tend to get more complex and computationally demanding. Such applications are processed on distributed interconnected resources that are becoming larger in scale and computational capacity.…