Related papers: Efficient Iterative Processing in the SciDB Parall…
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…
Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and…
High-performance computing has been used in various fields of astrophysical research, but most of it is implemented on massively parallel systems (supercomputers) or graphics processing unit clusters. With the advent of multicore…
In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…
Iterative algorithms are widely used in digital signal processing applications. With the case study of radio astronomy calibration processing, this work contributes towards revealing and exploiting the intrinsic error resilience of…
Array-intensive programs are often amenable to parallelization across many cores on a single machine as well as scaling across multiple machines and hence are well explored, especially in the domain of high-performance computing. These…
SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to…
As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate…
While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented…
Efficiently solving sparse linear algebraic equations is an important research topic of numerical simulation. Commonly used approaches include direct methods and iterative methods. Compared with the direct methods, the iterative methods…
In recent times, the production of multidimensional data in various domains and their storage in array databases has witnessed a sharp increase; this rapid growth in data volumes necessitates compression in array databases. However,…
Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database…
Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating…
Many scientific applications are I/O intensive and generate or access large data sets, spanning hundreds or thousands of "files." Management, storage, efficient access, and analysis of this data present an extremely challenging task. We…
Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a…
This paper describes a method for scheduling the events of a switched system to achieve an optimal performance. The approach has guarantees on convergence and computational complexity that parallel derivative-based iterative optimization…
Deep learning has excelled on complex pattern recognition tasks such as image classification and object recognition. However, it struggles with tasks requiring nontrivial reasoning, such as algorithmic computation. Humans are able to solve…
Workloads that comb through vast amounts of data are gaining importance in the sciences. These workloads consist of "needle in a haystack" queries that are long running and data intensive so that query throughput limits performance. To…
This paper outlines certain scenarios from the fields of astrophysics and fluid dynamics simulations that require high-performance data warehouses supporting an array data type. A common feature of all these use cases is that subsetting and…
In this paper we study a new approach in optimization that aims to search a large domain D where a given function takes large, small or specific values via an iterative optimization algorithm based on the gradient. We show that the…