
Related papers: Python Implementation of the Dynamic Distributed D…

200 papers

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining…

Mathematical Software · Computer Science · 2016-12-13 · Alexander Chen, Alan Edelman, Jeremy Kepner, Vijay Gadepally, Dylan Hutchison

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity…

The D4M tool was developed to address many of today's data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity…

Databases · Computer Science · 2017-11-09 · Lauren Milechin, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner, Alexander Chen, Dylan Hutchison

The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that…

The D4M tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of database engines, graph analytics in the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-02-13 · Lauren Milechin, Alexander Chen, Vijay Gadepally, Dylan Hutchison, Siddharth Samsi, Jeremy Kepner

The dynamic mode decomposition (DMD) is a simple and powerful data-driven modeling technique that is capable of revealing coherent spatiotemporal patterns from data. The method's linear algebra-based formulation additionally allows for a…

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed…

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper…

Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are…

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to…

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lies the tools and the methods that are driving it, from processing the…

Machine Learning · Computer Science · 2020-04-01 · Sebastian Raschka, Joshua Patterson, Corey Nolet

Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as NumPy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-09-01 · Zane Fink, Simeng Liu, Jaemin Choi, Matthias Diener, Laxmikant V. Kale

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers,…

Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-14 · Vibhatha Abeykoon, Niranda Perera, Chathura Widanage, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Geoffrey Fox

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python…

Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-08-24 · Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, Dario Torregrossa

Each step in the data analytics pipeline is important, including database ingest and query. The D4M-Accumulo database connector has allowed analysts to quickly and easily ingest to and query from Apache Accumulo using MATLAB(R)/GNU Octave…

Databases · Computer Science · 2018-12-18 · Lauren Milechin, Vijay Gadepally, Jeremy Kepner

In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new…

We introduce D2O, a Python module for cluster-distributed multi-dimensional numerical arrays. It acts as a layer of abstraction between the algorithm code and the data-distribution logic. The main goal is to achieve usability without losing…

Mathematical Software · Computer Science · 2016-11-02 · T. Steininger, M. Greiner, F. Beaujean, T. Enßlin

While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented…

Machine Learning · Computer Science · 2025-07-01 · Chen Zhang