Related papers: Technical Report: CSVM Ecosystem

200 papers

The CSVM (CSV with Metadata) format is derived from the CSV format and is used for storing experimental data, models, and specifications. CSVM allows the storage of tabular data with a limited but extensible amount of metadata. This increases the…

Quantitative Methods · Quantitative Biology 2012-07-25 Gérôme Beyries , Frédéric Rodriguez

CSVM (CSV with Metadata) is a simple file format for tabular data. Its possible application domain is the same as that of typical spreadsheet files, but CSVM is well suited for long-term storage and the inter-conversion of RAW data. CSVM embeds…

Computational Engineering, Finance, and Science · Computer Science 2012-08-13 Frédéric Rodriguez

CSV is a widely used format for data representing systems control, information exchange and processing, logging, etc. Nevertheless, the format is riddled with tricky corner cases and inconsistencies, which can make input data unreliable,…

Software Engineering · Computer Science 2023-03-29 Leo Freitas , Aaron John Buhagiar

Raw data sizes are growing and proliferating in scientific research, driven by the success of data-hungry computational methods, such as machine learning. The preponderance of proprietary and shoehorned data formats makes computations slower…

Databases · Computer Science 2022-01-02 David S. Smith

We describe the current state and future plans for a set of tools for scientific data management (SDM) designed to support scientific transparency and reproducible research. SDM has been in active use at our MRI Center for more than two…

Quantitative Methods · Quantitative Biology 2015-02-25 B. A. Wandell , A. Rokem , L. M. Perry , G. Schaefer , R. F. Dougherty

Scientific applications produce a huge amount of data, which imposes serious management and analysis challenges. In particular, limitations in current database management systems prevent their adoption in simulation applications, in which…

Databases · Computer Science 2019-03-18 Hermano Lustosa , Fabio Porto

Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of…

Nowadays simulations can produce petabytes of data to be stored in parallel filesystems or large-scale databases. This data is accessed over the course of decades often by thousands of analysts and scientists. However, storing these volumes…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-11 Salvatore Di Girolamo , Pirmin Schmid , Thomas Schulthess , Torsten Hoefler

The increasingly collaborative, globalized nature of scientific research combined with the need to share data and the explosion in data volumes present an urgent need for a scientific data management system (SDMS). An SDMS presents a…

Databases · Computer Science 2020-04-09 Dale Stansberry , Suhas Somnath , Jessica Breet , Gregory Shutt , Mallikarjun Shankar

When working with astronomical data, metadata is also important. A general-purpose file format for transmission, processing and archiving large datasets should facilitate, among other things, both efficient processing of bulk data and…

Instrumentation and Methods for Astrophysics · Physics 2026-03-17 Mark Taylor

Exchanging data as character-separated values (CSV) is slow, cumbersome and error-prone. Especially for time-series data, which is common in Activity Recognition, synchronizing several independently recorded sensors is challenging. Adding…

Databases · Computer Science 2019-08-05 Philipp M. Scholl , Benjamin Völker , Bernd Becker , Kristof Van Laerhoven

It is well known that data scientists spend the majority of their time on preparing data for analysis. One of the first steps in this preparation phase is to load the data from the raw storage format. Comma-separated value (CSV) files are a…

Databases · Computer Science 2019-07-29 Gerrit J. J. van den Burg , Alfredo Nazabal , Charles Sutton

Support Vector Machines (SVMs) are popular tools for data mining tasks such as classification, regression, and density estimation. However, the original SVM (C-SVM) only considers local information of data points on or over the margin.…

Artificial Intelligence · Computer Science 2010-09-28 Xin Liu , Ying Ding , Forrest Sheng Bao

The support vector machine (SVM) and deep learning (e.g., convolutional neural networks (CNNs)) are the two most famous algorithms in small and big data, respectively. Nonetheless, smaller datasets may be very important, costly, and not…

Machine Learning · Computer Science 2020-02-19 Wei-Chang Yeh

Background: The need for big data analysis requires being able to process large data which are being held fine-tuned for usage by corporate. It is only very recently that the need for big data has caught attention for low budget corporate…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-08 Abhishek Narain Singh

A common task in scientific computing is the derivation of data. This workflow extracts the most important information from large input data and stores it in smaller derived data objects. The derived data objects can then be used for…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-10 Tobias Wegner , Mario Lassnig , Peer Ueberholz , Christian Zeitnitz

Storing data is easy, but finding and using data is not. It is desirable that the data is stored in a structured format, which can be preserved and retrieved in future. Creating Metadata for the data is one way of creating structured data…

Information Theory · Computer Science 2011-01-04 Ranjeet Devarakonda , Giri Palanisamy , Jim Green

Representing scientific data sets efficiently on external storage usually involves converting them to a byte string representation using specialized reader/writer routines. The resulting storage files are frequently difficult to interpret…

Computational Engineering, Finance, and Science · Computer Science 2007-05-23 Christoph Best

Information and data exchange is an important aspect of scientific progress. In computational materials science, a prerequisite for smooth data exchange is standardization, which means using agreed conventions for, e.g., units, zero base…

Delivering a reproducible environment along with complex and up-to-date software stacks on thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM-File System (CVMFS) has been designed to help various…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-30 Alexandre F Boyer , Christophe Haen , Federico Stagni , David R C Hill