Related papers: Creating a Relational Distributed Object Store
In this paper we look at the growth of distributed object stores (DOS) and examine the underlying mechanisms that guide their use and development. Our focus is on the fundamental principles of operation that define this class of system, how…
The Fedora architecture is an extensible framework for the storage, management, and dissemination of complex objects and the relationships among them. Fedora accommodates the aggregation of local and distributed content into digital objects…
To deal with the constant growth of unstructured data, vendors have deployed scalable, resilient, and cost-effective object-based storage systems built on RESTful web services. However, many applications rely on richer file-system APIs and…
Most modern data stores tend to be distributed, to enable the scaling of the data across multiple instances of commodity hardware. Although this ensures a near unlimited potential for storage, the data itself is not always ideally…
This work examines strategies to handle large shared data objects in distributed storage systems (DSS), while boosting the number of concurrent accesses, maintaining strong consistency guarantees, and ensuring good operation performance. To…
The Resource Description Framework (RDF) is continuing to grow outside the bounds of its initial function as a metadata framework and into the domain of general-purpose data modeling. This expansion has been facilitated by the continued…
Relational and NoSQL stores are designed for fast processing of large data sets with a stable structure, while ontologies are used to represent complex and dynamic sets of information of limited size. In the industrial…
The OverRelational Manifesto (hereafter ORM) proposes a possible approach to creating next-generation data storage systems. ORM starts from the requirement that information in a relational database is represented by a set of relation…
High-performance object stores are an emerging technology which offers an alternative solution in the field of HPC storage, with potential to address long-standing scalability issues in traditional distributed POSIX file systems due to…
Distributed Asynchronous Object Store (DAOS) is a novel software-defined object store leveraging Non-Volatile Memory (NVM) devices, designed for high performance. It provides a number of interfaces for applications to undertake I/O, ranging…
One of the current challenges in the use of cloud services is the design of specialized data management systems. This is especially important for hybrid systems in which the data are located in public and private…
We address the problem of compactly storing a large number of versions (snapshots) of a collection of keyed documents or records in a distributed environment, while efficiently answering a variety of retrieval queries over those, including…
The vision of the Semantic Web is becoming a reality, with billions of RDF triples distributed over multiple queryable endpoints (e.g. Linked Data). Although there has been a body of work on persistent storage of RDF triples, it seems…
Codd's universally applied relational model has two constructs: a stored relation, with stored attributes only, and a view, with inherited attributes only. In 1992, we proposed a third construct, mixing both types of attributes.…
FAIR Digital Object (FDO) is an emerging concept highlighted by the European Open Science Cloud (EOSC) as a potential candidate for building an ecosystem of machine-actionable research outputs. In this work we systematically evaluate FDO…
Developing large-scale distributed applications can be a daunting task. Object-based environments have attempted to alleviate problems by providing distributed objects that look like local objects. We advocate that this approach has…
Today's storage systems expose abstractions that are either so low-level (e.g., key-value store, raw-block store) that they require developers to reinvent the wheel, or so high-level (e.g., relational databases, Git) that they lack…
Data availability is one of the most important features in distributed storage systems, made possible by data replication. Nowadays data are generated rapidly and the goal to develop efficient, scalable and reliable storage systems has…
Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They are building feature engineering pipelines as part of the MLOps life cycle to transform data from…
Cloud-based distributed databases are a popular choice for many current applications, especially those that run over the Internet. Incorporating distributed database systems within cloud environments has enabled businesses to scale…