Related papers: Policy Aware Geospatial Data
This paper discusses the problem of lack of clear licensing and transparency of usage terms and conditions for research metadata. Making research data connected, discoverable and reusable are the key enablers of the new data revolution in…
Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is "born" geographically distributed. On the other hand, many machine learning applications require a…
This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper's goal is to build towards a common framework for data licensing akin to the licensing of open source…
Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of…
In this paper, we present Digital Rights Management systems (DRMS) which are becoming more and more complex due to technology revolution in relation with telecommunication networks, multimedia applications and the reading equipments (Mobile…
XrML is becoming a popular language in industry for writing software licenses. The semantics for XrML is implicitly given by an algorithm that determines if a permission follows from a set of licenses. We focus on a fragment of the language…
Within research institutions like CERN (European Organization for Nuclear Research) there are often disparate databases (different in format, type and structure) that users need to access in a domain-specific manner. Users may want to…
In this paper, we summarize work-in-progress on expert system support to automate some data deposit and release decisions within a data repository, and to generate custom license agreements for those data transfers. Our approach formalizes…
Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms are not applicable to mixed data, which include numeric and non-numeric data, text, graphs and so on to generate…
The widespread practice of indiscriminate data scraping to fine-tune language models (LMs) raises significant legal and ethical concerns, particularly regarding compliance with data protection laws such as the General Data Protection…
The rapid advancement of immersive technologies has propelled the development of the Metaverse, where the convergence of virtual and physical realities necessitates the generation of high-quality, photorealistic images to enhance user…
The ability to transform location-centric geospatial data into meaningful computational representations has become fundamental to modern spatial analysis and decision-making. Geospatial Representation Learning (GRL), the process of…
When faced with a new dataset, most practitioners begin by performing exploratory data analysis to discover interesting patterns and characteristics within data. Techniques such as association rule mining are commonly applied to uncover…
Recent advances in machine learning have been supported by the emergence of domain-specific software libraries, enabling streamlined workflows and increased reproducibility. For geospatial machine learning (GeoML), the availability of Earth…
Machine learning is promising, but it often needs to process vast amounts of sensitive data which raises concerns about privacy. In this white-paper, we introduce Substra, a distributed framework for privacy-preserving, traceable and…
Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly…
Publicly available datasets are one of the key drivers for commercial AI software. The use of publicly available datasets is governed by dataset licenses. These dataset licenses outline the rights one is entitled to on a given dataset and…
Data heterogeneity in federated learning, characterized by a significant misalignment between local and global distributions, leads to divergent local optimization directions and hinders global model training. Existing studies mainly focus…
We describe a method and new no-code software tools enabling domain experts to build custom structured, labeled datasets from the unstructured text of documents and build niche machine learning text classification models traceable to…
One of the challenges currently problems in the use of cloud services is the task of designing of specialized data management systems. This is especially important for hybrid systems in which the data are located in public and private…