Related papers: From Data to the p-Adic or Ultrametric Model
We study two aspects of information semantics: (i) the collection of all relationships, (ii) tracking and spotting anomaly and change. The first is implemented by endowing all relevant information spaces with a Euclidean metric in a common…
We first pursue the study of how hierarchy provides a well-adapted tool for the analysis of change. Then, using a time sequence-constrained hierarchical clustering, we develop the practical aspects of a new approach to wavelet regression.…
We review the theory and practice of determining what parts of a data set are ultrametric. It is assumed that the data set, to begin with, is endowed with a metric, and we include discussion of how this can be brought about if a…
We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or…
Change point detection (CPD) and anomaly detection (AD) are essential techniques in various fields to identify abrupt changes or abnormal data instances. However, existing methods are often constrained to univariate data, face scalability…
p-Adic and noncommutative analysis are applied to describe phase transitions in disordered systems. In the noncommutative replica approach we replicate the disorder instead of the system degrees of freedom. The noncommutatibe replica…
Data assimilation, consisting in the combination of a dynamical model with a set of noisy and incomplete observations in order to infer the state of a system over time, involves uncertainty in most settings. Building upon an existing…
Current analysis of astronomical data are confronted with the daunting task of modeling the awkward features of astronomical data, among which heteroscedastic (point-dependent) errors, intrinsic scatter, non-ignorable data collection…
Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the `curse of dimensionality' in…
Metric learning aims to learn a highly discriminative model encouraging the embeddings of similar classes to be close in the chosen metrics and pushed apart for dissimilar ones. The common recipe is to use an encoder to extract embeddings…
Extreme value analysis is an essential methodology in the study of rare and extreme events, which hold significant interest in various fields, particularly in the context of environmental sciences. Models that employ the exceedances of…
Following a review of metric, ultrametric and generalized ultrametric, we review their application in data analysis. We show how they allow us to explore both geometry and topology of information, starting with measured data. Some themes…
Distinguishing cause from effect is a scientific challenge resisting solutions from mathematics, statistics, information theory and computer science. Compression-Complexity Causality (CCC) is a recently proposed interventional measure of…
An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a natural hierarchical embedding. Such hierarchical structure can be global in the data…
Mapping complex input data into suitable lower dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data…
An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or…
Data assimilation refers to the process of obtaining an estimate of a system's state using a model for the system's time evolution and a time series of measurements that are possibly noisy and incomplete. However, for practical reasons, the…
As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within the machine learning landscape, Data Shapley stands out as a widely embraced method…
Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches…
Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here…