Causal Discovery using Compression-Complexity Measures

Pranay SY; Nithin Nagaraj

doi:10.1016/j.jbi.2021.103724

Subjects: Machine Learning; Data Analysis, Statistics and Probability; Machine Learning. Version v3, 2021-03-18.

Abstract

Causal inference is one of the most fundamental problems across all domains of science. We address the problem of inferring a causal direction from two observed discrete symbolic sequences $X$ and $Y$. We present a framework that relies on lossless compressors to infer context-free grammars (CFGs) from sequence pairs and quantifies the extent to which the grammar inferred from one sequence compresses the other sequence. We infer that $X$ causes $Y$ if the grammar inferred from $X$ compresses $Y$ better than the grammar inferred from $Y$ compresses $X$. To put this notion into practice, we propose three models that use the Compression-Complexity Measures (CCMs), namely Lempel-Ziv (LZ) complexity and Effort-To-Compress (ETC), to infer CFGs and discover causal directions without requiring temporal structure. We evaluate these models on synthetic and real-world benchmarks and empirically observe performance competitive with current state-of-the-art methods. Lastly, we present two applications of the proposed models to causal inference directly from pairs of genome sequences belonging to the SARS-CoV-2 virus. Using a large number of sequences, we show that our models capture directed causal information exchange between sequence pairs, presenting novel opportunities for addressing key issues such as contact tracing, motif discovery, and the evolution of virulence and pathogenicity in future applications.
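The decision rule stated in the abstract can be illustrated with a minimal sketch. This is not one of the paper's three models: it assumes an LZ78-style phrase dictionary as a crude stand-in for an inferred grammar, and the function names (`lz78_phrases`, `cross_parse_cost`, `infer_direction`) are hypothetical. It only shows the shape of the comparison: learn phrases from one sequence, measure how compactly they cover the other, and compare the two directions.

```python
def lz78_phrases(seq):
    """Collect the phrases produced by an LZ78-style incremental parse of seq."""
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            # A new phrase ends here; start building the next one.
            phrases.add(current)
            current = ""
    return phrases


def cross_parse_cost(source, target):
    """Count the pieces needed to cover target greedily with phrases from source.

    Assumption for this sketch: a single symbol always costs one piece,
    whether or not it appears in the phrase set.
    """
    phrases = lz78_phrases(source)
    max_len = max((len(p) for p in phrases), default=1)
    cost = 0
    i = 0
    while i < len(target):
        step = 1
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(target) - i), 1, -1):
            if target[i:i + length] in phrases:
                step = length
                break
        i += step
        cost += 1
    return cost


def infer_direction(x, y):
    """Compare per-symbol cross-parse costs in both directions."""
    rate_xy = cross_parse_cost(x, y) / max(len(y), 1)  # phrases of X cover Y
    rate_yx = cross_parse_cost(y, x) / max(len(x), 1)  # phrases of Y cover X
    if rate_xy < rate_yx:
        return "X -> Y"
    if rate_yx < rate_xy:
        return "Y -> X"
    return "undecided"
```

Dividing each cost by the target length is a choice made for this sketch so that sequences of different lengths remain comparable; the paper's models, which are built on LZ complexity and ETC, may define and normalize their scores differently.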

Keywords

causal inference

Cite

@article{arxiv.2010.09336,
  title  = {Causal Discovery using Compression-Complexity Measures},
  author = {Pranay SY and Nithin Nagaraj},
  journal= {arXiv preprint arXiv:2010.09336},
  year   = {2021}
}

Comments

Accepted version with major revisions to results and discussion. 17 pages, 9 figures
