Learning Tree Pattern Transformations

Daniel Neider; Leif Sabellek; Johannes Schmidt; Fabian Vehlken; Thomas Zeume

Machine Learning | 2025-02-19 | v2 | Artificial Intelligence | Computational Complexity | Databases

Abstract

Explaining why and how a tree $t$ structurally differs from another tree $t^\star$ is a question encountered throughout computer science, including in understanding tree-structured data such as XML or JSON. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: given a set $\{(t_1, t_1^\star), \dots, (t_n, t_n^\star)\}$ of pairs of labelled, ordered trees, is there a small set of rules that explains the structural differences between all pairs $(t_i, t_i^\star)$? This raises two research questions: (i) what is a good notion of "rule" in this context, and (ii) how can sets of rules explaining a data set be learned algorithmically? We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g., showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.

Keywords

decision tree, parsing, random forest

Cite

@article{arxiv.2410.07708,
  title  = {Learning Tree Pattern Transformations},
  author = {Daniel Neider and Leif Sabellek and Johannes Schmidt and Fabian Vehlken and Thomas Zeume},
  journal= {arXiv preprint arXiv:2410.07708},
  year   = {2025}
}

Comments

Full version of the ICDT 2025 paper
