
Validating Streaming JSON Documents with Learned VPAs

Formal Languages and Automata Theory, 2023-05-15, v2

Abstract

We present a new streaming algorithm to validate JSON documents against a set of constraints given as a JSON schema. Among the possible values a JSON document can hold, objects are unordered collections of key-value pairs while arrays are ordered collections of values. We prove that there always exists a visibly pushdown automaton (VPA) that accepts the same set of JSON documents as a JSON schema. Leveraging this result, our approach relies on learning a VPA for the provided schema. As the learned VPA assumes a fixed order on the key-value pairs of the objects, we abstract its transitions in a special kind of graph, and propose an efficient streaming algorithm using the VPA and its graph to decide whether a JSON document is valid for the schema. We evaluate the implementation of our algorithm on a number of random JSON documents, and compare it to the classical validation algorithm.
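To make the "visibly pushdown" idea concrete, here is a minimal Python sketch. It is not the algorithm of the paper, and it validates nothing against a schema. It only illustrates the defining property of a VPA: the symbol being read decides on its own whether the stack is pushed, popped, or left alone. The token alphabet and the function name `well_nested` are assumptions chosen for illustration.

```python
from typing import Iterable

# Hypothetical token alphabet, chosen for illustration only.
CALLS = {"{": "}", "[": "]"}  # call symbols: push the expected closing symbol
RETURNS = {"}", "]"}          # return symbols: pop and compare


def well_nested(tokens: Iterable[str]) -> bool:
    """Check object/array nesting in a single pass over a token stream.

    The stack action depends only on the token just read, which is the
    defining property of a visibly pushdown automaton.
    """
    stack = []
    for tok in tokens:
        if tok in CALLS:
            stack.append(CALLS[tok])
        elif tok in RETURNS:
            if not stack or stack.pop() != tok:
                return False
        # Any other token is an internal symbol and leaves the stack untouched.
    return not stack
```

Because the tokens are consumed one at a time and only the stack is kept, memory use grows with the nesting depth of the document rather than with its length, which is what makes this style of automaton a natural fit for streaming input.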

Cite

@article{arxiv.2211.08891,
  title  = {Validating Streaming JSON Documents with Learned VPAs},
  author = {Véronique Bruyère and Guillermo A. Pérez and Gaëtan Staquet},
  journal= {arXiv preprint arXiv:2211.08891},
  year   = {2023}
}

Comments

46 pages, 10 figures, published at TACAS 2023