English

Explaining Wrong Queries Using Small Examples

Databases 2019-04-10 v1

Abstract

For testing the correctness of SQL queries, e.g., evaluating student submissions in a database course, a standard practice is to execute the query in question on some test database instance and compare its result with that of the correct query. Given two queries Q1Q_1 and Q2Q_2, we say that a database instance DD is a counterexample (for Q1Q_1 and Q2Q_2) if Q1(D)Q_1(D) differs from Q2(D)Q_2(D); such a counterexample can serve as an explanation of why Q1Q_1 and Q2Q_2 are not equivalent. While the test database instance may serve as a counterexample, it may be too large or complex to read and understand where the inequivalence comes from. Therefore, in this paper, given a known counterexample DD for Q1Q_1 and Q2Q_2, we aim to find the smallest counterexample DDD' \subseteq D where Q1(D)Q2(D)Q_1(D') \neq Q_2(D'). The problem in general is NP-hard. We give a suite of algorithms for finding the smallest counterexample for different classes of queries, some more tractable than others. We also present an efficient provenance-based algorithm for SPJUD queries that uses a constraint solver, and extend it to more complex queries with aggregation, group-by, and nested queries. We perform extensive experiments indicating the effectiveness and scalability of our solution on student queries from an undergraduate database course and on queries from the TPC-H benchmark. We also report a user study from the course where we deployed our tool to help students with an assignment on relational algebra.

Keywords

Cite

@article{arxiv.1904.04467,
  title  = {Explaining Wrong Queries Using Small Examples},
  author = {Zhengjie Miao and Sudeepa Roy and Jun Yang},
  journal= {arXiv preprint arXiv:1904.04467},
  year   = {2019}
}
R2 v1 2026-06-23T08:33:46.999Z