DOI: 10.1145/3709691 ISSN: 2836-6573

FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds

Aaron Huber, Oliver Kennedy, Atri Rudra, Zhuoyue Zhao, Su Feng, Boris Glavic

Probabilistic databases (PDBs) provide users with a principled way to query data that is incomplete or imprecise. In this work, we study computing expected multiplicities of query results over probabilistic databases under bag semantics which has PTIME data complexity. However, does this imply that bag probabilistic databases are practical? We strive to answer this question from both a theoretical as well as a systems perspective. We employ concepts from fine-grained complexity to demonstrate that exact bag probabilistic query processing is fundamentally less efficient than deterministic bag query evaluation, but that fast approximations are possible by sampling monomials from a circuit representation of a result tuple's lineage. A remaining issue, however, is that constructing such circuits, while in PTIME, can nonetheless have significant overhead. To avoid this cost, we utilize approximate query processing techniques to directly sample monomials without materializing lineage upfront. Our implementation in

FastPDB
provides accurate anytime approximation of probabilistic query answers and scales to datasets orders of magnitude larger than competing methods.

More from our Archive