Review

The need for producing quick answers to large queries is well recognized, even if these answers are only approximate (see, for instance, [2]). There are two primary ways to accomplish this. One is to sample "on the fly", as suggested in [3]. The other is to recognize that typical large queries involve large aggregations, so one can estimate these aggregates from pre-computed information such as histograms.
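To make the contrast concrete, here is a minimal Python sketch of both ideas applied to a single range COUNT query. It assumes a numeric column held in memory, an equi-width histogram, and uniformly distributed values within each bucket. The function names are hypothetical illustrations, not the API of Aqua or of any particular database.

```python
import random


def build_equiwidth_histogram(values, lo, hi, num_buckets):
    """Pre-compute row counts for equal-width buckets covering [lo, hi)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that a value equal to hi falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts


def histogram_count_estimate(hist, a, b):
    """Estimate COUNT(*) WHERE a <= v < b using only the histogram.

    Assumes values are spread uniformly inside each bucket, so a bucket
    contributes in proportion to how much of its width the range overlaps.
    """
    lo, width, counts = hist
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total


def sample_count_estimate(values, a, b, sample_size, seed=0):
    """Estimate the same COUNT from a uniform random sample drawn at query time.

    The fraction of sampled rows that match is scaled up by N / n.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    hits = sum(1 for v in sample if a <= v < b)
    return hits * len(values) / sample_size


if __name__ == "__main__":
    values = list(range(1000))  # hypothetical column holding 0..999
    hist = build_equiwidth_histogram(values, 0, 1000, 10)
    exact = sum(1 for v in values if 250 <= v < 750)
    print("exact:    ", exact)
    print("histogram:", histogram_count_estimate(hist, 250, 750))
    print("sample:   ", sample_count_estimate(values, 250, 750, 100))
```

The histogram estimate reads only the ten pre-computed buckets, however large the table is, which illustrates why the pre-computed approach tends to be fast. The sampling estimator instead evaluates the predicate on actual rows, so it is not tied to the attributes and bucket boundaries someone chose in advance, at the cost of touching data at query time.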

Whereas the first technique is more flexible in the types of queries it can support and in the degree of approximation error it allows, the second requires fewer changes to a database's existing query processing system and is likely to be substantially faster and more robust.

The Aqua project reviewed here is the leading implementation of the second technique. I was impressed by the demonstration I saw at the VLDB conference and believe this project is worth paying attention to.

