TY - JOUR
T1 - Query optimization over crowdsourced data
AU - Park, Hyunjung
AU - Widom, Jennifer
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the NSF (IIS-0904497), the Boeing Corporation, and a KAUST research grant.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2013/8/26
Y1 - 2013/8/26
N2 - Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model, query language, and query execution engine presented earlier. Deco's objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco's query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco's query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.
AB - Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model, query language, and query execution engine presented earlier. Deco's objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco's query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco's query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.
UR - http://hdl.handle.net/10754/599435
UR - http://dl.acm.org/doi/10.14778/2536206.2536207
UR - http://www.scopus.com/inward/record.url?scp=84891073325&partnerID=8YFLogxK
U2 - 10.14778/2536206.2536207
DO - 10.14778/2536206.2536207
M3 - Article
SN - 2150-8097
VL - 6
SP - 781
EP - 792
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 10
ER -