TY - GEN
T1 - Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
AU - Ikeda, Robert
AU - Cho, Junsang
AU - Fang, Charlie
AU - Salihoglu, Semih
AU - Torikai, Satoshi
AU - Widom, Jennifer
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work is supported by the National Science Foundation (IIS-0904497) and a KAUST research grant.
This publication acknowledges KAUST support, but has no KAUST-affiliated authors.
PY - 2012/4
Y1 - 2012/4
N2 - Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance - provenance information stored at the processing-node level - and uses the generated provenance to support record-level backward tracing and forward tracing operations. In our demonstration we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities can be very useful for workflow debugging, and for drilling down on specific results of interest. © 2012 IEEE.
AB - Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance - provenance information stored at the processing-node level - and uses the generated provenance to support record-level backward tracing and forward tracing operations. In our demonstration we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities can be very useful for workflow debugging, and for drilling down on specific results of interest. © 2012 IEEE.
UR - http://hdl.handle.net/10754/599412
UR - http://ieeexplore.ieee.org/document/6228180/
UR - http://www.scopus.com/inward/record.url?scp=84864186416&partnerID=8YFLogxK
U2 - 10.1109/icde.2012.118
DO - 10.1109/icde.2012.118
M3 - Conference contribution
SN - 9780769547473
SP - 1249
EP - 1252
BT - 2012 IEEE 28th International Conference on Data Engineering
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -