TY - JOUR
T1 - Web-ADARE: A Web-Aided Data Repairing System
AU - Gu, Binbin
AU - Li, Zhixu
AU - Yang, Qiang
AU - Xie, Qing
AU - Liu, An
AU - Liu, Guanfeng
AU - Zheng, Kai
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This research is partially supported by the Natural Science Foundation of China (Grant No. 61303019, 61402313, 61472263, 61572336), the Postdoctoral Scientific Research Funding of Jiangsu Province (No. 1501090B), the National 58th Batch of Postdoctoral Funding (No. 2015M581859), the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China, and the King Abdullah University of Science and Technology.
PY - 2017/3/8
Y1 - 2017/3/8
N2 - Data repairing aims to discover and correct erroneous data in databases. In this paper, we develop Web-ADARE, an end-to-end web-aided data repairing system that provides a feasible way to draw on the vast data sources of the Web for data repairing. In developing Web-ADARE, we focus mainly on the interaction problem between web-aided repairing and rule-based repairing, with the goal of minimizing the Web consultation cost while meeting predefined quality requirements. The same interaction problem also arises in crowd-based methods, but it has not yet been formally defined or addressed. We first prove theoretically that the optimal interaction scheme cannot feasibly be achieved, and then propose an algorithm that identifies an efficient interaction scheme by investigating the inconsistencies and dependencies between values in the repairing process. Extensive experiments on three data collections demonstrate Web-ADARE's high repairing precision and recall, as well as the efficiency of the generated interaction scheme relative to several baseline schemes.
AB - Data repairing aims to discover and correct erroneous data in databases. In this paper, we develop Web-ADARE, an end-to-end web-aided data repairing system that provides a feasible way to draw on the vast data sources of the Web for data repairing. In developing Web-ADARE, we focus mainly on the interaction problem between web-aided repairing and rule-based repairing, with the goal of minimizing the Web consultation cost while meeting predefined quality requirements. The same interaction problem also arises in crowd-based methods, but it has not yet been formally defined or addressed. We first prove theoretically that the optimal interaction scheme cannot feasibly be achieved, and then propose an algorithm that identifies an efficient interaction scheme by investigating the inconsistencies and dependencies between values in the repairing process. Extensive experiments on three data collections demonstrate Web-ADARE's high repairing precision and recall, as well as the efficiency of the generated interaction scheme relative to several baseline schemes.
UR - http://hdl.handle.net/10754/623010
UR - http://www.sciencedirect.com/science/article/pii/S0925231217304642
UR - http://www.scopus.com/inward/record.url?scp=85015304727&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2016.09.132
DO - 10.1016/j.neucom.2016.09.132
M3 - Article
SN - 0925-2312
VL - 253
SP - 201
EP - 214
JO - Neurocomputing
JF - Neurocomputing
ER -