TY - JOUR
T1 - Experimental Study of Totally Optimal Decision Trees
AU - Aldilaijan, Abdulla
AU - Azad, Mohammad
AU - Moshkov, Mikhail
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). We are greatly indebted to the anonymous reviewers for useful comments and suggestions.
PY - 2019/3/22
Y1 - 2019/3/22
N2 - In this paper, we present results of experimental studies related to the existence of totally optimal decision trees (which are optimal relative to two or more cost functions simultaneously) for nine decision tables from the UCI Machine Learning Repository. Such trees can be useful when we consider decision trees as algorithms for problem solving or as a way for knowledge representation. For cost functions, we use depth, average depth, and number of nodes. We study not only exact but also approximate decision trees based on five uncertainty measures: entropy, Gini index, misclassification error, relative misclassification error, and number of unordered pairs of rows with different decisions. To investigate the existence of totally optimal trees, we use an extension of dynamic programming that allows us to make multi-stage optimization of decision trees relative to a sequence of cost functions. Experimental results show that totally optimal decision trees exist in many cases. The behavior of graphs that describe how the number of decision tables with totally optimal decision trees depends on their accuracy is mainly irregular. However, one can observe some trends, in particular, an upward trend when accuracy is decreasing.
AB - In this paper, we present results of experimental studies related to the existence of totally optimal decision trees (which are optimal relative to two or more cost functions simultaneously) for nine decision tables from the UCI Machine Learning Repository. Such trees can be useful when we consider decision trees as algorithms for problem solving or as a way for knowledge representation. For cost functions, we use depth, average depth, and number of nodes. We study not only exact but also approximate decision trees based on five uncertainty measures: entropy, Gini index, misclassification error, relative misclassification error, and number of unordered pairs of rows with different decisions. To investigate the existence of totally optimal trees, we use an extension of dynamic programming that allows us to make multi-stage optimization of decision trees relative to a sequence of cost functions. Experimental results show that totally optimal decision trees exist in many cases. The behavior of graphs that describe how the number of decision tables with totally optimal decision trees depends on their accuracy is mainly irregular. However, one can observe some trends, in particular, an upward trend when accuracy is decreasing.
UR - http://hdl.handle.net/10754/631801
UR - https://content.iospress.com/articles/fundamenta-informaticae/fi1784
UR - http://www.scopus.com/inward/record.url?scp=85063323370&partnerID=8YFLogxK
U2 - 10.3233/FI-2019-1784
DO - 10.3233/FI-2019-1784
M3 - Article
SN - 0169-2968
VL - 165
SP - 245
EP - 261
JO - Fundamenta Informaticae
JF - Fundamenta Informaticae
IS - 3-4
ER -