TY - JOUR
T1 - Efficient interactive multiclass learning from binary feedback
AU - Ngo, Hung
AU - Luciw, Matthew
AU - Vien, Ngo Anh
AU - Nagi, Jawad
AU - Förster, Alexander
AU - Schmidhuber, Jürgen
N1 - Generated from Scopus record by KAUST IRTS on 2022-09-14
PY - 2014/10/14
Y1 - 2014/10/14
N2 - We introduce a novel algorithm called upper confidence-weighted learning (UCWL) for online multiclass learning from binary feedback (e.g., feedback that indicates whether the prediction was right or wrong). UCWL combines the upper confidence bound (UCB) framework with the soft confidence-weighted (SCW) online learning scheme. In UCB, each instance is classified using both score and uncertainty. For a given instance in the sequence, the algorithm might guess its class label primarily to reduce the class uncertainty. This is a form of informed exploration, which enables the performance to improve with lower sample complexity compared to the case without exploration. Combining UCB with SCW leads to the ability to deal well with noisy and nonseparable data, and state-of-the-art performance is achieved without increasing the computational cost. A potential application setting is human-robot interaction (HRI), where the robot is learning to classify some set of inputs while the human teaches it by providing only binary feedback, or sometimes even the wrong answer entirely. Experimental results in the HRI setting and with two benchmark datasets from other settings show that UCWL outperforms other state-of-the-art algorithms in the online binary feedback setting and, surprisingly, sometimes even outperforms state-of-the-art algorithms that get full feedback (e.g., the true class label), whereas UCWL gets only binary feedback on the same data sequence.
AB - We introduce a novel algorithm called upper confidence-weighted learning (UCWL) for online multiclass learning from binary feedback (e.g., feedback that indicates whether the prediction was right or wrong). UCWL combines the upper confidence bound (UCB) framework with the soft confidence-weighted (SCW) online learning scheme. In UCB, each instance is classified using both score and uncertainty. For a given instance in the sequence, the algorithm might guess its class label primarily to reduce the class uncertainty. This is a form of informed exploration, which enables the performance to improve with lower sample complexity compared to the case without exploration. Combining UCB with SCW leads to the ability to deal well with noisy and nonseparable data, and state-of-the-art performance is achieved without increasing the computational cost. A potential application setting is human-robot interaction (HRI), where the robot is learning to classify some set of inputs while the human teaches it by providing only binary feedback, or sometimes even the wrong answer entirely. Experimental results in the HRI setting and with two benchmark datasets from other settings show that UCWL outperforms other state-of-the-art algorithms in the online binary feedback setting and, surprisingly, sometimes even outperforms state-of-the-art algorithms that get full feedback (e.g., the true class label), whereas UCWL gets only binary feedback on the same data sequence.
UR - https://dl.acm.org/doi/10.1145/2629631
UR - http://www.scopus.com/inward/record.url?scp=84967146007&partnerID=8YFLogxK
U2 - 10.1145/2629631
DO - 10.1145/2629631
M3 - Article
SN - 2160-6463
VL - 4
JO - ACM Transactions on Interactive Intelligent Systems
JF - ACM Transactions on Interactive Intelligent Systems
IS - 3
ER -