TY - JOUR
T1 - Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits
AU - Ghatak, Gourab
AU - Mohanty, Hardhik
AU - Rahman, Aniq Ur
N1 - KAUST Repository Item: Exported on 2021-11-01
PY - 2021
Y1 - 2021
AB - We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test-based Thompson Sampling (TS) algorithm, TS-KS, that actively detects change points and resets the TS parameters once a change is detected. In particular, for the two-armed bandit case, we derive bounds on the number of reward samples needed to detect a change once it occurs. Consequently, we show that the proposed algorithm is asymptotically optimal with arbitrarily high probability. Contrary to existing works in the literature that detect changes based on estimates of the mean reward, our algorithm can detect a change even when the underlying reward distribution changes while the mean reward remains the same. Finally, to test the efficacy of the proposed algorithm, we employ it in a task-offloading scenario in wireless edge computing. Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments. Moreover, the performance of TS-KS is on par with state-of-the-art forecasting algorithms such as Facebook-PROPHET and ARIMA.
UR - http://hdl.handle.net/10754/669415
UR - https://ieeexplore.ieee.org/document/9594693/
DO - 10.1109/TAI.2021.3121653
M3 - Article
SN - 2691-4581
SP - 1
EP - 1
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
ER -