In this paper, we design a routing and scheduling framework for a multi-task agent using reinforcement learning. The objective is to employ an autonomous agent to cover the maximum number of pre-scheduled tasks, spatially and temporally distributed over a given geographical area, within a pre-determined period of time. In this approach, we train the agent using Q-learning (QL), an off-policy temporal-difference learning algorithm that finds effective near-optimal solutions. The agent uses the feedback received from previously taken decisions to learn and adapt its subsequent actions accordingly. A customized reward function was developed to account for the time windows of the tasks and the delays caused by the agent's navigation between tasks. Numerical simulations show the behavior of the autonomous agent for different selected scenarios and corroborate the ability of QL to handle complex vehicle routing problems with multiple constraints.
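The abstract's core ingredients (an off-policy Q-learning agent, a task set with time windows, and a reward that rewards on-time coverage and penalizes travel delay) can be illustrated with a minimal tabular sketch. Everything below is an assumption for illustration: the toy task instance, the reward constants, the Euclidean travel-time model, and the state encoding `(current task, visited set)` are hypothetical, not the paper's actual formulation.

```python
import random

# Hypothetical toy instance: each task has a location and a (start, end) time window.
TASKS = {
    0: {"loc": (0, 0), "window": (0, 100)},   # start position
    1: {"loc": (3, 4), "window": (4, 10)},
    2: {"loc": (6, 8), "window": (8, 18)},
    3: {"loc": (1, 7), "window": (12, 25)},
}

def travel_time(a, b):
    """Euclidean travel time between task locations (unit speed assumed)."""
    (x1, y1), (x2, y2) = TASKS[a]["loc"], TASKS[b]["loc"]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

def reward(task, arrival):
    """Customized reward in the spirit of the abstract: a bonus for arriving
    inside the task's time window, a penalty proportional to the delay."""
    _, end = TASKS[task]["window"]
    if arrival <= end:
        return 10.0                    # task covered in time
    return -(arrival - end)            # late: penalize the delay

def train(episodes=3000, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    """Tabular off-policy Q-learning over states (current task, visited set)."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        current, t, visited = 0, 0.0, frozenset({0})
        while len(visited) < len(TASKS):
            state = (current, visited)
            actions = [a for a in TASKS if a not in visited]
            q = Q.setdefault(state, {a: 0.0 for a in actions})
            # epsilon-greedy behaviour policy (exploration vs. exploitation)
            a = rng.choice(actions) if rng.random() < eps else max(actions, key=q.get)
            arrival = t + travel_time(current, a)
            r = reward(a, arrival)
            nxt_visited = visited | {a}
            nxt_q = Q.setdefault(
                (a, nxt_visited),
                {b: 0.0 for b in TASKS if b not in nxt_visited},
            )
            best_next = max(nxt_q.values()) if nxt_q else 0.0
            # Q-learning temporal-difference update (greedy target, off-policy)
            q[a] += alpha * (r + gamma * best_next - q[a])
            current, t, visited = a, arrival, nxt_visited
    return Q

def greedy_route(Q):
    """Extract the greedy task-visit order from the learned Q-table."""
    current, t, visited, route = 0, 0.0, frozenset({0}), [0]
    while len(visited) < len(TASKS):
        q = Q.get((current, visited))
        if not q:
            break
        a = max(q, key=q.get)
        t += travel_time(current, a)
        route.append(a)
        visited |= {a}
        current = a
    return route
```

On this tiny instance the learned greedy route covers all tasks; in the paper's setting the same update rule would operate over the full spatio-temporal task distribution rather than a four-task toy set.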
Original language: English (US)
Title of host publication: Midwest Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 4
State: Published - Aug 1 2019