TY - GEN
T1 - Computing security strategies in finite horizon repeated Bayesian games
AU - Li, Lichun
AU - Langbort, Cedric
AU - Shamma, Jeff
N1 - Publisher Copyright: © 2017 American Automatic Control Council (AACC).
PY - 2017/6/29
Y1 - 2017/6/29
N2 - This paper studies security strategies in two-player zero-sum repeated Bayesian games with a finite horizon. In such games, each player has a private type that is chosen independently according to a publicly known a priori probability. Players' types are fixed throughout the game. The game is played for a finite number of stages. At every stage, players simultaneously choose their actions, which are publicly observed. The one-stage payoff of player 1 (or penalty to player 2) depends on both players' types and actions, and is not directly observed by any player. While player 1 aims to maximize the total payoff over the game, player 2 wants to minimize it. This paper provides each player with two ways to compute the security strategy, i.e., the optimal strategy in the worst case. First, a security strategy that depends directly on both players' action histories is derived by refining the sequence form. Noting that the action-history space grows exponentially with the time horizon, this paper further presents a security strategy that depends on the player's fixed-size sufficient statistic. The sufficient statistic is shown to consist of the belief on one's own type, the regret on the other player's type, and the stage, and is independent of the other player's strategy.
AB - This paper studies security strategies in two-player zero-sum repeated Bayesian games with a finite horizon. In such games, each player has a private type that is chosen independently according to a publicly known a priori probability. Players' types are fixed throughout the game. The game is played for a finite number of stages. At every stage, players simultaneously choose their actions, which are publicly observed. The one-stage payoff of player 1 (or penalty to player 2) depends on both players' types and actions, and is not directly observed by any player. While player 1 aims to maximize the total payoff over the game, player 2 wants to minimize it. This paper provides each player with two ways to compute the security strategy, i.e., the optimal strategy in the worst case. First, a security strategy that depends directly on both players' action histories is derived by refining the sequence form. Noting that the action-history space grows exponentially with the time horizon, this paper further presents a security strategy that depends on the player's fixed-size sufficient statistic. The sufficient statistic is shown to consist of the belief on one's own type, the regret on the other player's type, and the stage, and is independent of the other player's strategy.
UR - http://www.scopus.com/inward/record.url?scp=85027066908&partnerID=8YFLogxK
U2 - 10.23919/ACC.2017.7963514
DO - 10.23919/ACC.2017.7963514
M3 - Conference contribution
AN - SCOPUS:85027066908
T3 - Proceedings of the American Control Conference
SP - 3664
EP - 3669
BT - 2017 American Control Conference, ACC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 American Control Conference, ACC 2017
Y2 - 24 May 2017 through 26 May 2017
ER -