TY - GEN
T1 - Automatic failure recovery for software-defined networks
AU - Kuzniart, Maciej
AU - Perešínit, Peter
AU - Vasict, Nedeljko
AU - Canini, Marco
AU - Kostic, Dejan
PY - 2013
Y1 - 2013
N2 - Tolerating and recovering from link and switch failures are fundamental requirements of most networks, including Software-Defined Networks (SDNs). However, instead of traditional behaviors such as network-wide routing reconvergence, failure recovery in an SDN is determined by the specific software logic running at the controller. While this admits more freedom to respond to a failure event, it ultimately means that each controller application must include its own recovery logic, which makes the code more difficult to write and potentially more error-prone. In this paper, we propose a runtime system that automates failure recovery and enables network developers to write simpler, failure-agnostic code. To this end, upon detecting a failure, our approach first spawns a new controller instance that runs in an emulated environment consisting of the network topology excluding the failed elements. Then, it quickly replays inputs observed by the controller before the failure occurred, leading the emulated network into the forwarding state that accounts for the failed elements. Finally, it recovers the network by installing the difference ruleset between emulated and current forwarding states.
AB - Tolerating and recovering from link and switch failures are fundamental requirements of most networks, including Software-Defined Networks (SDNs). However, instead of traditional behaviors such as network-wide routing reconvergence, failure recovery in an SDN is determined by the specific software logic running at the controller. While this admits more freedom to respond to a failure event, it ultimately means that each controller application must include its own recovery logic, which makes the code more difficult to write and potentially more error-prone. In this paper, we propose a runtime system that automates failure recovery and enables network developers to write simpler, failure-agnostic code. To this end, upon detecting a failure, our approach first spawns a new controller instance that runs in an emulated environment consisting of the network topology excluding the failed elements. Then, it quickly replays inputs observed by the controller before the failure occurred, leading the emulated network into the forwarding state that accounts for the failed elements. Finally, it recovers the network by installing the difference ruleset between emulated and current forwarding states.
KW - Fault tolerance
KW - Software defined network
UR - http://www.scopus.com/inward/record.url?scp=84883688476&partnerID=8YFLogxK
U2 - 10.1145/2491185.2491218
DO - 10.1145/2491185.2491218
M3 - Conference contribution
AN - SCOPUS:84883688476
SN - 9781450320566
T3 - HotSDN 2013 - Proceedings of the 2013 ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking
SP - 159
EP - 160
BT - HotSDN 2013 - Proceedings of the 2013 ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking
T2 - 2013 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN 2013
Y2 - 16 August 2013 through 16 August 2013
ER -