Learning to Disentangle Latent Physical Factors for Video Prediction

Deyao Zhu*, Marco Munderloh, Bodo Rosenhahn, Jörg Stückler

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation


Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
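
To make the quantity named in the abstract concrete: the total correlation of a latent vector is the KL divergence between its joint distribution and the product of its marginals, and it is zero exactly when the dimensions are independent. The sketch below evaluates it in closed form for a Gaussian with a known covariance matrix. This is only an illustration of the quantity; it is not the estimator or the training objective used in the paper, which this record does not describe. The function name and the example matrices are assumptions made for the illustration.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a Gaussian with covariance `cov`.

    Illustrative helper (hypothetical name, not from the paper).
    For a Gaussian, TC = 0.5 * (sum_j log cov[j, j] - log det cov):
    the sum of marginal entropies minus the joint entropy.
    """
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)


# Independent latent dimensions: total correlation is zero.
independent = np.eye(3)
tc_independent = gaussian_total_correlation(independent)

# Two latent dimensions with correlation 0.8: total correlation is positive.
correlated = np.array([[1.0, 0.8],
                       [0.8, 1.0]])
tc_correlated = gaussian_total_correlation(correlated)
```

In this toy setting, a penalty on the total correlation pushes the covariance toward a diagonal matrix, which is the sense in which the latent dimensions are encouraged to be statistically independent.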

Original language: English (US)
Title of host publication: Pattern Recognition - 41st DAGM German Conference, DAGM GCPR 2019, Proceedings
Editors: Gernot A. Fink, Simone Frintrop, Xiaoyi Jiang
Number of pages: 14
ISBN (Print): 9783030336752
State: Published - 2019
Event: 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 - Dortmund, Germany
Duration: Sep 10 2019 - Sep 13 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11824 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

