TY - CPAPER
T1 - CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
AU - Abdal, Rameen
AU - Zhu, Peihao
AU - Femiani, John
AU - Mitra, Niloy
AU - Wonka, Peter
N1 - KAUST Repository Item: Exported on 2023-03-24
Acknowledgements: We would like to thank Visual Computing Center (VCC), KAUST for the support, gifts from Adobe Research, and the UCL AI Centre. We would also like to thank OpenAI for the CLIP model.
PY - 2022/7/24
Y1 - 2022/7/24
N2 - The success of StyleGAN has enabled unprecedented semantic editing capabilities on both synthesized and real images. However, such editing operations are either trained with semantic supervision or annotated manually by users. In another development, the CLIP architecture has been trained with internet-scale loose image and text pairings, and has been shown to be useful in several zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations in a fully unsupervised setup, without additional human guidance. Technically, we propose two novel building blocks: one for discovering interesting CLIP directions and one for semantically labeling arbitrary directions in CLIP latent space. The setup does not assume any predetermined labels, and hence we do not require any additional supervised text/attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled labeled StyleGAN edit directions is indeed possible, revealing interesting and non-trivial edit directions.
AB - The success of StyleGAN has enabled unprecedented semantic editing capabilities on both synthesized and real images. However, such editing operations are either trained with semantic supervision or annotated manually by users. In another development, the CLIP architecture has been trained with internet-scale loose image and text pairings, and has been shown to be useful in several zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations in a fully unsupervised setup, without additional human guidance. Technically, we propose two novel building blocks: one for discovering interesting CLIP directions and one for semantically labeling arbitrary directions in CLIP latent space. The setup does not assume any predetermined labels, and hence we do not require any additional supervised text/attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled labeled StyleGAN edit directions is indeed possible, revealing interesting and non-trivial edit directions.
UR - http://hdl.handle.net/10754/674022
UR - https://dl.acm.org/doi/10.1145/3528233.3530747
DO - 10.1145/3528233.3530747
M3 - Conference contribution
BT - Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings
PB - ACM
ER -