Efficient Video Portrait Reenactment via Grid-based Codebook


Kaisiyuan Wang1, Hang Zhou2, Qianyi Wu3, Jiaxiang Tang4, Zhiliang Xu2, Borong Liang2, Tianshu Hu2, Errui Ding2, Jingtuo Liu2, Ziwei Liu5, Jingdong Wang2

1 The University of Sydney   2 Baidu Inc.   3 Monash University   4 Peking University   5 S-Lab, Nanyang Technological University  


Abstract


While progress has been made in the field of portrait reenactment, the problem of how to efficiently produce high-fidelity and accurate videos remains. Recent studies build direct mappings between driving signals and their predictions, leading to failure cases when synthesizing background textures and detailed local motions. In this paper, we propose the Video Portrait via Grid-based Codebook (VPGC) framework, which achieves efficient and high-fidelity portrait modeling. Our key insight is to query driving signals in a position-aware textural codebook with an explicit grid structure. The grid-based codebook stores delicate textural information locally according to our observations on video portraits, which can be learned efficiently and precisely. We subsequently design a Prior-Guided Driving Module to predict reliable features from the driving signals, which can be later decoded back to high-quality video portraits by querying the codebook. Comprehensive experiments are conducted to validate the effectiveness of our approach.


Method


Our key insight is to learn a personalized grid-based codebook that facilitates efficient and high-fidelity portrait modeling.

[Pipeline overview figure]
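The core lookup can be illustrated with a minimal sketch. The snippet below is a hypothetical NumPy illustration (not the paper's implementation): each cell of a spatial grid owns its own small codebook, and a per-cell feature is quantized by nearest-neighbour search restricted to that cell's codebook, mirroring the position-aware, locally stored textures described above. Shapes and function names are assumptions for clarity.

```python
import numpy as np

def grid_codebook_lookup(features, codebooks):
    """Quantize each grid cell's feature against that cell's own codebook.

    features:  (Gh, Gw, D)    -- one D-dim feature per grid cell
    codebooks: (Gh, Gw, K, D) -- a separate K-entry codebook per cell
    Returns the quantized features (Gh, Gw, D) and code indices (Gh, Gw).
    """
    Gh, Gw, D = features.shape
    quantized = np.empty_like(features)
    indices = np.empty((Gh, Gw), dtype=np.int64)
    for i in range(Gh):
        for j in range(Gw):
            # Nearest-neighbour search restricted to this cell's codebook,
            # so a cell can only reproduce textures observed at its position.
            dists = np.linalg.norm(codebooks[i, j] - features[i, j], axis=-1)
            k = int(np.argmin(dists))
            indices[i, j] = k
            quantized[i, j] = codebooks[i, j, k]
    return quantized, indices
```

Because each codebook is small and tied to a fixed spatial position, the lookup is cheap and the stored entries stay specialized to local texture, which is the intuition behind the efficiency claim above.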

Supplementary Video


Our demo video includes self-reenactment and cross-reenactment comparison as well as ablation study examples.



Citation


@inproceedings{10.1145/3588432.3591509,
    author = {Wang, Kaisiyuan and Zhou, Hang and Wu, Qianyi and Tang, Jiaxiang and Xu, Zhiliang and Liang, Borong and Hu, Tianshu and Ding, Errui and Liu, Jingtuo and Liu, Ziwei and Wang, Jingdong},
    title = {Efficient Video Portrait Reenactment via Grid-Based Codebook},
    year = {2023},
    isbn = {9798400701597},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3588432.3591509},
    doi = {10.1145/3588432.3591509},
    abstract = {While progress has been made in the field of portrait reenactment, the problem of how to efficiently produce high-fidelity and accurate videos remains. Recent studies build direct mappings between driving signals and their predictions, leading to failure cases when synthesizing background textures and detailed local motions. In this paper, we propose the Video Portrait via Grid-based Codebook (VPGC) framework, which achieves efficient and high-fidelity portrait modeling. Our key insight is to query driving signals in a position-aware textural codebook with an explicit grid structure. The grid-based codebook stores delicate textural information locally according to our observations on video portraits, which can be learned efficiently and precisely. We subsequently design a Prior-Guided Driving Module to predict reliable features from the driving signals, which can be later decoded back to high-quality video portraits by querying the codebook. Comprehensive experiments are conducted to validate the effectiveness of our approach.},
    booktitle = {ACM SIGGRAPH 2023 Conference Proceedings},
    articleno = {66},
    numpages = {9},
    keywords = {Facial Animation, Video Synthesis},
    location = {Los Angeles, CA, USA},
    series = {SIGGRAPH '23}
}