Efficient Video Portrait Reenactment via Grid-based Codebook


Kaisiyuan Wang1, Hang Zhou2, Qianyi Wu3, Jiaxiang Tang4, Zhiliang Xu2, Borong Liang2, Tianshu Hu2, Errui Ding2, Jingtuo Liu2, Ziwei Liu5, Jingdong Wang2

1 The University of Sydney   2 Baidu Inc.   3 Monash University   4 Peking University   5 S-Lab, Nanyang Technological University  


Abstract


While progress has been made in the field of portrait reenactment, the problem of how to efficiently produce high-fidelity and accurate videos remains. Recent studies build direct mappings between driving signals and their predictions, leading to failure cases when synthesizing background textures and detailed local motions. In this paper, we propose the Video Portrait via Grid-based Codebook (VPGC) framework, which achieves efficient and high-fidelity portrait modeling. Our key insight is to query driving signals in a position-aware textural codebook with an explicit grid structure. The grid-based codebook stores delicate textural information locally according to our observations on video portraits, which can be learned efficiently and precisely. We subsequently design a Prior-Guided Driving Module to predict reliable features from the driving signals, which can be later decoded back to high-quality video portraits by querying the codebook. Comprehensive experiments are conducted to validate the effectiveness of our approach.


Method


Our key insight is to learn a personalized grid-based codebook that facilitates efficient and high-fidelity portrait modeling.

[Pipeline overview figure]
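The core lookup can be illustrated with a minimal sketch. The snippet below is a hypothetical NumPy illustration (not the paper's implementation): each cell of a spatial grid owns its own small codebook, and a per-cell feature is quantized by nearest-neighbour search restricted to that cell's codebook, mirroring the position-aware, locally stored textures described above. Shapes and function names are assumptions for clarity.

```python
import numpy as np

def grid_codebook_lookup(features, codebooks):
    """Quantize each grid cell's feature against that cell's own codebook.

    features:  (Gh, Gw, D)    -- one D-dim feature per grid cell
    codebooks: (Gh, Gw, K, D) -- a separate K-entry codebook per cell
    Returns the quantized features (Gh, Gw, D) and code indices (Gh, Gw).
    """
    Gh, Gw, D = features.shape
    quantized = np.empty_like(features)
    indices = np.empty((Gh, Gw), dtype=np.int64)
    for i in range(Gh):
        for j in range(Gw):
            # Nearest-neighbour search restricted to this cell's codebook,
            # so a cell can only reproduce textures observed at its position.
            dists = np.linalg.norm(codebooks[i, j] - features[i, j], axis=-1)
            k = int(np.argmin(dists))
            indices[i, j] = k
            quantized[i, j] = codebooks[i, j, k]
    return quantized, indices
```

Because each codebook is small and tied to a fixed spatial position, the lookup is cheap and the stored entries stay specialized to local texture, which is the intuition behind the efficiency claim above.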

Supplementary Video


Our demo video includes self-reenactment and cross-reenactment comparison as well as ablation study examples.



Citation


@inproceedings{10.1145/3588432.3591509,
    author = {Wang, Kaisiyuan and Zhou, Hang and Wu, Qianyi and Tang, Jiaxiang and Xu, Zhiliang and Liang, Borong and Hu, Tianshu and Ding, Errui and Liu, Jingtuo and Liu, Ziwei and Wang, Jingdong},
    title = {Efficient Video Portrait Reenactment via Grid-Based Codebook},
    year = {2023},
    isbn = {9798400701597},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3588432.3591509},
    doi = {10.1145/3588432.3591509},
    abstract = {While progress has been made in the field of portrait reenactment, the problem of how to efficiently produce high-fidelity and accurate videos remains. Recent studies build direct mappings between driving signals and their predictions, leading to failure cases when synthesizing background textures and detailed local motions. In this paper, we propose the Video Portrait via Grid-based Codebook (VPGC) framework, which achieves efficient and high-fidelity portrait modeling. Our key insight is to query driving signals in a position-aware textural codebook with an explicit grid structure. The grid-based codebook stores delicate textural information locally according to our observations on video portraits, which can be learned efficiently and precisely. We subsequently design a Prior-Guided Driving Module to predict reliable features from the driving signals, which can be later decoded back to high-quality video portraits by querying the codebook. Comprehensive experiments are conducted to validate the effectiveness of our approach.},
    booktitle = {ACM SIGGRAPH 2023 Conference Proceedings},
    articleno = {66},
    numpages = {9},
    keywords = {Facial Animation, Video Synthesis},
    location = {Los Angeles, CA, USA},
    series = {SIGGRAPH '23}
}