Kaisiyuan Wang

I am a Ph.D. candidate at the School of Electrical and Information Engineering, University of Sydney, and expect to graduate soon. I received my B.S. and M.S. degrees from the School of Electrical Engineering, Harbin Institute of Technology.

Since 2022, I have been a research intern at the Department of Computer Vision Technology (VIS), Baidu Inc., working closely with Hang Zhou on high-fidelity and efficient audio/video-driven video portrait synthesis. Previously, I was a research intern in the Mobile Intelligence Group (MIG) at SenseTime, working with Wayne Wu, Qianyi Wu, and Xinya Ji on emotional talking face generation.

Email  /  CV  /  Google Scholar  /  GitHub

Research

My research interests cover both human-centric topics (e.g., 2D/3D Talking Head Synthesis and 3D Human Body Restoration) and object-centric topics (e.g., Object-Compositional Implicit Scene Reconstruction).

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang
arXiv
project page / pdf / code
ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces
Qianyi Wu, Kaisiyuan Wang, Kejie Li, Jianmin Zheng, Jianfei Cai
ICCV, 2023
project page / pdf / code
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang
CVPR, 2023
project page / pdf / code
Efficient Video Portrait Reenactment via Grid-based Codebook
Kaisiyuan Wang, Hang Zhou, Qianyi Wu, Jiaxiang Tang, Zhiliang Xu, Borong Liang, Tianshu Hu, Errui Ding, Jingtuo Liu, Ziwei Liu, Jingdong Wang
SIGGRAPH, 2023
project page / pdf / code
Robust Video Portrait Reenactment via Personalized Representation Quantization
Kaisiyuan Wang, Changcheng Liang, Hang Zhou, Jiaxiang Tang, Qianyi Wu, Dongliang He, Zhibin Hong, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang
AAAI, 2023
project page / pdf / code
VPU: A Video-based Point Cloud Upsampling Framework
Kaisiyuan Wang, Lu Sheng, Shuhang Gu, Dong Xu
TIP, 2022
project page / pdf / code
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun*, Hang Zhou*, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike
SIGGRAPH Asia, 2022
project page / pdf / code
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao
SIGGRAPH, 2022
project page / pdf / code
Audio-Driven Emotional Video Portraits
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
CVPR, 2021
project page / pdf / code
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation
Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, Chen Change Loy
ECCV, 2020
project page / pdf / code

Internship Experience
Research Intern for Digital Human | VIS, Baidu Inc.
Beijing, China | Feb. 2022 ~ Present
  • Personalized Video Portrait Reenactment
  • Person-agnostic Audio-driven Talking Head Synthesis
Research Intern for Digital Human | MIG, SenseTime
Beijing, China | Apr. 2019 ~ Jun. 2020
  • 2D/3D Emotional Facial Expression Generation
  • Audio-driven Emotional Talking Head Synthesis

This website is adapted from Jon Barron's template.
