Home
Publication
Team
Seminar
Opening
Contact
Github
The full list can be found at [Google Scholar].
(🧑‍💻 Co-first Author, 📮 Corresponding Author)
Preprint
50. EmoCAST: Emotional Talking Portrait via Emotive Text Description
Yiguo Jiang, Xiaodong Cun 📮, Yong Zhang, Yudian Zheng, Fan Tang, Chi-Man Pun 📮
Preprint, 2025.
49. VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun 📮
Preprint, 2025.
48. Sci-Fi: Symmetric Constraint for Frame Inbetweening
Liuhan Chen, Xiaodong Cun 📮, Xiaoyu Li, Xianyi He, Shenghai Yuan, Jie Chen, Ying Shan, Li Yuan 📮
Preprint, 2025.
47. GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors
Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun 📮, Huaqi Zhang, Xiaodong Cun 📮
Preprint, 2025.
46. FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
Jiayi Zheng, Xiaodong Cun 📮
Preprint, 2025.
45. AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Jiwen Yu, Xiaodong Cun 📮, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang
Preprint, 2024.
44. Learning Enriched Illuminants for Cross and Single Sensor Color Constancy
Xiaodong Cun 🧑‍💻, Zhendong Wang 🧑‍💻, Chi-Man Pun, Wengang Zhou, Jianzhuang Liu, Xu Jia, Houqiang Li
Preprint, 2021.
Year 2025
43. AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse.
Zichao Yu, Zhen Zou, Guojiang Shao, Chengwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang.
ACM Multimedia, 2025.
42. Mobius: Text to Seamless Looping Video Generation via Latent Shift
Xiuli Bi, Jianfei Yuan, Bo Liu, Yong Zhang, Xiaodong Cun 📮, Chi-Man Pun, Bin Xiao
SIGGRAPH (Conference Track), 2025.
41. DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue
Computer Vision and Pattern Recognition (CVPR), 2025.
40. DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos.
Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan.
Computer Vision and Pattern Recognition (CVPR Highlight), 2025.
Best Paper at PixFoundation workshop of CVPR 25.
39. DEIM: DETR with Improved Matching for Fast Convergence.
Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen.
Computer Vision and Pattern Recognition (CVPR), 2025.
38. CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
Xiuli Bi, Jian Lu, Bo Liu, Xiaodong Cun 📮, Yong Zhang, Weisheng Li, Bin Xiao
AAAI Conference on Artificial Intelligence (AAAI), 2025.
37. MagicStick🪄: Controllable Video Editing via Control Handle Transformations
Yue Ma, Xiaodong Cun 📮, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen 📮
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
Year 2024
36. CV-VAE: A Compatible Video VAE for Latent Generative Video Models.
Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan.
NeurIPS, 2024.
35. Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024.
34. Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models.
Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan.
European Conference on Computer Vision (ECCV), 2024.
33. MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Muyao Niu, Xiaodong Cun 📮, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng 📮
European Conference on Computer Vision (ECCV), 2024.
32. Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation.
Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen.
European Conference on Computer Vision (ECCV), 2024.
31. VideoCrafter1 & VideoCrafter2
Haoxin Chen 🧑‍💻, Menghan Xia 🧑‍💻, Yong Zhang 🧑‍💻, Xiaodong Cun 🧑‍💻, Xintao Wang, Ying Shan
Computer Vision and Pattern Recognition (CVPR) & Technical report, 2024.
PaperDigest Most Influential Papers of ArXiv 24 (paperdigest.org).
30. EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu 🧑‍💻, Xiaodong Cun 🧑‍💻, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan
Computer Vision and Pattern Recognition (CVPR), 2024.
29. SmartEdit: Exploring Complex Instruction-based Image Editing with Large Language Models.
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan.
Computer Vision and Pattern Recognition (CVPR Highlight), 2024.
28. Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework.
Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee.
Computer Vision and Pattern Recognition (CVPR), 2024.
27. Depth-aware Test-Time Training for Zero-shot Video Object Segmentation.
Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun 📮, Xiaodong Cun 📮.
Computer Vision and Pattern Recognition (CVPR), 2024.
26. X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou
Computer Vision and Pattern Recognition (CVPR), 2024.
25. Sketch Video Synthesis
Yudian Zheng, Xiaodong Cun 📮, Menghan Xia, Chi-Man Pun
Eurographics, 2024.
24. ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan
ICLR (Spotlight), 2024.
23. Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
Yue Ma 🧑‍💻, Yingqing He 🧑‍💻, Xiaodong Cun, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen
AAAI Conference on Artificial Intelligence (AAAI), 2024.
PaperDigest Most Influential Papers of AAAI 24 (paperdigest.org).
Year 2023
22. TaleCrafter: Interactive Story Visualization with Multiple Characters
Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, and Yujiu Yang
SIGGRAPH Asia (Conference Track), 2023.
21. Inserting Anybody in Diffusion Models via Celeb Basis
Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng
NeurIPS, 2023.
20. FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Chenyang Qi, Xiaodong Cun 📮, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen 📮
International Conference on Computer Vision (ICCV), 2023.
ICCV 23 Oral Presentation (2.3%)
PaperDigest Most Influential Papers of ICCV 23 (paperdigest.org).
19. ToonTalker: Cross-Domain Face Reenactment
Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, and Yujiu Yang
International Conference on Computer Vision (ICCV), 2023.
18. LivelySpeaker: Towards Semantic-aware Co-Speech Gesture Generation
Yihao Zhi 🧑‍💻, Xiaodong Cun 🧑‍💻, Xuelin Chen, Xi Shen, Wen Guo, Shaoli Huang, Shenghua Gao
International Conference on Computer Vision (ICCV), 2023.
17. High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
Zinuo Li 🧑‍💻, Xuhang Chen 🧑‍💻, Chi-Man Pun 📮, Xiaodong Cun 📮
International Conference on Computer Vision (ICCV), 2023.
16. Explicit Visual Prompting for Low-Level Structure Segmentations
Weihuang Liu, Xi Shen, Chi-Man Pun 📮, Xiaodong Cun 📮
Computer Vision and Pattern Recognition (CVPR) & Journal Submission, 2023.
15. T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Jianrong Zhang 🧑‍💻, Yangsong Zhang 🧑‍💻, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen 📮
Computer Vision and Pattern Recognition (CVPR), 2023.
14. 3D GAN Inversion with Facial Symmetry Prior
Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang
Computer Vision and Pattern Recognition (CVPR), 2023.
13. DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
Youxin Pang, Yong Zhang, Weize Quan, Yanbo Fan, Xiaodong Cun, Ying Shan, Dong-Ming Yan
Computer Vision and Pattern Recognition (CVPR), 2023.
12. SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Wenxuan Zhang 🧑‍💻, Xiaodong Cun 🧑‍💻, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang
Computer Vision and Pattern Recognition (CVPR), 2023.
Top 10 CVPR 2023 papers you won't want to miss (voxel51.com).
Top 10 most GitHub-starred CVPR papers (github.com).
11. CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong
Computer Vision and Pattern Recognition (CVPR), 2023.
10. CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying
Weihuang Liu, Xiaodong Cun 📮, Chi-Man Pun 📮, Menghan Xia, Yong Zhang, and Jue Wang
AAAI Conference on Artificial Intelligence (AAAI, Oral), 2023.
Year 2022
9. VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Kun Cheng 🧑‍💻, Xiaodong Cun 🧑‍💻📮, Yong Zhang, Menghan Xia, Fei Yin, Mingrui Zhu, Xuan Wang, Jue Wang, Nannan Wang
SIGGRAPH Asia (Conference Track), 2022.
Top 10 most GitHub-starred SIGGRAPH papers (github.com).
8. StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN
Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, Yujiu Yang
European Conference on Computer Vision (ECCV), 2022.
7. Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization
Jingtang Liang 🧑‍💻, Xiaodong Cun 🧑‍💻, Chi-Man Pun, Jue Wang
European Conference on Computer Vision (ECCV), 2022.
6. Uformer: A General U-Shaped Transformer for Image Restoration
Zhendong Wang, Xiaodong Cun 📮, Jianmin Bao, Jianzhuang Liu, Wengang Zhou, Houqiang Li
Computer Vision and Pattern Recognition (CVPR), 2022.
PaperDigest Most Influential Papers of CVPR 22 (paperdigest.org).
2021 and Earlier
5. Split then Refine: Sequential Attention-guided ResUNets for Blind Single Image Visible Watermark Removal
Xiaodong Cun, Chi-Man Pun
AAAI Conference on Artificial Intelligence (AAAI), 2021.
4. Defocus Blur Detection via Depth Distillation
Xiaodong Cun, Chi-Man Pun
European Conference on Computer Vision (ECCV), 2020.
3. Improving the Harmony of the Composite Image by Spatial-Separated Attention Module
Xiaodong Cun, Chi-Man Pun
IEEE Transactions on Image Processing (TIP), 2020.
2. Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN
Xiaodong Cun, Chi-Man Pun, Cheng Shi
AAAI Conference on Artificial Intelligence (AAAI), 2020.
1. Depth Assisted Full Resolution Network for Single Image based View Synthesis
Xiaodong Cun, Feng Xu, Chi-Man Pun, Hao Gao
SIGGRAPH Poster, 2018.