MotionClone: Training-Free Motion Cloning for Controllable Video Generation  

Pengyang Ling*1,4 Jiazi Bu*2,4 Pan Zhang4✝ Xiaoyi Dong4 Yuhang Zang4 Tong Wu3 Huaian Chen1 Jiaqi Wang4 Yi Jin1✝
*Equal Contribution. ✝Corresponding Author.
1University of Science and Technology of China 2Shanghai Jiao Tong University 3The Chinese University of Hong Kong 4Shanghai AI Laboratory

[Paper]     [Github]     [BibTeX]


[Teaser videos]

Generated with CivitAI model: Realistic Vision V5.1
The first row shows the reference videos; the second row shows the corresponding videos generated by MotionClone.

Methodology

Motion-based controllable text-to-video generation uses motion cues to control video generation. Previous methods typically require training dedicated models to encode motion cues (e.g., VideoComposer) or fine-tuning video diffusion models (e.g., Tune-A-Video and VMC). However, these approaches often yield suboptimal motion when applied outside their trained domain. In this work, we propose MotionClone, a training-free framework that clones the motion of a reference video to control text-to-video generation. We employ the temporal attention computed during video inversion to represent the motion in the reference video, and we introduce primary temporal-attention guidance to mitigate the influence of noisy or very subtle motions within the attention weights. Furthermore, to help the generation model synthesize plausible spatial relationships and to enhance its prompt-following capability, we propose a location-aware semantic guidance mechanism that leverages the coarse foreground location from the reference video, together with the original classifier-free guidance features, to guide the video generation.
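To make the primary temporal-attention guidance concrete, the sketch below keeps only the strongest entries of the reference temporal attention and penalizes the generated attention on those entries alone, so that noisy or very subtle components are ignored. This is a minimal, hypothetical PyTorch sketch: the function name, tensor shapes, and top-k selection are our assumptions, not the released implementation.

```python
import torch

def primary_temporal_attention_loss(attn_ref: torch.Tensor,
                                    attn_gen: torch.Tensor,
                                    top_k: int = 1) -> torch.Tensor:
    """Align only the dominant (primary) temporal-attention components.

    attn_ref / attn_gen: temporal attention weights from the same layer,
    obtained during reference-video inversion and during generation,
    e.g. of shape (batch, heads, spatial_tokens, frames, frames).
    Shapes and names here are illustrative.
    """
    # Keep only the top-k entries of the reference attention along the
    # frame dimension, excluding noisy or very subtle motion components
    # from the guidance signal.
    _, idx = attn_ref.topk(top_k, dim=-1)
    mask = torch.zeros_like(attn_ref).scatter_(-1, idx, 1.0)

    # Masked L2 distance; its gradient w.r.t. the noisy latent serves as
    # the motion guidance at each denoising step.
    return ((attn_gen - attn_ref) * mask).pow(2).sum() / mask.sum()
```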

As illustrated in the framework below, MotionClone comprises two core components in its guidance stage: Primary Temporal-Attention Guidance and Location-Aware Semantic Guidance, which operate synergistically to provide comprehensive motion and semantic guidance for controllable video generation.
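As a rough illustration of how the two guidance terms could interact during sampling, the following sketch applies one gradient step on the noisy latent before the scheduler update. The loss callables, weights, and step size are illustrative assumptions rather than the actual MotionClone code.

```python
import torch

def guided_denoise_step(latents: torch.Tensor,
                        motion_loss_fn,
                        semantic_loss_fn,
                        w_motion: float = 1.0,
                        w_semantic: float = 1.0,
                        step_size: float = 0.1) -> torch.Tensor:
    """One guidance update on the noisy latent (hypothetical interface).

    motion_loss_fn / semantic_loss_fn run the video diffusion model on
    `latents` and return the primary temporal-attention loss and the
    location-aware semantic loss, respectively.
    """
    latents = latents.detach().requires_grad_(True)
    loss = (w_motion * motion_loss_fn(latents)
            + w_semantic * semantic_loss_fn(latents))
    # Steer the latent against the combined guidance gradient; the
    # regular scheduler step (e.g., DDIM) is applied afterwards.
    grad = torch.autograd.grad(loss, latents)[0]
    return (latents - step_size * grad).detach()
```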


Gallery

Here we showcase our best-quality animations, generated by cloning the motion from reference videos with our framework.
[Gallery videos]

Multiple results generated by MotionClone from the same reference videos:

[Videos]

BibTeX

@article{ling2024motionclone,
  title={MotionClone: Training-Free Motion Cloning for Controllable Video Generation},
  author={Ling, Pengyang and Bu, Jiazi and Zhang, Pan and Dong, Xiaoyi and Zang, Yuhang and Wu, Tong and Chen, Huaian and Wang, Jiaqi and Jin, Yi},
  journal={arXiv preprint arXiv:2406.05338},
  year={2024}
}

The project page template is borrowed from DreamBooth.