Generated with CivitAI model: Realistic Vision V5.1
The first row of videos represents the reference videos, while the second row consists of videos generated by MotionClone.
Methodology
Motion-based controllable text-to-video generation uses motion cues to control video generation. Previous methods typically require training dedicated models to
encode motion cues (e.g., VideoComposer) or fine-tuning video diffusion models (e.g., Tune-A-Video and VMC). However, these approaches often produce suboptimal motion when applied outside their trained domain.
In this work, we propose MotionClone, a training-free framework that enables motion cloning from a reference video to control text-to-video generation.
We employ the temporal attention weights obtained during video inversion to represent the motion in the reference video, and introduce primary temporal-attention guidance to mitigate the influence of noisy or very subtle motions within those attention weights.
Furthermore, to help the generation model synthesize reasonable spatial relationships and to enhance its prompt-following capability, we propose a location-aware semantic guidance mechanism that leverages the coarse
foreground location from the reference video, together with the original classifier-free guidance features, to guide video generation.
As illustrated in the framework below, MotionClone comprises two core components in its guidance stage: Primary
Temporal-Attention Guidance and Location-Aware Semantic Guidance, which operate synergistically
to provide comprehensive motion and semantic guidance for controllable video generation.
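The core idea of primary temporal-attention guidance, keeping only the dominant components of the reference video's temporal attention and steering generation toward them, can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the array shapes, the function names, the top-k selection rule, and the squared-error guidance objective are all simplifying assumptions made here for clarity.

```python
import numpy as np

def primary_attention_mask(attn_ref, top_k=1):
    """Binary mask keeping the top_k largest entries per row of the
    reference temporal attention (hypothetical shape [heads, F, F]).
    Entries outside the mask correspond to noisy or very subtle
    motions, which the guidance deliberately ignores."""
    idx = np.argpartition(attn_ref, -top_k, axis=-1)[..., -top_k:]
    mask = np.zeros_like(attn_ref)
    np.put_along_axis(mask, idx, 1.0, axis=-1)
    return mask

def temporal_attention_guidance(attn_gen, attn_ref, top_k=1):
    """Guidance energy: squared distance between generated and
    reference temporal attention, restricted to the primary
    (masked) components. In a diffusion sampler, its gradient
    w.r.t. the latent would nudge generation toward the
    reference motion."""
    mask = primary_attention_mask(attn_ref, top_k)
    return float((mask * (attn_gen - attn_ref) ** 2).sum() / mask.sum())
```

A cloned motion would drive this energy to zero: if `attn_gen` exactly matches `attn_ref` on the primary components, the guidance term vanishes and generation is left to the text prompt.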
Gallery
Here we showcase high-quality animations generated by cloning the motion from reference videos with our framework. Click to play the following animations.
Click to Play and Loop Video
Multiple results generated by MotionClone from the same reference videos:
Click to Play/Pause and Loop Video
BibTeX
@article{ling2024motionclone,
title={MotionClone: Training-Free Motion Cloning for Controllable Video Generation},
author={Ling, Pengyang and Bu, Jiazi and Zhang, Pan and Dong, Xiaoyi and Zang, Yuhang and Wu, Tong and Chen, Huaian and Wang, Jiaqi and Jin, Yi},
journal={arXiv preprint arXiv:2406.05338},
year={2024}
}
Project page template is borrowed from DreamBooth.