Generated with AnimateDiff-Motion-Adapter-v1-5-3, the CivitAI model Realistic Vision V5.1, and the image relighting model IC-Light. The first row shows reference videos/foreground sequences, and the second row presents relighted videos generated by Light-A-Video.
Methodology
Recent advancements in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled consistent lighting to be imposed on images. Video relighting, however, still lags behind, primarily due to excessive training costs and the scarcity of diverse, high-quality video relighting datasets. Naively applying an image relighting model on a frame-by-frame basis leads to several issues: lighting-source inconsistency and relighted-appearance inconsistency, resulting in flicker in the generated videos.
In this work, we propose Light-A-Video, a training-free approach to temporally smooth video relighting. Adapted from image relighting models, Light-A-Video introduces two key modules to enhance lighting consistency. First, we design a Consistent Light Attention (CLA) module, which enhances cross-frame interactions within the self-attention layers to stabilize the generation of the background lighting source; a sketch of the idea follows.
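To make the cross-frame interaction concrete, here is a minimal PyTorch sketch of a CLA-style attention step. The function name, the `blend` weight, and the averaging of keys/values across frames are our reading of the module for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def consistent_light_attention(q, k, v, blend=0.5):
    """Sketch of a CLA-style dual-stream self-attention.

    q, k, v: per-frame self-attention tensors of shape
    (frames, tokens, dim). `blend` is an assumed hyperparameter.
    """
    # Stream 1: ordinary per-frame self-attention.
    per_frame = F.scaled_dot_product_attention(q, k, v)

    # Stream 2: attend to keys/values averaged over the frame axis,
    # so every frame sees the same shared context; this is intended
    # to stabilize the inferred background lighting source.
    k_avg = k.mean(dim=0, keepdim=True).expand_as(k)
    v_avg = v.mean(dim=0, keepdim=True).expand_as(v)
    shared = F.scaled_dot_product_attention(q, k_avg, v_avg)

    # Linearly blend the temporally shared and per-frame outputs.
    return blend * shared + (1.0 - blend) * per_frame
```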
Second, leveraging the physical principle of light transport independence, we apply linear blending between the source video's appearance and the relighted appearance, using a Progressive Light Fusion (PLF) strategy to ensure smooth temporal transitions in illumination; see the sketch below.
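The blending itself is a single linear interpolation whose weight grows over the denoising steps. The sketch below assumes a linear schedule for the weight; the schedule actually used by the method may differ.

```python
import torch

def progressive_light_fusion(z_src: torch.Tensor,
                             z_relight: torch.Tensor,
                             step: int, total_steps: int) -> torch.Tensor:
    # By light transport independence, an appearance lit by a mix of
    # two lighting conditions equals the same linear mix of the two
    # individually lit appearances, so a convex combination of the
    # source and relighted appearances is physically meaningful.
    lam = (step + 1) / total_steps  # assumed schedule: grows toward 1
    return lam * z_relight + (1.0 - lam) * z_src
```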
Experiments show that Light-A-Video improves the temporal consistency of the relighted videos while maintaining per-frame image quality, ensuring coherent lighting transitions across frames.
As illustrated in the framework below, a source video is first noised and then denoised by the VDM over \( T_m \) steps. At each step, the predicted noise-free component, with detail compensation, serves as the Consistent Target \( \mathbf{z}^{v}_{0 \gets t} \), inherently representing the VDM's denoising direction. Consistent Light Attention infuses \( \mathbf{z}^{v}_{0 \gets t} \) with the target lighting information, transforming it into the Relight Target \( \mathbf{z}^{r}_{0 \gets t} \). The Progressive Light Fusion strategy then merges the two targets into the Fusion Target \( \tilde{\mathbf{z}}_{0 \gets t} \), which provides a refined direction for the current step. The bottom-right part of the figure shows the iterative evolution of \( \mathbf{z}^{v}_{0 \gets t} \).
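Putting the pieces together, the following sketch shows one way the per-step loop could look. `vdm`, `relight`, and `scheduler.renoise` are hypothetical callables standing in for the VDM's noise-free prediction, the CLA-based relighting pass, and re-noising a clean latent to a given step; they are assumptions for illustration, not the authors' API.

```python
import torch

@torch.no_grad()
def relight_video(z_src, vdm, relight, scheduler, T_m: int):
    # Start from a noised copy of the source-video latents.
    z_t = scheduler.renoise(z_src, T_m - 1)
    for t in reversed(range(T_m)):
        z0_v = vdm(z_t, t)             # Consistent Target z^v_{0<-t}
        z0_r = relight(z0_v)           # Relight Target    z^r_{0<-t}
        lam = (T_m - t) / T_m          # assumed progressive weight
        z0_f = lam * z0_r + (1.0 - lam) * z0_v   # Fusion Target
        # Use the fused target as the refined denoising direction:
        # re-noise it to the next (lower) noise level and continue.
        z_t = z0_f if t == 0 else scheduler.renoise(z0_f, t - 1)
    return z_t
```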
Full Video Relighting Gallery
Here we demonstrate our results for full video relighting. Click to play the following animations.
Video Foreground Relighting Gallery
Here we demonstrate our results for video foreground relighting with background generation. Click to play the following animations.
Light-A-Video on More Advanced T2V Backbone
Here we demonstrate our results on the advanced DiT-based T2V backbone CogVideoX-2B. Click to play the following animations.
BibTeX
@article{zhou2025light,
  title={Light-A-Video: Training-free Video Relighting via Progressive Light Fusion},
  author={Zhou, Yujie and Bu, Jiazi and Ling, Pengyang and Zhang, Pan and Wu, Tong and Huang, Qidong and Li, Jinsong and Dong, Xiaoyi and Zang, Yuhang and Cao, Yuhang and others},
  journal={arXiv preprint arXiv:2502.08590},
  year={2025}
}
The project page template is borrowed from DreamBooth.