Light-A-Video: Training-free Video Relighting via Progressive Light Fusion  

Yujie Zhou*1,6 Jiazi Bu*1,6 Pengyang Ling*2,6 Pan Zhang6✝ Tong Wu5 Qidong Huang2,6 Jinsong Li3,6 Xiaoyi Dong6 Yuhang Zang6 Yuhang Cao6 Anyi Rao4 Jiaqi Wang6 Li Niu1✝
*Equal Contribution. ✝Corresponding Author.
1Shanghai Jiao Tong University 2University of Science and Technology of China 3The Chinese University of Hong Kong 4Hong Kong University of Science and Technology 5Stanford University 6Shanghai AI Laboratory

[Paper]     [GitHub]     [BibTeX]


Click to Play the Animations!


Generated with AnimateDiff-Motion-Adapter-v1-5-3, the CivitAI model Realistic Vision V5.1, and the image relighting model IC-Light.
The first row shows reference videos/foreground sequences, and the second row presents relighted videos generated by Light-A-Video.

Methodology

Recent advances in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled the imposition of consistent lighting on single images. Video relighting, however, still lags behind, primarily due to excessive training costs and the scarcity of diverse, high-quality video relighting datasets. Naively applying an image relighting model frame by frame causes two problems: lighting source inconsistency and relighted appearance inconsistency, which together produce flickering in the generated videos. In this work, we propose Light-A-Video, a training-free approach to temporally smooth video relighting. Built on image relighting models, Light-A-Video introduces two key modules to enhance lighting consistency. First, we design a Consistent Light Attention (CLA) module, which enhances cross-frame interactions within the self-attention layers to stabilize the generation of the background lighting source. Second, leveraging the physical principle of light transport independence, we apply linear blending between the source video's appearance and the relighted appearance, using a Progressive Light Fusion (PLF) strategy to ensure smooth temporal transitions in illumination. Experiments show that Light-A-Video improves the temporal consistency of relighted videos while maintaining image quality, ensuring coherent lighting transitions across frames.
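To make the CLA idea concrete, below is a minimal PyTorch sketch of cross-frame key/value sharing inside a self-attention layer. This is an illustration under assumptions, not the paper's implementation: the function name, the tensor layout, and the choice of simple frame-averaging are all hypothetical.

```python
import torch
import torch.nn.functional as F

def consistent_light_attention(q, k, v):
    """Sketch of cross-frame interaction in self-attention (hypothetical).

    q, k, v: (frames, heads, tokens, dim) attention inputs from the
    image relighting model. Letting every frame also attend to a
    frame-averaged key/value bank encourages a single, stable
    background lighting source across the whole clip.
    """
    f, h, n, d = k.shape
    # Average keys/values over the frame axis and broadcast back.
    k_mean = k.mean(dim=0, keepdim=True).expand(f, h, n, d)
    v_mean = v.mean(dim=0, keepdim=True).expand(f, h, n, d)
    # Each frame attends to its own tokens plus the shared ones,
    # so per-frame lighting estimates are pulled toward a common source.
    k_cla = torch.cat([k, k_mean], dim=2)
    v_cla = torch.cat([v, v_mean], dim=2)
    return F.scaled_dot_product_attention(q, k_cla, v_cla)
```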

As illustrated in the framework below, a source video is first noised and processed through the VDM for denoising across \( T_m \) steps. At each step, the predicted noise-free component with detail compensation serves as the Consistent Target \( \mathbf{z}^{v}_{0 \gets t} \), inherently representing the VDM's denoising direction. Consistent Light Attention infuses \( \mathbf{z}^{v}_{0 \gets t} \) with unique lighting information, transforming it into the Relight Target \( \mathbf{z}^{r}_{0 \gets t} \). The Progressive Light Fusion strategy then merges the two targets to form the Fusion Target \( \tilde{\mathbf{z}}_{0 \gets t} \), which provides a refined denoising direction for the current step. The bottom-right part shows the iterative evolution of \( \mathbf{z}^{v}_{0 \gets t} \).
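The fusion step itself can be read as a convex combination of the two targets, justified by the linearity of light transport. The sketch below assumes a per-step blend weight that ramps from 0 to 1 as denoising proceeds; the linear schedule and all names here are hypothetical choices for illustration, and the paper's actual schedule may differ.

```python
import torch

def progressive_light_fusion(z_v: torch.Tensor,
                             z_r: torch.Tensor,
                             step: int,
                             total_steps: int) -> torch.Tensor:
    """Blend the Consistent Target z_v and the Relight Target z_r.

    Because light transport is linear, appearances of the same scene
    under two lighting conditions can be mixed by a convex combination.
    `step` counts completed denoising steps (0 .. total_steps), so the
    relighting weight grows progressively, injecting the target
    illumination gradually for smooth temporal transitions.
    The linear ramp is an assumed schedule, not the paper's.
    """
    lambda_t = step / total_steps  # 0 at the start, 1 at the end
    return lambda_t * z_r + (1.0 - lambda_t) * z_v
```

Under this reading, the fused \( \tilde{\mathbf{z}}_{0 \gets t} \) would stand in for the plain predicted clean latent when deriving the next noisy latent, so each denoising step is steered slightly further toward the target illumination.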


Full Video Relighting Gallery

Here we demonstrate our results for full video relighting.
Click to play the following animations.


Video Foreground Relighting Gallery

Here we demonstrate our results for video foreground relighting with background generation.
Click to play the following animations.


Light-A-Video on More Advanced T2V Backbone

Here we demonstrate our results on the advanced DiT-based T2V backbone CogVideoX-2B.
Click to play the following animations.


BibTeX

@article{zhou2025light,
  title={Light-A-Video: Training-free Video Relighting via Progressive Light Fusion},
  author={Zhou, Yujie and Bu, Jiazi and Ling, Pengyang and Zhang, Pan and Wu, Tong and Huang, Qidong and Li, Jinsong and Dong, Xiaoyi and Zang, Yuhang and Cao, Yuhang and others},
  journal={arXiv preprint arXiv:2502.08590},
  year={2025}
}

The project page template is borrowed from DreamBooth.