🌊HiFlow: Training-free High-Resolution Image Generation
with Flow-Aligned Guidance

Jiazi Bu^1,5* Pengyang Ling^2,5* Yujie Zhou^1,5* Pan Zhang^5† Tong Wu^4

Xiaoyi Dong^3,5 Yuhang Zang^5 Yuhang Cao^5 Dahua Lin^3,5,7 Jiaqi Wang^5,6†

^1 Shanghai Jiao Tong University    ^2 University of Science and Technology of China
^3 The Chinese University of Hong Kong    ^4 Stanford University    ^5 Shanghai AI Laboratory
^6 Shanghai Innovation Institute    ^7 CPII under InnoHK
(^* Equal Contribution ^† Corresponding Author)


Abstract

Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging this flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate that HiFlow achieves higher high-resolution image quality than current state-of-the-art methods.

Methodology

HiFlow constructs a reference flow from the low-resolution sampling trajectory to offer initialization alignment, direction alignment, and acceleration alignment, enabling flow-aligned high-resolution image generation. Specifically, HiFlow follows a cascade generation paradigm. First, a virtual reference flow is constructed in the high-resolution space from the step-wise estimated clean samples of the low-resolution sampling flow. Then, during high-resolution synthesis, the reference flow guides three aspects of sampling: the initialization, the denoising direction, and the acceleration of the trajectory. These respectively help achieve consistent low-frequency patterns, preserve structural features, and maintain high-fidelity details. Such flow-aligned guidance from the sampling trajectory better merges the structure synthesized at the low-resolution scale with the details synthesized at the high-resolution scale, enabling superior generation.
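As a rough illustration of the first stage and of low-frequency alignment, the sketch below builds a reference flow from step-wise clean-sample estimates and swaps the low-frequency band of a high-resolution sample for that of the reference. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the rectified-flow convention x_t = (1 - t) * x0 + t * noise with velocity v = noise - x0, uses nearest-neighbor upsampling and an ideal FFT low-pass filter as stand-ins, and all function names are hypothetical. Direction alignment and acceleration alignment are not shown.

```python
import numpy as np

def estimate_clean_sample(x_t, v_t, t):
    # Assumed convention: x_t = (1 - t) * x0 + t * noise and v_t = noise - x0,
    # which gives x0 = x_t - t * v_t.
    return x_t - t * v_t

def upsample_nearest(x, factor):
    # Stand-in upsampler (assumption): repeat each pixel `factor` times per axis.
    return np.repeat(np.repeat(x, factor, axis=-2), factor, axis=-1)

def low_pass(x, cutoff):
    # Ideal FFT low-pass: keep frequencies with |f| <= cutoff (cycles per pixel).
    h, w = x.shape[-2:]
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    mask = (fy <= cutoff) & (fx <= cutoff)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def align_low_frequency(x_high, x_ref, cutoff):
    # Replace the low-frequency band of x_high with that of x_ref,
    # leaving the high-frequency band of x_high untouched.
    return x_high - low_pass(x_high, cutoff) + low_pass(x_ref, cutoff)

def build_reference_flow(low_res_states, low_res_velocities, timesteps, factor):
    # One upsampled clean-sample estimate per low-resolution sampling step.
    return [
        upsample_nearest(estimate_clean_sample(x, v, t), factor)
        for x, v, t in zip(low_res_states, low_res_velocities, timesteps)
    ]

# Toy usage with an oracle velocity (illustration only; a real model predicts v_t).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
noise = rng.standard_normal((8, 8))
timesteps = [1.0, 0.5, 0.25]
states = [(1 - t) * x0 + t * noise for t in timesteps]
velocities = [noise - x0 for _ in timesteps]
reference = build_reference_flow(states, velocities, timesteps, factor=2)

x_high = rng.standard_normal((16, 16))
x_aligned = align_low_frequency(x_high, reference[-1], cutoff=0.125)
```

In this toy setup the aligned sample shares its low-frequency content with the reference while keeping its own high-frequency content, which is the intuition behind using the low-resolution result for structure and the high-resolution pass for detail. The cutoff value here is arbitrary.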

Qualitative Comparison

Qualitative image comparisons with other baselines. HiFlow yields high-resolution images with high-fidelity details and coherent structure.

Qualitative image comparisons with training-based methods. HiFlow generates high-resolution images with quality comparable to leading training-based models (UltraPixel, DALL·E 3, and Flux-Pro).

Versatile Applications of HiFlow

Here we show results for further applications, including LoRA, ControlNet, and quantization. We also show results on SDXL, a U-Net-based T2I diffusion model.

BibTeX

If you find this work helpful, please cite the following paper:

  @article{bu2025hiflow,
    title={HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance},
    author={Bu, Jiazi and Ling, Pengyang and Zhou, Yujie and Zhang, Pan and Wu, Tong and Dong, Xiaoyi and Zang, Yuhang and Cao, Yuhang and Lin, Dahua and Wang, Jiaqi},
    journal={arXiv preprint arXiv:2504.06232},
    year={2025}
  }