Text-to-image generation has advanced dramatically with the introduction of diffusion models. Editing real images, however, remains difficult: most existing methods are either too computationally intensive or reconstruct the input poorly. In this paper, we introduce SAGE, a novel technique that uses pre-trained diffusion models for image editing. Quantitative and qualitative evaluations, complemented by a detailed user study, show that SAGE outperforms existing approaches.
Method
SAGE (Self-Attention Guidance for image Editing) builds on the DDIM algorithm and adds a novel guidance mechanism that leverages the self-attention layers of the diffusion U-Net. The guidance term is a reconstruction objective computed from the attention maps generated during the inverse DDIM process. Because this objective only constrains the unedited regions, SAGE reconstructs them faithfully without requiring a precise reconstruction of the entire input image, which addresses a key challenge in image editing.
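The page does not include code, so below is a minimal, self-contained PyTorch sketch of the mechanism described above: an inverse DDIM pass records the U-Net's self-attention maps, and the sampling loop adds a classifier-guidance-style gradient of an attention-reconstruction loss to the predicted noise. Everything here (TinyUNet, make_schedule, the token layout, the mask, and the guidance scale) is an illustrative assumption, not SAGE's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Stand-in denoiser: one self-attention block over 'patch' tokens."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.proj_in = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj_out = nn.Linear(dim, dim)
        self.last_attn = None  # self-attention map captured on each forward

    def forward(self, x, t_emb):
        h = self.proj_in(x) + t_emb  # crude timestep conditioning
        out, weights = self.attn(h, h, h, need_weights=True,
                                 average_attn_weights=True)
        self.last_attn = weights     # shape (B, L, L)
        return self.proj_out(out)    # predicted noise eps_theta

def make_schedule(n_steps=50):
    betas = torch.linspace(1e-4, 2e-2, n_steps)
    return torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha_bar_t

def t_emb(t, dim):
    return torch.full((1, 1, dim), float(t) / 100.0)

@torch.no_grad()
def ddim_invert(unet, x0, abar):
    """Inverse DDIM pass: deterministically noise x0, recording the
    self-attention maps that the guidance will later try to reproduce."""
    x, ref_attn = x0, []
    for t in range(len(abar) - 1):
        eps = unet(x, t_emb(t, x.shape[-1]))
        ref_attn.append(unet.last_attn)
        x0_pred = (x - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
        x = abar[t + 1].sqrt() * x0_pred + (1 - abar[t + 1]).sqrt() * eps
    return x, ref_attn

def sage_sample(unet, x_T, abar, ref_attn, mask, scale=0.1):
    """DDIM sampling with a self-attention reconstruction guidance term.
    (The one-step timestep mismatch with inversion is ignored for brevity.)"""
    x = x_T
    for t in reversed(range(len(abar) - 1)):
        x = x.detach().requires_grad_(True)
        eps = unet(x, t_emb(t, x.shape[-1]))
        # Reconstruction objective: match the inversion-time attention maps,
        # restricted to unedited tokens by `mask` (an assumption here).
        loss = F.mse_loss(unet.last_attn * mask, ref_attn[t] * mask)
        grad = torch.autograd.grad(loss, x)[0]
        # Classifier-guidance-style correction steering x away from high loss.
        eps = eps + scale * (1 - abar[t]).sqrt() * grad
        abar_prev = abar[t - 1] if t > 0 else torch.tensor(1.0)
        with torch.no_grad():
            x0_pred = (x - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
            x = abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps
    return x.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    unet, abar = TinyUNet(), make_schedule()
    x0 = torch.randn(1, 16, 64)    # 16 "patch" tokens of dimension 64
    mask = torch.ones(1, 16, 16)   # 1 = unedited region (illustrative)
    x_T, ref_attn = ddim_invert(unet, x0, abar)
    edited = sage_sample(unet, x_T, abar, ref_attn, mask)
    print(edited.shape)            # torch.Size([1, 16, 64])

Restricting the loss to the masked (unedited) tokens is the point of the design: the guidance pulls only the preserved regions toward the original attention patterns, leaving the sampler free in the edited region instead of forcing an exact reconstruction of the whole image.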
Comparison with other methods
Additional results
BibTeX
@article{,
  author = {},
  title = {},
  journal = {},
  year = {},
}