SAGE: Don't Forget your Inverse DDIM for Image Editing
Guillermo Gomez-Trenado1, Pablo Mesejo1, Óscar Cordón1, Stéphane Lathuilière2
1DaSCI research institute, DECSAI, University of Granada, Granada, Spain
2LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France
Method teaser

The field of text-to-image generation has advanced significantly with the introduction of diffusion models. However, editing real images remains challenging: most methods are either too computationally intensive or produce poor reconstructions. In this paper, we introduce SAGE, a novel technique that leverages pre-trained diffusion models for image editing. Quantitative and qualitative evaluations, complemented by a detailed user study, show that SAGE outperforms existing approaches.

Method

SAGE (Self-Attention Guidance for image Editing) builds upon the DDIM algorithm and incorporates a novel guidance mechanism that leverages the self-attention layers of the diffusion U-Net. The mechanism computes a reconstruction objective from the attention maps generated during the inverse DDIM process. This allows unedited regions to be reconstructed efficiently, without requiring a precise reconstruction of the entire input image, thereby addressing a key challenge in image editing.
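The pipeline above can be sketched with a toy example: invert an "image" through deterministic DDIM to obtain its latent trajectory, then sample back while a guidance term pulls unedited regions toward the stored trajectory. This is a minimal numpy sketch, not the paper's implementation: `eps_model` is a hypothetical stand-in for the pre-trained U-Net, the `alphas` schedule is invented, and the stored latents here stand in for the self-attention maps that SAGE actually uses in its reconstruction objective.

```python
import numpy as np

T = 50
alphas = np.linspace(1.0, 0.1, T + 1)  # toy cumulative alpha-bar schedule (assumption)

def eps_model(x, t):
    # Hypothetical noise predictor; SAGE uses a pre-trained diffusion U-Net.
    return 0.1 * np.tanh(x + 0.01 * t)

def ddim_step(x, t_from, t_to):
    # Deterministic DDIM update between two timesteps.
    a_f, a_t = alphas[t_from], alphas[t_to]
    eps = eps_model(x, t_from)
    x0_pred = (x - np.sqrt(1.0 - a_f) * eps) / np.sqrt(a_f)
    return np.sqrt(a_t) * x0_pred + np.sqrt(1.0 - a_t) * eps

def invert(x0):
    # Inverse DDIM: map the input image to a latent trajectory x_0..x_T.
    traj = [x0]
    x = x0
    for t in range(T):
        x = ddim_step(x, t, t + 1)
        traj.append(x)
    return traj

def guided_sample(xT, traj, edit_mask, scale=0.5):
    # Reverse DDIM with guidance: unedited (mask == 0) regions are pulled
    # toward the stored inversion trajectory at every step. In SAGE the
    # guidance signal comes from self-attention maps, not raw latents.
    x = xT
    for t in range(T, 0, -1):
        x = ddim_step(x, t, t - 1)
        x = x + scale * (1.0 - edit_mask) * (traj[t - 1] - x)
    return x

x0 = np.array([0.2, -0.5, 0.8, 0.1])        # toy 4-pixel "image"
edit_mask = np.array([0.0, 0.0, 1.0, 0.0])  # edit only the third pixel
traj = invert(x0)
recon = guided_sample(traj[-1], traj, edit_mask)
```

In the unedited regions, the guided reverse pass closely recovers the input even though DDIM inversion is only approximately invertible, which is the property SAGE exploits to avoid reconstructing the whole image precisely.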

Method pipeline

Comparison with other methods

Additional results

BibTeX

@article{,
author = {},
title = {},
journal = {},
year = {},
}