PromptNorm: Image Geometry Guides Ambient Light Normalization
Ramon Baldrich, Maria Vanrell, and Javier Vazquez-Corral
TL;DR: Injecting scene geometry into a transformer-based image restoration model to improve ambient lighting normalization.
Motivation
The geometric structure of a scene remains consistent across lighting conditions and degradations. Much like being able to see in the dark, knowing this structure improves scene understanding and benefits downstream tasks such as ambient lighting normalization.
In this paper, we propose leveraging foundational monocular depth estimators to obtain depth maps and, consequently, surface normals of the input scene. We show that even in challenging lighting scenarios, including multiple light sources or low-light conditions, modern depth estimators can extract reliable geometric information. In particular, we use Depth Anything V2.
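As a rough illustration of the depth-to-normals step, the sketch below derives per-pixel unit normals from a depth map using finite differences. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `depth_to_normals` is hypothetical, the depth map is treated as a height field in pixel coordinates, and camera intrinsics are ignored. The depth array would come from a monocular estimator such as Depth Anything V2; no call to that model is shown here.

```python
import numpy as np


def depth_to_normals(depth: np.ndarray) -> np.ndarray:
    """Estimate unit surface normals from an (H, W) depth map.

    Hypothetical helper: treats depth as a height field z = f(x, y)
    in pixel units and ignores camera intrinsics.
    """
    depth = np.asarray(depth, dtype=np.float64)

    # Finite-difference gradients: axis 0 is y (rows), axis 1 is x (columns).
    dz_dy, dz_dx = np.gradient(depth)

    # The normal of z = f(x, y) is proportional to (-dz/dx, -dz/dy, 1).
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)

    # Normalize to unit length; the norm is at least 1, so this is safe.
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals  # shape (H, W, 3)
```

On a constant depth map every normal points straight along the z axis, and tilting the surface tilts the normals accordingly. Because the normals depend only on depth gradients, they are unaffected by shadows or color casts in the input image as long as the depth estimate itself is stable.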
Abstract
Ambient lighting normalization is an important computer vision task that aims to remove shadows and standardize illumination across an entire image. While previous approaches have primarily focused on image restoration and frequency-based cues, this paper hypothesizes that incorporating image geometry can significantly improve the normalization process. We propose PromptNorm, a novel transformer-based model that leverages state-of-the-art monocular depth estimators to overcome the challenges posed by strong shadows and extreme color distortions. Our approach uniquely utilizes image normals as a guiding mechanism. We encode these normals to generate a low-level geometric representation, which is then used as Query inputs to weight the attention maps within transformer blocks dynamically. Comprehensive experimental evaluations demonstrate that PromptNorm not only outperforms existing state-of-the-art methods in ambient lighting normalization but also validates the effectiveness of integrating geometric information into image processing techniques. Both quantitative metrics and qualitative assessments confirm the effectiveness of our method.
PromptNorm
PromptNorm is a transformer-based method designed to normalize the lighting of images by leveraging scene geometry. Starting from a monocular depth map estimated from the input image, surface normals are computed and encoded to guide the model’s attention. Both image and normal features pass through a multi-level encoder-decoder pipeline, in which transformer blocks progressively downsample and later upsample the features. The core component is a Geometry-Aware Transformer Block that introduces a Geometry-Guided Transposed Attention (GGTA) mechanism. GGTA injects directional cues from surface normals into the attention queries, allowing the model to focus on geometrically relevant regions. A lightweight Geometry Encoder adjusts the normal features across scales, while a Gated Feed-Forward Network (GFFN) refines the output of each attention block. The final output is generated via a residual connection to preserve image details while eliminating shadows.
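The idea of geometry-guided transposed attention can be sketched as channel-wise attention in which the queries come from the normal features while keys and values come from the image features. The code below is only an illustrative single-head version in NumPy and makes several assumptions that are not stated above: the function name and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, the projections are plain per-pixel linear maps, queries and keys are L2-normalized, and `temperature` is a fixed scalar. The actual block in the paper may differ in all of these details.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def geometry_guided_transposed_attention(img_feat, geo_feat, w_q, w_k, w_v,
                                         temperature=1.0):
    """Hypothetical single-head sketch of geometry-guided channel attention.

    img_feat, geo_feat: (C, H, W) image and encoded-normal features.
    w_q, w_k, w_v:      (C, C) projection matrices (assumed, illustrative).
    Returns the attended features (C, H, W) and the (C, C) attention map.
    """
    c, h, w = img_feat.shape
    x = img_feat.reshape(c, h * w)   # image tokens, one row per channel
    g = geo_feat.reshape(c, h * w)   # geometry tokens, one row per channel

    q = l2_normalize(w_q @ g)        # queries from surface-normal features
    k = l2_normalize(w_k @ x)        # keys from image features
    v = w_v @ x                      # values from image features

    # "Transposed" attention: the map is C x C (across channels),
    # not HW x HW (across pixels), so cost grows linearly with image size.
    attn = softmax((q @ k.T) * temperature, axis=-1)
    out = attn @ v
    return out.reshape(c, h, w), attn


# Example call with random features (shapes only, no trained weights).
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
image_features = rng.standard_normal((C, H, W))
normal_features = rng.standard_normal((C, H, W))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
attended, attention_map = geometry_guided_transposed_attention(
    image_features, normal_features, Wq, Wk, Wv)
```

Because the attention map is computed between channels rather than between pixels, its size is independent of the image resolution, which is one reason a transposed formulation is attractive for restoration on large images.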
Results
PromptNorm achieves state-of-the-art performance on the Ambient6K dataset. We also demonstrate that injecting surface normals into a transformer-based model benefits low-light image enhancement. Please refer to our paper for additional experiments and qualitative results.
Acknowledgements
We acknowledge the FPI grant from the Spanish Ministry of Science and Innovation (PRE2022-101525), the Departament de Recerca i Universitats with ref. 2021SGR01499 and the CERCA Program from Generalitat de Catalunya, Grant PID2021-128178OB-I00 funded by MCIN/AEI/10.13039/501100011033, ERDF “A way of making Europe”, and the grant Càtedra ENIA UAB-Cruïlla (TSI-100929-2023-2) from the Ministry of Economic Affairs and Digital Transformation of Spain.
Citation
@inproceedings{promptnorm2025,
title={PromptNorm: Image Geometry Guides Ambient Light Normalization},
author={Serrano-Lozano, David and Molina-Bakhos, Francisco A. and Xue, Danna and Yang, Yixiong and Pilligua, Maria and Baldrich, Ramon and Vanrell, Maria and Vazquez-Corral, Javier},
booktitle={CVPRW},
year={2025}
}