We propose Neural Photo-Finishing, an end-to-end differentiable pipeline for rendering sRGB images from raw inputs controlled by meaningful parameters. We accurately model a commercial raw processing pipeline (Adobe Camera Raw) with a sequence of neural networks, enabling partial derivatives to be evaluated anywhere in the pipeline with respect to any input or upstream parameter.
* joint authors with equal contribution between Princeton and Adobe.
Image processing pipelines are ubiquitous and we rely on them either directly, by filtering or adjusting an image post-capture, or indirectly, as image signal processing (ISP) pipelines on broadly deployed camera systems. Used by artists, photographers, system engineers, and for downstream vision tasks, traditional image processing pipelines feature complex algorithmic branches developed over decades. Recently, image-to-image networks have made great strides in image processing, style transfer, and semantic understanding. The differentiable nature of these networks allows them to fit a large corpus of data; however, they do not allow for intuitive, fine-grained controls that photographers find in modern photo-finishing tools. This work closes that gap and presents an approach to making complex photo-finishing pipelines differentiable, allowing legacy algorithms to be trained akin to neural networks using first-order optimization methods. By concatenating tailored network proxy models of individual processing steps (e.g. white-balance, tone-mapping, color tuning), we can model a non-differentiable reference image finishing pipeline more faithfully than existing proxy image-to-image network models. We validate the method for several diverse applications, including photo and video style transfer, slider regression for commercial camera ISPs, photography-driven neural demosaicking, and adversarial photo-editing.
Every operation in the ACR pipeline is a complex function of the input image and its slider controls. Due to the large slider space and the pipeline’s sequential topology, small changes early in the pipeline can lead to dramatic changes in the finished image. We tackle the problems of vanishing samples and cached statistics by decomposing the training of a differentiable architecture into a sequence of approximators. Each approximator represents a single pipeline block and is conditioned on input statistics.
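The block-wise decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two reference blocks (`ref_exposure`, `ref_contrast`) are hypothetical stand-ins rather than ACR operators, and each proxy is a one-parameter model fitted by closed-form least squares instead of a neural network. The sketch shows only the training structure: each proxy is fitted on the input/output pairs of its own block, and the second proxy is conditioned on a statistic of its input (the image mean).

```python
import math
import random

# Hypothetical black-box reference blocks (stand-ins, not ACR operators).
def ref_exposure(img, s):
    return [p * 2.0 ** s for p in img]

def ref_contrast(img, s):
    mu = sum(img) / len(img)  # behaviour depends on a global input statistic
    return [mu + (p - mu) * (1.0 + s) for p in img]

# One-parameter proxies standing in for per-block networks (assumption).
def proxy_exposure(img, s, theta):
    return [p * math.exp(theta * s) for p in img]

def proxy_contrast(img, s, theta, mu):
    # mu is the conditioning statistic computed from this block's input
    return [mu + (p - mu) * (1.0 + theta * s) for p in img]

def fit_exposure(samples):
    # Model: log(y / x) = theta * s, solved by least squares.
    num = den = 0.0
    for img, s, out in samples:
        for x, y in zip(img, out):
            num += math.log(y / x) * s
            den += s * s
    return num / den

def fit_contrast(samples):
    # Model: y - x = theta * s * (x - mu), solved by least squares.
    num = den = 0.0
    for img, s, out in samples:
        mu = sum(img) / len(img)
        for x, y in zip(img, out):
            f = s * (x - mu)
            num += (y - x) * f
            den += f * f
    return num / den

# Collect per-block training pairs by running the reference pipeline once.
rng = random.Random(0)
stage1, stage2 = [], []
for _ in range(50):
    raw = [rng.uniform(0.05, 1.0) for _ in range(8)]
    s1, s2 = rng.uniform(-1.0, 1.0), rng.uniform(-0.5, 0.5)
    mid = ref_exposure(raw, s1)
    stage1.append((raw, s1, mid))                     # pairs for block 1
    stage2.append((mid, s2, ref_contrast(mid, s2)))   # pairs for block 2

theta1 = fit_exposure(stage1)   # each block is fitted independently
theta2 = fit_contrast(stage2)

def proxy_pipeline(img, s1, s2):
    mid = proxy_exposure(img, s1, theta1)
    mu = sum(mid) / len(mid)
    return proxy_contrast(mid, s2, theta2, mu)
```

Because every proxy sees only its own block's inputs and outputs, a block can be refitted or replaced without retraining the rest of the chain.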
Pipeline Proxy Functions
We divide operations into three categories: neural pointwise, neural areawise, and differentiable programs. Pointwise operators affect the image at a per-pixel level, without consideration of neighboring pixels. Areawise operators are nonlinear filters that depend on a pixel’s neighbors. Differentiable programs are operations that are defined by the DNG specification or by standard definitions, such as the Planckian locus, and can therefore be written explicitly as a differentiable program.
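The difference between the pointwise and areawise categories comes down to which input pixels an output pixel depends on. The sketch below makes that concrete with two hand-written operators chosen purely for illustration: an exposure gain (pointwise) and a 3x3 median filter (a nonlinear areawise filter). Both function names are hypothetical, and neither is one of the neural proxies described here.

```python
def pointwise_exposure(img, ev):
    """Pointwise: each output pixel depends only on the same input pixel."""
    gain = 2.0 ** ev
    return [[p * gain for p in row] for row in img]

def areawise_median(img):
    """Areawise: each output pixel is the median of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            window.sort()
            # Upper median when the (border) window has an even count.
            out[y][x] = window[len(window) // 2]
    return out

img = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6],
       [0.7, 0.8, 0.9]]

# Perturb a single neighbour of the centre pixel.
img_changed = [row[:] for row in img]
img_changed[0][0] = 1.0

# The pointwise result at the centre is unaffected by the neighbour ...
same_center = (pointwise_exposure(img, 1.0)[1][1]
               == pointwise_exposure(img_changed, 1.0)[1][1])
# ... while the areawise result at the centre changes.
median_before = areawise_median(img)[1][1]
median_after = areawise_median(img_changed)[1][1]
```

This dependency footprint is what separates the two categories: a per-pixel model suffices for the first, whereas the second needs spatial context.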
We validate the accuracy of our proxies through a slider regression experiment. This experiment confirms that the gradients flowing through our composite pipeline are informative enough to recover the slider settings used to finish a photograph.
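Slider regression can be illustrated on a toy problem. The sketch below assumes a two-stage pipeline (an exposure slider followed by a fixed tone curve with exponent 1/2.2), hand-derived gradients, and plain gradient descent on a mean-squared error. None of these choices come from the method itself; they only show how a slider upstream of another stage can be recovered from the finished output by first-order optimization.

```python
import math

GAMMA = 1.0 / 2.2  # fixed downstream tone-curve exponent (assumed)

def pipeline(x, ev):
    exposed = x * 2.0 ** ev   # stage 1: exposure slider (upstream)
    return exposed ** GAMMA   # stage 2: fixed tone curve (downstream)

def d_pipeline_d_ev(x, ev):
    # Chain rule through both stages:
    # y = (x * 2^ev)^GAMMA  =>  dy/dev = GAMMA * ln(2) * y
    return GAMMA * math.log(2.0) * pipeline(x, ev)

def regress_exposure(pixels, targets, ev0=0.0, lr=2.0, steps=300):
    """Recover the exposure slider by gradient descent on the MSE."""
    ev = ev0
    for _ in range(steps):
        grad = 0.0
        for x, t in zip(pixels, targets):
            y = pipeline(x, ev)
            grad += 2.0 * (y - t) * d_pipeline_d_ev(x, ev)
        ev -= lr * grad / len(pixels)
    return ev

pixels = [0.1, 0.3, 0.5, 0.7, 0.9]
true_ev = 0.5
targets = [pipeline(x, true_ev) for x in pixels]  # the "finished" photo
recovered_ev = regress_exposure(pixels, targets)
```

In a full pipeline the hand-written derivative would be replaced by automatic differentiation through the chained proxies, but the optimization loop has the same shape.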
Raw Style Transfer
We perform photo-finishing style transfer, where we train an encoder to dynamically predict ACR slider settings conditioned on an input image. Similar to the “Auto Adjust” or “Magic Wand” feature in many commercial photo-editing applications, our network predicts sliders based on the input image content. However, unlike existing “one-size-fits-all” automatic adjustment algorithms, we can tailor our network to different styles by training it on different target style collections. Raw images are scaled for display.
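The idea of an encoder that predicts sliders from image content, trained through a differentiable pipeline, can be sketched at toy scale. Every detail below is assumed for illustration: the "encoder" is a two-parameter linear map from the image's mean log2 brightness to a single exposure slider, the "style" is a target mean log2 brightness at mid-gray (0.18), and the pipeline is a bare exposure gain. The actual encoder, sliders, and style losses are not specified here.

```python
import math

TARGET_LOG2 = math.log2(0.18)  # assumed style: mid-gray mean log2 brightness

def mean_log2(img):
    return sum(math.log2(p) for p in img) / len(img)

def finish(img, ev):
    """Toy differentiable pipeline: a single exposure slider."""
    return [p * 2.0 ** ev for p in img]

def encoder(img, w, b):
    """Toy encoder: predicts the slider from an image statistic."""
    return w * mean_log2(img) + b

def train(images, lr=0.05, steps=3000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for img in images:
            m = mean_log2(img)
            ev = w * m + b
            # mean_log2(finish(img, ev)) equals m + ev, so the derivative
            # of the output statistic with respect to the slider is 1.
            residual = (m + ev) - TARGET_LOG2
            d_loss_d_ev = 2.0 * residual
            gw += d_loss_d_ev * m   # back-propagate into encoder weights
            gb += d_loss_d_ev
        w -= lr * gw / len(images)
        b -= lr * gb / len(images)
    return w, b

images = [
    [0.50, 0.40, 0.60],
    [0.20, 0.30, 0.25],
    [0.10, 0.15, 0.12],
    [0.05, 0.08, 0.06],
]
w, b = train(images)

# Apply the trained encoder to an image that was not in the training set.
new_img = [0.02, 0.03, 0.04]
styled = finish(new_img, encoder(new_img, w, b))
```

Training on a different target collection would amount to swapping the loss target, which is the sense in which such an encoder can be tailored to a style.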