Hyperparameter Optimization in Black-box Image Processing using Differentiable Proxies

We present a fully automatic approach to optimize black-box imaging systems using stochastic first-order optimization.

Nearly every commodity imaging system we directly interact with, or indirectly rely on, uses power-efficient, application-adjustable black-box image signal processors (ISPs), which run either in dedicated hardware blocks or as proprietary software modules on programmable hardware. The configuration parameters of these black-box ISPs often have complex interactions with the output image, and must be adjusted prior to deployment according to application-specific quality and performance metrics. Today, this search is commonly performed manually by "golden eye" experts or by algorithm developers drawing on domain expertise. We present a fully automatic system to optimize the parameters of black-box hardware and software image processing pipelines according to an arbitrary (i.e., application-specific) metric. We leverage a differentiable mapping between the configuration space and evaluation metrics, parameterized by a convolutional neural network that we train in an end-to-end fashion with imaging hardware in the loop. Unlike prior art, our differentiable proxies allow for high-dimensional parameter search with stochastic first-order optimizers, without explicitly modeling any lower-level image processing transformations. As such, we can efficiently optimize black-box image processing pipelines for a variety of imaging applications, reducing application-specific configuration times from months to hours. Our optimization method is fully automatic, even with black-box hardware in the loop. We validate our method on experimental data for real-time display applications, object detection, and extreme low-light imaging. The proposed approach outperforms manual search qualitatively and quantitatively for all domain-specific applications tested. Applied to traditional denoisers, our method shows that, just by changing hyperparameters, traditional algorithms can outperform recent deep learning methods by a substantial margin on recent benchmarks.


Ethan Tseng, Felix Yu, Yuting Yang, Fahim Mannan, Karl St. Arnaud, Derek Nowrouzezahrai, Jean-François Lalonde, Felix Heide

Selected Results

[Videos: two optimization time lapses, each showing the proxy output next to its target image]

Hyper-parameter optimization using a differentiable proxy.

The first step of our optimization method is to train a differentiable proxy model to mimic an arbitrary black-box ISP. The second step uses stochastic first-order optimization to search for a set of hyper-parameters that cause the ISP to produce the desired target image. The two videos above are time lapses of the second step. In these videos, a proxy trained to mimic the ARM MALI-C71 ISP starts from randomly initialized hyper-parameters and eventually converges on hyper-parameter settings that allow for close reconstruction of the target image. Each frame shows the proxy model output for the hyper-parameters at a single timestep.
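
The two steps can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: `black_box_isp` is a hypothetical two-parameter gain/offset function standing in for a real ISP, and the proxy is a linear model over hand-picked features rather than the convolutional neural network used in our method. Only the structure carries over: fit the proxy to black-box outputs first, then freeze it and run stochastic gradient descent on the hyper-parameters alone.

```python
import random

def black_box_isp(x, a, b):
    """Hypothetical black-box ISP: it can be queried but not differentiated.
    x is a pixel value in [0, 1]; a and b are normalized hyper-parameters in [-1, 1]."""
    gain = 1.0 + 0.5 * a
    offset = 0.2 * b
    return gain * x + offset

def features(x, a, b):
    """Input features of the toy proxy (a stand-in for the CNN, assumed here)."""
    u = 2.0 * x - 1.0  # centre the pixel value around zero
    return [1.0, u, a, b, u * a, u * b]

def proxy(w, x, a, b):
    """Differentiable proxy: a linear model over the features above."""
    return sum(wi * fi for wi, fi in zip(w, features(x, a, b)))

def train_proxy(steps=5000, lr=0.1, seed=0):
    """Step 1: fit the proxy to black-box outputs with SGD on squared error."""
    rng = random.Random(seed)
    w = [0.0] * 6
    for _ in range(steps):
        x, a, b = rng.random(), rng.uniform(-1, 1), rng.uniform(-1, 1)
        f = features(x, a, b)
        err = sum(wi * fi for wi, fi in zip(w, f)) - black_box_isp(x, a, b)
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

def optimize_hyperparameters(w, pixels, target, steps=5000, lr=0.5, seed=1):
    """Step 2: freeze the proxy and run SGD on the hyper-parameters only."""
    rng = random.Random(seed)
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    for _ in range(steps):
        i = rng.randrange(len(pixels))              # one random pixel per step
        u = 2.0 * pixels[i] - 1.0
        err = proxy(w, pixels[i], a, b) - target[i]
        grad_a = 2.0 * err * (w[2] + w[4] * u)      # d(err^2)/da through the proxy
        grad_b = 2.0 * err * (w[3] + w[5] * u)      # d(err^2)/db through the proxy
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b

# Target "image": what the black box produces at settings we pretend not to know.
pixels = [i / 15 for i in range(16)]
target = [black_box_isp(x, 0.6, -0.4) for x in pixels]

w = train_proxy()
a_opt, b_opt = optimize_hyperparameters(w, pixels, target)

# The found hyper-parameters are finally applied to the real black box.
result = [black_box_isp(x, a_opt, b_opt) for x in pixels]
```

In this toy setting the proxy can represent the black box exactly, so the recovered hyper-parameters reproduce the target closely. With a real ISP the proxy is only an approximation, which is why the optimized hyper-parameters are evaluated on the ISP itself rather than on the proxy.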

Low-light Black-box Denoising.

One practical application of our method is low-light denoising, specifically finding optimal hyper-parameters for the BM3D denoising algorithm. In the first step, we train a differentiable proxy to approximate the BM3D algorithm. In the second step, we set the target images to be the noiseless ground truth images and use the proxy to find a set of hyper-parameters for BM3D that perform well for low-light denoising. We trained and tested our method using the Smartphone Image Denoising Dataset. As seen in the above figure, our proxy-optimized BM3D not only produces images that are visually cleaner than those of the default BM3D, but also outperforms current state-of-the-art denoisers.

[Figure: left column, unmodified natural target and the ARM MALI-C71 picture obtained with it; right column, edge-enhanced, high-contrast unnatural target and the ARM MALI-C71 picture obtained with it]

Optimize for desired qualities by varying target image.

The set of hyper-parameters found through our optimization scheme depends on the target image. Although it is often desirable for the target image to be a natural image, we are not constrained to this choice, as the above figure shows. In the left column, we find a set of hyper-parameters using the unmodified natural target image, and as a result the picture taken by the ARM MALI-C71 ISP mimics a natural scene. In the right column, we use a target image that is modified to be edge-enhanced and high-contrast. The resulting hyper-parameters, when applied to the same ISP, yield a picture of the same scene with high contrast and enhanced edges, even though no hyper-parameter in the ISP explicitly tunes contrast or edge sharpness.
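
As an illustration of what a modified target might look like, the sketch below turns a natural scanline into an edge-enhanced, high-contrast one. The specific operations (a 3-tap unsharp mask followed by a linear contrast stretch) and the function names are assumptions chosen for this example; the text above does not state how the unnatural target was actually produced.

```python
def box_blur(row):
    """3-tap box blur with edge replication."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def make_unnatural_target(row, sharpen=1.0, contrast=1.5):
    """Hypothetical target modification: unsharp mask, then contrast stretch."""
    blurred = box_blur(row)
    # Unsharp masking: add back the difference between the signal and its blur.
    sharpened = [x + sharpen * (x - bl) for x, bl in zip(row, blurred)]
    # Linear contrast stretch around mid-grey, clipped to the valid range.
    stretched = [0.5 + contrast * (s - 0.5) for s in sharpened]
    return [min(1.0, max(0.0, v)) for v in stretched]

natural = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7]   # a single step edge
unnatural = make_unnatural_target(natural)
```

The modified scanline spans a wider intensity range and overshoots on both sides of the edge. Used as the optimization target, an image of this kind steers the search towards hyper-parameters that reproduce those qualities.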

[Figure: input image; target tonemapping output; tonemapping output with proxy-optimized parameters]

Local Image Tonemapping.

Another application of optimizing towards unnatural images is in local image tonemapping. Here, we see that our method accurately finds hyper-parameters for a tonemapping operator that reproduces the unnatural target tonemapping output.

[Figure: detection with optimized parameters (left); detection with expert-tuned parameters (right)]

Optimization for Object Detection.

We are not limited to optimizing hyper-parameters so that the ISP reproduces a desired target image. Because our proxy is fully differentiable, the hyper-parameters can be optimized with respect to any function or task, provided that it is also differentiable. We show this by optimizing a set of hyper-parameters for object detection: we concatenate a trained FRCNN architecture directly to our proxy architecture, so that the gradient of the detection loss can be used directly to optimize the hyper-parameters. The top row shows the full system with optimized parameters (left) outperforming the system with expert-tuned parameters (right) at car and pedestrian detection, even though the images themselves do not look appealing to a human viewer. The video below shows a time lapse of the optimization process, in which object detection improves as the optimization progresses.
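
The chaining of a task loss onto the proxy can be sketched as follows. This is a toy illustration under stated assumptions, not the detection pipeline itself: `proxy` is a hypothetical, already-fitted gain/offset model, `detector_prob` is a frozen logistic unit standing in for the trained FRCNN, and the hyper-parameters are assumed to be normalized to [-1, 1]. The point is only that the gradient of the task loss flows through the frozen detector and the proxy into the hyper-parameters.

```python
import math

def proxy(x, a, b):
    """Hypothetical fitted proxy of the ISP; a and b are normalized to [-1, 1]."""
    return (1.0 + 0.5 * a) * x + 0.2 * b

def detector_prob(y, k=10.0, thr=0.5):
    """Frozen toy 'detector': probability that a processed pixel is an object."""
    return 1.0 / (1.0 + math.exp(-k * (y - thr)))

def task_loss_and_grad(a, b, pixels, labels, k=10.0):
    """Mean cross-entropy of the detector and its gradient w.r.t. (a, b)."""
    loss, grad_a, grad_b = 0.0, 0.0, 0.0
    for x, t in zip(pixels, labels):
        y = proxy(x, a, b)
        p = detector_prob(y, k)
        loss += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        dloss_dy = k * (p - t)           # chain rule through the frozen detector
        grad_a += dloss_dy * 0.5 * x     # dy/da of the proxy
        grad_b += dloss_dy * 0.2         # dy/db of the proxy
    n = len(pixels)
    return loss / n, grad_a / n, grad_b / n

def clip(v, lo=-1.0, hi=1.0):
    return min(hi, max(lo, v))

# Toy data: two background pixels (label 0) and two object pixels (label 1).
pixels = [0.40, 0.45, 0.55, 0.60]
labels = [0, 0, 1, 1]

a, b = 0.0, 0.0                           # default, untuned hyper-parameters
initial_loss, _, _ = task_loss_and_grad(a, b, pixels, labels)
for _ in range(500):
    _, grad_a, grad_b = task_loss_and_grad(a, b, pixels, labels)
    # Projected gradient step keeps the hyper-parameters in their valid range.
    a, b = clip(a - 0.1 * grad_a), clip(b - 0.1 * grad_b)
final_loss, _, _ = task_loss_and_grad(a, b, pixels, labels)
```

The detector weights never change in this loop; only the hyper-parameters do. The processed output is therefore free to drift towards whatever the downstream task prefers, which need not look appealing to a human viewer.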