Differentiable Point-Based Radiance Fields for Efficient View Synthesis
We propose a differentiable rendering algorithm for efficient novel view synthesis. By departing from volume-based representations in favor of a learned point representation, we improve on existing methods by more than an order of magnitude in memory and runtime, both in training and inference. The method begins with a uniformly sampled random point cloud and learns per-point positions and view-dependent appearance, using a differentiable splat-based renderer to train the model to reproduce a set of input training images with known poses. Our method is up to 300× faster than NeRF in both training and inference, with only a marginal sacrifice in quality, while using less than 10 MB of memory for a static scene. For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at a near-interactive rate, while maintaining high image quality and temporal coherence even without imposing any temporal-coherency regularizers.
Differentiable Point-Based Radiance Fields
Learning Point-Based Radiance Fields. The proposed method learns a point cloud augmented with per-point RGB spherical harmonic coefficients. Image synthesis is accomplished via a differentiable splat-based rasterizer that operates as a single forward pass, without the ray marching and hundreds of coordinate-based network evaluations required by implicit neural scene representation methods. Specifically, the point rasterizer computes an alpha value from the ray-point distance using a Gaussian radial basis function weight. The learnable radiance and alpha values are then combined via alpha blending to render the image. The entire rendering pipeline is differentiable, and we optimize the point positions and spherical harmonic coefficients end-to-end. We use a hybrid coarse-to-fine strategy (bottom) to train the entire model and successively refine the geometry represented by the learned point cloud.
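The per-pixel computation described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the Gaussian bandwidth `sigma`, the use of degree-1 spherical harmonics, and all function names are assumptions made for clarity.

```python
import numpy as np

def sh_color(coeffs, view_dir):
    """View-dependent RGB from per-point SH coefficients.

    coeffs: (3, 4) array of RGB coefficients for a degree-1 real SH basis
    view_dir: unit-length viewing direction (x, y, z).
    The degree-1 truncation is an assumption for illustration.
    """
    x, y, z = view_dir
    basis = np.array([0.28209479,          # l=0 (constant)
                      0.48860251 * y,      # l=1, m=-1
                      0.48860251 * z,      # l=1, m=0
                      0.48860251 * x])     # l=1, m=1
    return coeffs @ basis

def splat_alpha(ray_point_dist, sigma):
    """Per-point opacity from the ray-point distance via a Gaussian RBF weight."""
    return np.exp(-0.5 * (ray_point_dist / sigma) ** 2)

def composite(colors, alphas, depths):
    """Front-to-back alpha blending of the splatted points along one ray."""
    order = np.argsort(depths)           # sort points front to back
    pixel = np.zeros(3)
    transmittance = 1.0
    for i in order:
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return pixel
```

Because every step (the Gaussian weight, the SH evaluation, and the blending sum) is smooth in the point positions and coefficients, gradients can flow back to both geometry and appearance, which is what enables the end-to-end optimization.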
Novel View Synthesis
We represent two synthetic scenes with the proposed point-based radiance model. Trained in only a few minutes on multi-view images, the proposed method achieves rendering quality comparable to state-of-the-art volumetric rendering approaches such as Plenoxels, while training more than 3× faster, rendering 2× faster, and requiring 100× less memory.
Novel Video View Synthesis
We synthesize novel views for video timestamps of the STNeRF dataset (Frame 58 and Frame 70). Compared to STNeRF and the NeRF-t model proposed in the same work, the differentiable point-based radiance field approach produces substantially cleaner object boundaries, finer object detail, and fewer reconstruction artifacts, especially in the presence of significant object motion.
Another example from the same novel video view synthesis experiment. The differentiable point-based radiance field approach again produces substantially cleaner views in the presence of significant object motion.
Quantitative Evaluation on the STNeRF Dataset. Compared to STNeRF and the NeRF variants suggested in the same work, the training and inference speeds of the proposed method are two orders of magnitude higher.
Model Ablation Experiments. We evaluate the rendering quality of our method on the Blender dataset while gradually removing components from the rendering pipeline. Specifically, we ablate the per-point spherical harmonics model, the coarse-to-fine strategy, and the filtering function used during training. The experimental results validate that all components contribute to the rendering quality.