GenSDF: TwoStage Learning of Generalizable
Signed Distance Functions
NeurIPS 2022 (Featured)
Neural signed distance functions (SDFs) implicitly model the distance from a queried location to the nearest point on a shape’s surface — negative inside the shape, positive outside, zero at the surface. Existing methods are capable of approximating diverse synthetic geometry and varying levels of detail. However, most successful methods are fully supervised–they require access to ground truth signed distance values to fit an object. As raw point clouds lack these values, a fullysupervised approach cannot learn from inthewild data.
In this work, we introduce a twostage semisupervised metalearning approach that transfers shape priors from labeled to unlabeled data to reconstruct unseen object categories. The first stage uses an episodic training scheme to simulate training on unlabeled data and metalearns initial shape priors. The second stage then introduces unlabeled data with disjoint classes in a semisupervised scheme to diversify these priors and achieve generalization. We assess our method on both synthetic data and real collected point clouds. Experimental results and analysis validate that our approach outperforms existing neural SDF methods and is capable of robust zeroshot inference on 100+ unseen classes.
Convolutional OccNets
GenSDF (ours)
(In Browser: Click and Drag to Rotate, Scroll to Zoom)
GenSDF: TwoStage Learning of Generalizable
Signed Distance Functions
Gene Chou, Ilya Chugunov, Felix Heide
NeurIPS 2022 (Featured)
Two stage metalearning semisupervised approach
We want to train on labeled and unlabeled data simultaneously, so we mimic this goal in a metalearning approach as we learn shape priors from $X$. Every few epochs, we split $X$ into two subsets $X_L$ and $X_U$ with disjoint categories. We train $X_L$ with a supervised loss, similar to typical supervised approaches on labeled data. We take away groundtruth signed distance values from $X_U$ making it “pseudounlabeled”, and train it with our selfsupervised loss. Compared to directly training on labeled and unlabeled data, this metalearning process provides flexibility as we can tune the split frequency and ratio of labeled and “pseudounlabeled” data. The model repeatedly sees the same point clouds with and without labels, so shape features generated during labeled splits are shared with “pseudounlabeled” splits.
In our second stage, we train our conditional SDF from the metalearning stage on both labeled and unlabeled data in a semisupervised fashion. As a result of the learned priors, training remains robust even with large amounts of unlabeled data.
Selfsupervised loss formulation
Crucial to the success of our method is our selfsupervised loss for training unlabeled data. For each query point $x \in \mathbb{R}^{3}$, we use its closest point $t \in P$ (where $P$ is the full point cloud) to approximate its projection to the surface.
Our insight is that we want to predict $\hat{t}$ to approximate $t$ on the point cloud by using the predicted signed distance value and our training objective is to minimize the distance between the two. Different from existing work, our formulation has an explicit penalty for incorrect sign predictions. We denote our approach selfsupervised because the predictions of the signs are used as labels.
We have
\begin{equation}
\mathcal{L}_{\mathrm{self}} = \frac{1}{K} \sum_{k \in K} \ \hat{t}_k – t_k \_2^2,
\label{selfeq}
\end{equation}
\begin{equation}
\hat{t} =\begin{cases}
x – \frac{xt}{\xt\} \ \Phi(x) & \text { if } \Phi(x) \ge 0, \\[2em]
x + \frac{xt}{\xt\} \ \Phi(x) & \text { if } \Phi(x) < 0 .
\end{cases}
\label{t_eq}
\end{equation}
$\mathcal{L}_{\mathrm{self}}$ is our selfsupervised loss and $\Phi(\cdot) $ is our trained neural network.
Evaluation of real point cloudsWe test our model on the YCB dataset, which is a realworld point cloud dataset acquired from multiview RGBD captures. The fused multiview point clouds in this dataset resemble input measurements for a robotic partpicking or manipulation task. We demonstrate robust mesh reconstructions of the measured data, e.g., recovering the “handle” of a pitcher, which may serve as input to complex robotic grasping tasks. YCB is a different domain from the synthetic training dataset, but our proposed method is able to approximate the 3D shape with detail, illustrating its ability of handling inthewild and outofdistribution data. (In Browser: Click and Drag to Rotate, Scroll to Zoom) 
Convolutional OccNets

GenSDF (ours)

Ground truth

Reconstructing the Acronym dataset
(Check out our main paper and supplement for more visualizations)
Reconstructing the YCB dataset