Thallo – Scheduling for High-Performance Large-scale Non-linear Least-Squares Solvers

Thallo is a domain-specific language (DSL) that generates tailored high-performance GPU solvers and schedules from a concise, high-level energy description, without the hassle of manually constructing and maintaining tedious and error-prone solvers. Thallo code is compact and exceeds state-of-the-art performance across diverse applications in graphics and vision.

Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand, a tedious and error-prone process, in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key to high performance is a problem-specific schedule that enables efficient use of the underlying hardware. We propose Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, and translates the combination into high-performance GPU solvers. Thallo can generate solvers from a large scheduling space and is thus able to handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution. By abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even outperform some published hand-written GPU solvers by 30% or more.
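For readers new to non-linear least squares: solvers of this kind minimize an energy of the form E(x) = sum_i r_i(x)^2 and are typically built around Gauss-Newton (or Levenberg-Marquardt) iterations, each of which linearizes the residuals and solves the normal equations (J^T J) dx = -J^T r. The sketch below is a minimal CPU illustration of plain Gauss-Newton under stated assumptions: a hypothetical two-parameter exponential curve fit stands in for a real energy, and the 2×2 normal equations are solved in closed form. It is not Thallo's generated GPU code, which handles millions of unknowns with iterative inner solvers.

```python
import math


def gauss_newton(residual, jacobian, x0, iters=20, tol=1e-14):
    """Minimize sum_i r_i(x)^2 for a 2-parameter problem with plain Gauss-Newton."""
    x = list(x0)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)  # list of rows [dr_i/dx0, dr_i/dx1]
        # Entries of J^T J (symmetric 2x2) and of the gradient J^T r.
        a = sum(row[0] * row[0] for row in J)
        b = sum(row[0] * row[1] for row in J)
        d = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        # Solve (J^T J) dx = -J^T r with the closed-form 2x2 inverse.
        det = a * d - b * b
        dx0 = (-d * g0 + b * g1) / det
        dx1 = (b * g0 - a * g1) / det
        x = [x[0] + dx0, x[1] + dx1]
        if abs(dx0) + abs(dx1) < tol:
            break
    return x


# Hypothetical example: fit y = a * exp(b * x) to noise-free samples of a=2, b=0.5.
xs = [i / 10 for i in range(11)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]


def residual(p):
    return [p[0] * math.exp(p[1] * x) - y for x, y in zip(xs, ys)]


def jacobian(p):
    return [[math.exp(p[1] * x), p[0] * x * math.exp(p[1] * x)] for x in xs]


a_fit, b_fit = gauss_newton(residual, jacobian, [1.8, 0.4])
print(a_fit, b_fit)  # expected to converge near a=2.0, b=0.5
```

Because the samples are exact and the starting point is close, Gauss-Newton converges quickly here; far from the solution a damped variant such as Levenberg-Marquardt is the usual safer choice.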

Paper

Thallo – Scheduling for High-Performance Large-scale Non-linear Least-Squares Solvers

Michael Mara, Felix Heide, Michael Zollhöfer, Matthias Nießner, Pat Hanrahan

SIGGRAPH 2021


Thallo: a DSL for large-scale NLLS optimization

Thallo is a scheduling language for a nonlinear least squares (NLLS) domain-specific language (DSL) that generates GPU-based solvers. We introduce a scheduling space that encompasses the code-reorganization choices found across the literature on fast GPU NLLS solvers in graphics and vision. Thallo's language design, which separates the energy specification from the schedule specification, enables rapid exploration of this scheduling space. We also introduce a compiler that takes a user's problem description and automatically generates a highly efficient solver by exploiting problem structure. Our heuristic autoscheduler generates a good schedule directly from the energy specification, so novice and intermediate users do not need to write their own schedule.
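To make the energy/schedule split concrete: the inner linear solves of Gauss-Newton-style methods repeatedly evaluate the product J^T J p, and that product can be computed in different ways without changing its value. The hypothetical Python sketch below (not Thallo syntax or generated code) uses an assumed toy residual r_i(x) = x[i+1] - x[i] and computes J^T J p two ways: by materializing J in memory, and matrix-free by evaluating each row's J p entry on the fly. A schedule in this sense chooses between such strategies, trading memory traffic against recomputation, while the energy definition stays the same.

```python
def make_jacobian(n):
    """Dense Jacobian of the toy residual r_i(x) = x[i+1] - x[i], i = 0..n-2."""
    J = [[0] * n for _ in range(n - 1)]
    for i in range(n - 1):
        J[i][i] = -1
        J[i][i + 1] = 1
    return J


def jtjp_materialized(p):
    """Strategy A: build and store J, then compute J^T (J p) with dense products."""
    n = len(p)
    J = make_jacobian(n)
    Jp = [sum(J[i][j] * p[j] for j in range(n)) for i in range(n - 1)]
    return [sum(J[i][j] * Jp[i] for i in range(n - 1)) for j in range(n)]


def jtjp_matrix_free(p):
    """Strategy B: never store J; compute each residual's (J p)_i on the fly
    and scatter its J^T contribution directly into the output."""
    n = len(p)
    out = [0] * n
    for i in range(n - 1):
        jp_i = p[i + 1] - p[i]  # row i of J times p
        out[i] -= jp_i          # row i has -1 in column i
        out[i + 1] += jp_i      # row i has +1 in column i+1
    return out


p = [1, 4, 9, 16]
print(jtjp_materialized(p))  # [-3, -2, -2, 7]
print(jtjp_matrix_free(p))   # [-3, -2, -2, 7]
```

Both strategies return the same vector; on a GPU they would differ in memory footprint, bandwidth, and arithmetic, which is exactly the kind of trade-off a problem-specific schedule is meant to settle.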

Illustrative Example: 3D Blendshape Fitting

Thallo avoids the hassle of manually constructing and maintaining tedious and error-prone solvers. Our heuristic rule-based autoscheduler also allows non-domain experts to achieve high performance without reasoning about the scheduling transforms.

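-- Problem dimensions: N mesh vertices, M blendshapes
local N,M = Dims("N", "M")
Inputs {
	BlendshapeWeights = Unknown(opt_float, {M}, 0),  -- the unknowns being solved for
	AverageMesh       = Array(opt_float3, {N}, 1),
	BlendshapeBasis   = Array(opt_float3, {N,M}, 2),
	Target            = Array(opt_float3, {N}, 4),
	w_regSqrt         = Param(float, 5),
	cameraParams      = Param(float9, 6),
	cameraXform       = Param(float6, 7)
}

local m,n = M(),N()
-- Deformed mesh: average mesh plus the weighted sum of blendshape offsets
local Mesh = AverageMesh(n) + Sum({m}, BlendshapeBasis(n,m)*BlendshapeWeights(m))
-- Apply the camera pose to the mesh, then project it with the camera parameters
local Pos2D = project_point(rigid_trans(PoseToMatrix(cameraXform), Mesh), cameraParams)
local e_fit = Target(n) - Pos2D
-- Targets whose first component is below the -999999.9 sentinel are treated as missing
local valid = greatereq(Target(n,0), -999999.9)
r = Residuals {
	reg = w_regSqrt*BlendshapeWeights(m),  -- regularizer on the weights
	fit = Select(valid,e_fit,0)             -- fit term, zeroed for missing targets
}
-- Schedule: materialize the Jp term of the fit residual instead of recomputing it
r.fit.Jp:set_materialize(true)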

The code needed to perform 3D blendshape fitting is expressed per element and requires only a straightforward translation from the higher-level linear algebra formulation.
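The correspondence between the per-element form and the linear algebra form can be checked on a toy case. The hypothetical Python sketch below mirrors the `reg` and `fit` residuals of the example, with one simplifying assumption: the camera transform and projection (`rigid_trans`, `project_point`) are omitted, so the deformed mesh is compared against 3D targets directly. It computes the fit residual once per vertex, as the Thallo code does, and once as the stacked vector mask ⊙ (t − a − B w) with B of shape 3N×M, and the two agree.

```python
INVALID_THRESHOLD = -999999.9  # same sentinel test as the Thallo example


def residuals(weights, avg_mesh, basis, target, w_reg_sqrt):
    """Per-element residuals of a simplified blendshape fit (no camera).
    avg_mesh[n], target[n]: 3-vectors; basis[n][m]: offset of blendshape m at vertex n."""
    fit = []
    for n, (a, t) in enumerate(zip(avg_mesh, target)):
        mesh = [a[k] + sum(basis[n][m][k] * w for m, w in enumerate(weights))
                for k in range(3)]
        valid = t[0] >= INVALID_THRESHOLD
        fit.append([t[k] - mesh[k] if valid else 0.0 for k in range(3)])
    reg = [w_reg_sqrt * w for w in weights]
    return fit, reg


def energy(fit, reg):
    """Sum of squared residuals over both terms."""
    return sum(c * c for r in fit for c in r) + sum(c * c for c in reg)


def fit_matrix_form(weights, avg_mesh, basis, target):
    """Same fit residual as one stacked vector: mask * (t - a - B w)."""
    N, M = len(avg_mesh), len(weights)
    # Row 3n + k of B holds coordinate k of every blendshape offset at vertex n.
    B = [[basis[n][m][k] for m in range(M)] for n in range(N) for k in range(3)]
    a = [avg_mesh[n][k] for n in range(N) for k in range(3)]
    t = [target[n][k] for n in range(N) for k in range(3)]
    mask = [1.0 if target[n][0] >= INVALID_THRESHOLD else 0.0
            for n in range(N) for _ in range(3)]
    Bw = [sum(row[m] * weights[m] for m in range(M)) for row in B]
    return [mask[i] * (t[i] - a[i] - Bw[i]) for i in range(3 * N)]


# Toy data: 2 vertices, 2 blendshapes; the second target is marked missing.
avg = [[0, 0, 0], [1, 1, 1]]
basis = [[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [1, 0, 0]]]
weights = [0.5, 2.0]
target = [[1, 1, 0], [-1e7, 0, 0]]

fit, reg = residuals(weights, avg, basis, target, w_reg_sqrt=0.1)
print(fit)  # [[0.5, -1.0, 0.0], [0.0, 0.0, 0.0]]
print(fit_matrix_form(weights, avg, basis, target))
```

The per-element version is what maps naturally onto a GPU kernel (one thread per vertex), while the matrix form is how the energy is usually written on paper.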
