UGSDF: Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering

University of Heidelberg · RRC, IIIT-Hyderabad · VLM Run · MBZUAI · IIT Kharagpur
Accepted at ICCV 2025


teaser

Motivation: Existing dynamic reconstruction methods rely heavily on LiDAR, 3D motion annotations, or object templates, which limits scalability and generalization. Can we achieve high-quality dynamic 4D reconstruction and rendering using only 2D priors?

TL;DR: We introduce UGSDF, a novel approach that fuses Gaussian Splatting and SDFs to achieve temporally consistent, high-fidelity 4D reconstructions of dynamic urban scenes from sparse RGB videos and only 2D priors.

Abstract

Dynamic scene rendering and reconstruction play a crucial role in computer vision and augmented reality. Recent methods based on 3D Gaussian Splatting (3DGS) have enabled accurate modeling of dynamic urban scenes, but they require both camera and LiDAR data, ground-truth 3D segmentations, and motion data in the form of tracklets or pre-defined object templates such as SMPL. In this work, we explore whether a combination of 2D object-agnostic priors, in the form of depth and point tracking, coupled with a signed distance function (SDF) representation for dynamic objects can relax some of these requirements. We present a novel approach that integrates SDFs with 3DGS to create a more robust object representation by harnessing the strengths of both methods. Our unified optimization framework enhances the geometric accuracy of 3D Gaussian Splatting and improves deformation modeling within the SDF, resulting in a more adaptable and precise representation. We demonstrate that our method achieves near state-of-the-art rendering metrics on urban scenes even without LiDAR data. Furthermore, when incorporating LiDAR, our approach surpasses existing methods in reconstructing and generating novel views across diverse object categories, without ground-truth 3D motion annotations. Additionally, our method enables various scene-editing tasks, including scene decomposition and scene composition.



Method

UGSDF integrates 3D Gaussian Splatting (3DGS) and Signed Distance Functions (SDFs) to accurately model and render dynamic urban scenes. The method takes only 2D priors (depth maps from a monocular depth network and point tracks from a 2D tracker) and uses them to derive 3D geometry and motion cues, without requiring LiDAR or ground-truth 3D motion annotations. It builds a canonical 3D model of each dynamic object from this depth and tracking data, then jointly learns SDF and 3D Gaussian representations. The Gaussians provide high-fidelity rendering and guide surface sampling for the SDF, while the SDF refines Gaussian placement and enforces geometric smoothness. This bi-directional guidance yields accurate reconstructions and realistic novel-view synthesis of vehicles and pedestrians in real-world urban environments.
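The bi-directional guidance described above can be sketched in a few lines. In this toy NumPy sketch, an analytic unit-sphere SDF stands in for the learned SDF network and bare 3D centers stand in for the Gaussians; all function names, thresholds, and scales are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Toy sketch of bi-directional guidance: the SDF pulls Gaussian centers onto
# the surface, and the Gaussians supply near-surface sample points for SDF
# supervision. A unit sphere stands in for the learned SDF.

def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p, axis=-1) - radius

def numerical_gradient(sdf, points, eps=1e-4):
    """Central-difference SDF gradient, normalized to unit length."""
    grads = np.zeros_like(points)
    for i in range(points.shape[1]):
        offset = np.zeros(points.shape[1])
        offset[i] = eps
        grads[:, i] = (sdf(points + offset) - sdf(points - offset)) / (2 * eps)
    norms = np.linalg.norm(grads, axis=-1, keepdims=True)
    return grads / np.clip(norms, 1e-8, None)

def project_to_surface(points, sdf):
    """SDF -> Gaussians: one Newton-style step along the SDF gradient
    moves each Gaussian center onto the zero level set."""
    return points - sdf(points)[:, None] * numerical_gradient(sdf, points)

def sample_near_surface(centers, sigma=0.05, rng=None):
    """Gaussians -> SDF: jittered Gaussian centers concentrate SDF
    training samples near the current surface estimate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return centers + rng.normal(scale=sigma, size=centers.shape)

rng = np.random.default_rng(0)
centers = rng.normal(size=(256, 3)) * 1.5      # noisy Gaussian centers
refined = project_to_surface(centers, sphere_sdf)
samples = sample_near_surface(refined)
```

After projection the centers lie on the sphere's zero level set, and the jittered samples cluster around it, mirroring how each representation regularizes the other during joint optimization.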

overview

UGSDF takes images, depth maps, and 2D tracking data as input to jointly learn 3D Gaussian and SDF representations, producing realistic renderings of dynamic urban scenes

SDF network

An MLP-based SDF network predicts signed distance values for each point, deforming observations into a canonical space for consistent shape modeling
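A minimal, untrained NumPy forward pass illustrating this design; the layer widths, the scalar time input, and the additive deformation are assumptions for the sketch, not the paper's architecture. A deformation MLP maps a point observed at time t into canonical space, where a second MLP predicts its signed distance.

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp(x, weights):
    """Plain fully connected network with ReLU hidden activations."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def init(sizes, rng):
    """Random (untrained) weights for a stack of linear layers."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Deformation MLP: (x, y, z, t) -> offset into canonical space.
deform_w = init([4, 64, 64, 3], rng)
# Canonical SDF MLP: (x, y, z) -> signed distance.
sdf_w = init([3, 64, 64, 1], rng)

def sdf_at_time(points, t):
    pt = np.concatenate([points, np.full((len(points), 1), t)], axis=1)
    canonical = points + mlp(pt, deform_w)   # deform observation -> canonical
    return mlp(canonical, sdf_w)[:, 0]       # query canonical-space SDF

points = rng.normal(size=(8, 3))
d = sdf_at_time(points, t=0.5)               # one signed distance per point
```

Because every time step is deformed into the same canonical space before the SDF query, the object's shape is modeled once and shared across the sequence.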

SDF guided densification

The SDF network guides where to add or remove Gaussian primitives, improving geometric accuracy and surface fidelity in dynamic regions
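A hedged sketch of what such guidance could look like; the thresholds and jitter scale are illustrative, not the paper's values. Gaussians whose centers drift far from the zero level set are pruned, while near-surface Gaussians are cloned to densify the reconstruction.

```python
import numpy as np

# Illustrative SDF-guided densification: the SDF value at each Gaussian
# center decides whether that Gaussian is kept, pruned, or cloned.

def densify(centers, sdf_values, prune_tau=0.2, clone_tau=0.02, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    dist = np.abs(sdf_values)
    kept = centers[dist < prune_tau]      # drop Gaussians far off-surface
    near = centers[dist < clone_tau]      # clone Gaussians near the surface
    clones = near + rng.normal(scale=0.01, size=near.shape)
    return np.concatenate([kept, clones], axis=0)

rng = np.random.default_rng(1)
centers = rng.uniform(-2.0, 2.0, size=(500, 3))
sdf_vals = np.linalg.norm(centers, axis=1) - 1.0   # toy unit-sphere SDF
new_centers = densify(centers, sdf_vals)
```

The surviving set is concentrated in a thin shell around the surface, which is where extra primitives most improve geometric accuracy and surface fidelity.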



Qualitative Results

qualitative results



Poster

[Download PDF]

BibTeX


@InProceedings{Tourani_2025_ICCV,
  author    = {Tourani, Siddharth and Reddy, Jayaram and Kumbar, Akash and Tourani, Satyajit and Goyal, Nishant and Krishna, Madhava and Reddy, N Dinesh and Khan, Muhammad Haris},
  title     = {Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {29051-29063}
}

Acknowledgements

Page template borrowed from Nerfies.