S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

UC Berkeley, Peking University, Tsinghua University
* Work done during an internship at UC Berkeley. † Project leader. ‡ Corresponding author.

S3Gaussian uses 3D Gaussians to model dynamic scenes for autonomous driving without additional supervision (e.g., 3D bounding boxes).



For self-supervised street scene decomposition, we propose a multi-resolution HexPlane-based encoder that encodes the 4D grid into feature planes, and a multi-head Gaussian decoder that decodes those features into deformed 4D Gaussians. We optimize the overall model in a self-supervised manner, without extra annotations, and achieve superior scene decomposition ability and rendering quality.
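
The encoder-decoder idea above can be illustrated with a small sketch. It is not the authors' implementation: the plane resolutions, the feature width, the rule for combining planes (element-wise product within a resolution, concatenation across resolutions), the linear heads, and the head names `delta_position`, `delta_rotation`, and `delta_scale` are all assumptions chosen for illustration. The weights are random rather than learned.

```python
import random

# The six coordinate pairs of a HexPlane over (x, y, z, t):
# three spatial planes (xy, xz, yz) and three spatio-temporal planes (xt, yt, zt).
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

# Hypothetical decoder heads and their output sizes (assumed, not from the source).
HEAD_SIZES = {"delta_position": 3, "delta_rotation": 4, "delta_scale": 3}


def make_plane(res, dim, rng):
    """Random res x res grid of dim-dimensional feature vectors (res >= 2)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(res)] for _ in range(res)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at (u, v), clamped to [0, 1]^2."""
    res = len(plane)
    fu = min(max(u, 0.0), 1.0) * (res - 1)
    fv = min(max(v, 0.0), 1.0) * (res - 1)
    i0 = min(int(fu), res - 2)
    j0 = min(int(fv), res - 2)
    a, b = fu - i0, fv - j0
    return [(1 - a) * (1 - b) * plane[i0][j0][k]
            + a * (1 - b) * plane[i0 + 1][j0][k]
            + (1 - a) * b * plane[i0][j0 + 1][k]
            + a * b * plane[i0 + 1][j0 + 1][k]
            for k in range(len(plane[0][0]))]


def build_hexplane(resolutions, dim, seed=0):
    """One set of six feature planes per resolution level."""
    rng = random.Random(seed)
    return [[make_plane(res, dim, rng) for _ in PLANE_AXES]
            for res in resolutions]


def encode(hexplane, point):
    """Map a normalized (x, y, z, t) point to a multi-resolution feature."""
    features = []
    for level in hexplane:
        dim = len(level[0][0][0])
        level_feat = [1.0] * dim
        for plane, (p, q) in zip(level, PLANE_AXES):
            sample = bilinear(plane, point[p], point[q])
            # Assumed fusion rule: multiply the six plane samples together.
            level_feat = [f * s for f, s in zip(level_feat, sample)]
        features.extend(level_feat)  # concatenate across resolutions
    return features


def build_decoder(in_dim, seed=1):
    """One random linear head per Gaussian attribute (placeholder weights)."""
    rng = random.Random(seed)
    return {name: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(out_dim)]
            for name, out_dim in HEAD_SIZES.items()}


def decode(heads, feature):
    """Apply every head to the shared feature and return per-attribute offsets."""
    return {name: [sum(w * f for w, f in zip(row, feature)) for row in weights]
            for name, weights in heads.items()}


hexplane = build_hexplane(resolutions=[4, 8], dim=2)
heads = build_decoder(in_dim=2 * 2)  # two levels times two features each
feature = encode(hexplane, (0.3, 0.6, 0.1, 0.5))
offsets = decode(heads, feature)
```

Here `offsets` holds one vector per head, which would be added to a Gaussian's canonical attributes to obtain its deformed state at time `t`. Sharing a single encoded feature across several small heads keeps the per-attribute outputs tied to the same spatio-temporal location.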



S3Gaussian excels at modeling dynamic scenes, which are more common in real-world autonomous driving scenarios.


S3Gaussian achieves performance comparable to StreetGaussians without using additional bounding boxes.


We show results from novel view synthesis on the left and dynamic scene reconstruction on the right. With the proposed spatial-temporal network for self-supervised scene decomposition, S3Gaussian produces the best rendering quality, with high fidelity and sharp details.

Compared to StreetGaussians, our method shows a stronger ability to reconstruct distant dynamic objects in a self-supervised manner and is more sensitive to changes in scene details.

Static and Dynamic Object Decomposition

Visualizing the HexPlane voxel grids demonstrates their ability to separate static and dynamic elements. The spatial-only grid refers to the spatial voxel parameters, while the temporal grid refers to the time features.
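
To make the distinction between spatial-only and temporal grids concrete, the sketch below groups the six planes by whether they involve the time axis and measures how much a point's temporal features change over time. This is only an illustration and not the visualization procedure used for the figure: the scalar toy planes, the `temporal_variation` score, and the sampling times are all assumptions.

```python
# Plane groups over (x, y, z, t); each pair indexes the point's coordinates.
SPATIAL_AXES = [(0, 1), (0, 2), (1, 2)]   # xy, xz, yz: time-independent
TEMPORAL_AXES = [(0, 3), (1, 3), (2, 3)]  # xt, yt, zt: depend on time


def bilinear(plane, u, v):
    """Bilinearly interpolate a scalar plane at (u, v), clamped to [0, 1]^2."""
    res = len(plane)
    fu = min(max(u, 0.0), 1.0) * (res - 1)
    fv = min(max(v, 0.0), 1.0) * (res - 1)
    i0 = min(int(fu), res - 2)
    j0 = min(int(fv), res - 2)
    a, b = fu - i0, fv - j0
    return ((1 - a) * (1 - b) * plane[i0][j0]
            + a * (1 - b) * plane[i0 + 1][j0]
            + (1 - a) * b * plane[i0][j0 + 1]
            + a * b * plane[i0 + 1][j0 + 1])


def temporal_variation(temporal_planes, xyz, times):
    """Range (max minus min) of the summed temporal features over `times`.

    A value near zero suggests the point behaves as static; a larger value
    suggests its features change with time (hypothetical score).
    """
    values = []
    for t in times:
        point = (xyz[0], xyz[1], xyz[2], t)
        values.append(sum(bilinear(plane, point[p], point[q])
                          for plane, (p, q) in zip(temporal_planes, TEMPORAL_AXES)))
    return max(values) - min(values)


# Toy data: in the xt plane, only the region with large x changes over time.
RES = 5
xt = [[(j / (RES - 1)) if i >= 3 else 0.0 for j in range(RES)] for i in range(RES)]
zeros = [[0.0] * RES for _ in range(RES)]
temporal_planes = [xt, zeros, zeros]

times = [0.0, 0.25, 0.5, 0.75, 1.0]
static_score = temporal_variation(temporal_planes, (0.25, 0.5, 0.5), times)
dynamic_score = temporal_variation(temporal_planes, (1.0, 0.5, 0.5), times)
```

In this toy setup the point at `x = 0.25` gets a score of zero, while the point at `x = 1.0` gets a positive score, mirroring the idea that temporal planes carry the time-varying part of the scene and spatial planes carry the rest.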



                title={S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving},
                author={Huang, Nan and Wei, Xiaobao and Zheng, Wenzhao and An, Pengju and Lu, Ming and Zhan, Wei and Tomizuka, Masayoshi and Keutzer, Kurt and Zhang, Shanghang},
                journal={arXiv preprint arXiv:2405.20323},