arXiv Preprint

DSD-GS: Dynamic-Static Decomposition of Gaussian Splatting for Efficient and High-Fidelity Dynamic Scene Reconstruction

Youngtae Han, Sung-hwan Han, Youngmin Yi
Department of Artificial Intelligence Engineering, Sogang University
TL;DR: We propose DSD-GS, a COLMAP-free dynamic reconstruction framework that achieves SOTA image quality with 10-minute training time and 700+ FPS rendering via a dynamic-static decomposition strategy.

Abstract

Dynamic scene reconstruction and novel view synthesis are fundamental to next-generation visual intelligence applications such as virtual reality, robotics, and digital twins. However, high-fidelity reconstruction of complex, time-varying scenes from arbitrary viewpoints remains a significant challenge. Existing dynamic 3DGS methods are computationally inefficient because they model every Gaussian as a dynamic component. Recent decomposition-based approaches address this issue, but they still suffer from degraded reconstruction quality and prolonged training times. To mitigate these limitations, we propose a novel dynamic reconstruction framework built upon an efficient static-dynamic decomposition strategy that uses a feed-forward Gaussian Splatting encoder and an optical flow model. By eliminating redundant computation on static regions, our method achieves state-of-the-art performance, outperforming existing baselines in rendering quality, training speed, rendering speed, and storage efficiency. Notably, on the Neural 3D dataset, our framework requires only 10 minutes of training and renders at over 700 FPS on a single NVIDIA RTX 5090 GPU at a resolution of 1352×1014. Furthermore, our decomposition strategy eliminates the need for COLMAP preprocessing and enables deterministic initialization, which improves both efficiency and reproducibility.

Method


Method Overview
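
The decomposition summarized in the abstract separates static from dynamic content with the help of an optical flow model. The snippet below is only a minimal sketch of that idea, not the authors' implementation. It assumes that a dense flow field is already available for each frame pair, that every Gaussian is tied to one source pixel (as in pixel-aligned feed-forward encoders), and that a fixed magnitude threshold decides the split. The names `dynamic_mask` and `split_gaussians` and the 1.0-pixel threshold are hypothetical.

```python
import math


def dynamic_mask(flows, threshold=1.0):
    """Mark a pixel as dynamic if its flow magnitude exceeds `threshold`
    (in pixels) in at least one frame pair.

    `flows` is a list of dense flow fields; each field is a list of rows,
    and each row is a list of (dx, dy) tuples. This layout is an assumption
    made for the sketch, not a format taken from the paper.
    """
    height, width = len(flows[0]), len(flows[0][0])
    mask = [[False] * width for _ in range(height)]
    for flow in flows:
        for y in range(height):
            for x in range(width):
                dx, dy = flow[y][x]
                if math.hypot(dx, dy) > threshold:
                    mask[y][x] = True
    return mask


def split_gaussians(pixel_coords, mask):
    """Split Gaussian indices into (static, dynamic) lists using the mask
    value at each Gaussian's source pixel, given as (x, y)."""
    static_ids, dynamic_ids = [], []
    for idx, (x, y) in enumerate(pixel_coords):
        if mask[y][x]:
            dynamic_ids.append(idx)
        else:
            static_ids.append(idx)
    return static_ids, dynamic_ids


# Toy example: a 2x2 image with one moving pixel at (x=1, y=1).
flows = [[[(0.0, 0.0), (0.0, 0.0)],
          [(0.0, 0.0), (3.0, 4.0)]]]
mask = dynamic_mask(flows)
static_ids, dynamic_ids = split_gaussians([(0, 0), (1, 1), (1, 0)], mask)
```

In this toy case only the Gaussian anchored at pixel (1, 1) is labeled dynamic, so only that Gaussian would need time-varying parameters, while the other two could be optimized as a static scene.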

Results

Table 1. Quantitative comparison on the Neural 3D dataset. In the Colmap column, SA denotes 'Sparse point cloud for All frames' and D0 denotes 'Dense point cloud for the 0th frame'. The original STG paper trains six separate models, one per 50-frame segment, so we report STG results for both this multi-model setting (50×6) and a single model trained on the full 300-frame sequence.

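| Method | Colmap | Preproc. Time ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Train Time ↓ | FPS ↑ | Storage ↓ | Frames |
|---|---|---|---|---|---|---|---|---|---|
| 4DGS | D0 | 6 mins | 28.72 | 0.9306 | 0.1528 | 33 mins | 98 | 40.3 | 300 |
| STG | SA | 25 mins | 31.75 | 0.9473 | 0.1423 | 2h 43mins | 683 | 127.5 | 50×6 |
| STG | SA | 25 mins | 31.46 | 0.9432 | 0.1474 | 29 mins | 532 | 54.0 | 300 |
| TaylorG | SA | 25 mins | 29.80 | 0.9558 | 0.1597 | 9 hours | 125 | 205.7 | 300 |
| Swift4D | D0 | 18 mins | 29.93 | 0.9383 | 0.1370 | 19 mins | 273 | 141.2 | 300 |
| DeGauss | D0 | 6 mins | 30.16 | 0.9357 | 0.1430 | 1h 27mins | 95 | 117.5 | 300 |
| OURS-35K | - | 4 sec | 32.35 | 0.9480 | 0.1295 | 10 mins | 766 | 23.1 | 300 |
| OURS-45K | - | 4 sec | 32.72 | 0.9502 | 0.1221 | 14 mins | 755 | 23.7 | 300 |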
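
As a worked reading of the Train Time column in Table 1, the following sketch converts each baseline's training time to minutes and divides it by the 10 minutes reported for OURS-35K. The times are copied from the table. The dictionary keys and the `speedup` helper are illustrative names, and the ratio ignores preprocessing time.

```python
# Training times from Table 1, converted to minutes.
train_minutes = {
    "4DGS": 33,
    "STG (50x6)": 2 * 60 + 43,
    "STG (300)": 29,
    "TaylorG": 9 * 60,
    "Swift4D": 19,
    "DeGauss": 1 * 60 + 27,
}
OURS_35K_MINUTES = 10


def speedup(baseline_minutes, ours_minutes=OURS_35K_MINUTES):
    """Return how many times longer the baseline trains than OURS-35K."""
    return baseline_minutes / ours_minutes


speedups = {name: speedup(minutes) for name, minutes in train_minutes.items()}

# Print baselines from the closest to the farthest training time.
for name, ratio in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x")
```

With these values the closest baseline in training time is Swift4D (19 minutes, a ratio of 1.9) and the farthest is TaylorG (9 hours, a ratio of 54).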

BibTeX

@misc{han2026dsdgsdynamicstaticdecompositiongaussian,
  title={DSD-GS: Dynamic-Static Decomposition of Gaussian Splatting for Efficient and High-Fidelity Dynamic Scene Reconstruction},
  author={Youngtae Han and Sung-hwan Han and Youngmin Yi},
  year={2026},
  eprint={2605.30863},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.30863},
}