We reformulate dynamic Gaussian deformation as Hamiltonian dynamics, enabling neural networks to learn physical conservation laws directly from data.
In short, NeHaD achieves physically plausible motion with state-of-the-art rendering quality.
Representing dynamic scenes with realistic motion remains challenging, as existing methods often produce physically implausible deformations. We introduce NeHaD, a neural deformation field for dynamic Gaussian splatting governed by Hamiltonian mechanics. Our key innovation replaces MLP-based deformation with Hamiltonian neural networks that model Gaussians evolving along energy-conserving trajectories in phase space, ensuring natural dynamics. We further introduce a Boltzmann equilibrium decomposition for energy-aware static/dynamic separation, and employ symplectic integration with rigidity constraints to handle real-world dissipation. Additionally, we extend NeHaD to adaptive streaming through scale-aware mipmapping. Extensive experiments demonstrate that NeHaD achieves physically plausible dynamic rendering with favorable quality-efficiency trade-offs, representing the first application of Hamiltonian mechanics to Gaussian deformation.
To address the lack of physics-informed inductive biases in standard MLP decoders, we propose a Hamiltonian Neural Network (HNN) decoder. Explicitly defining position-momentum coordinates $(\boldsymbol{q}, \boldsymbol{p})$ for high-dimensional Gaussians is intractable; consequently, we substitute them with implicit latent representations $\boldsymbol{h}$ extracted from hex-planes as the input to the HNN. The core of simulating Hamiltonian dynamics then lies in learning the vector field $\boldsymbol{v}$ from these implicit features. However, directly learning this high-dimensional field is challenging due to potential mode collapse and the difficulty of enforcing conservation constraints. We therefore decompose the vector field and instead learn two scalar potentials, $F_1$ and $F_2$, which generate the dynamics via automatic differentiation. Specifically, the conservative field $\boldsymbol{v}_c$ (preserving energy) and solenoidal field $\boldsymbol{v}_s$ (preserving volume) are formulated as:
Here, $\boldsymbol{M}$ is the symplectic permutation matrix. The total vector field $\boldsymbol{v} = \boldsymbol{v}_c + \boldsymbol{v}_s$ is finally projected by lightweight adapters $\mathcal{A}$ to obtain deformations for Gaussian attributes ($\Delta \boldsymbol{\mu}, \Delta \boldsymbol{s}, \Delta \boldsymbol{r}$), e.g., $\Delta \boldsymbol{\mu} = \mathcal{A}_{\boldsymbol{\mu}}(\boldsymbol{v})$.
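The conservation properties invoked here can be checked numerically. The following minimal sketch is illustrative only (a toy quadratic potential on a 2-D phase space; `F`, `grad_F`, and the RK2 integrator are our assumptions, not the paper's architecture): a field of the form $\boldsymbol{M}\nabla F$ is divergence-free and conserves $F$ along trajectories.

```python
import numpy as np

# Toy illustration (not the paper's exact equations): a symplectic
# permutation matrix M pairs coordinates so that the field v = M @ grad(F)
# conserves the scalar F along trajectories and has zero divergence
# (volume preservation), since trace(M * Hessian) vanishes for M = -M^T.

def sym_matrix(n):
    """Symplectic permutation matrix M = [[0, I], [-I, 0]] for 2n dims."""
    I = np.eye(n)
    Z = np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def F(h):
    """Toy scalar potential F(h) = 0.5 * ||h||^2."""
    return 0.5 * np.dot(h, h)

def grad_F(h):
    """Analytic gradient of the toy potential."""
    return h

M = sym_matrix(1)          # 2-D phase space, like a single (q, p) pair
h = np.array([1.0, 0.0])   # initial state with F(h) = 0.5
dt = 1e-3
for _ in range(1000):      # explicit midpoint (RK2) integration
    k1 = M @ grad_F(h)
    k2 = M @ grad_F(h + 0.5 * dt * k1)
    h = h + dt * k2

drift = abs(F(h) - 0.5)    # energy drift stays near machine precision
```

The same check explains why the vector part of the field must come from a symplectic gradient: an ordinary gradient flow $\dot{\boldsymbol{h}} = \nabla F$ would change $F$ monotonically instead of conserving it.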
Indiscriminately deforming all primitives leads to computational inefficiency and potential artifacts. We introduce a statistical mechanics approach to adaptively mask Gaussians based on their deviation from learned equilibrium states. We employ distinct decomposition strategies tailored to the visual characteristics of specific attributes:
The soft activation masks are formulated using the Boltzmann distribution. Taking position dynamics as an example:
Here, $\beta$ represents the inverse temperature controlling the sharpness of the distribution, and $\gamma$ ensures a minimum responsiveness floor. The symbol $\odot$ denotes the Hadamard (element-wise) product. Consequently, primitives far from equilibrium (high energy) undergo significant deformation, while stable ones remain static.
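A hedged sketch of this gating (the function form and the definition of per-primitive energy are our assumptions, not the paper's exact formula): the Boltzmann weight $e^{-\beta E}$ is treated as the probability of staying static, $\gamma$ supplies the responsiveness floor, and the resulting mask gates deformations element-wise.

```python
import numpy as np

# Illustrative Boltzmann soft mask (assumed form, not the paper's exact
# equation). `energy` is each primitive's deviation from its learned
# equilibrium state; beta sharpens the distribution, gamma is the floor.

def boltzmann_mask(energy, beta=5.0, gamma=0.05):
    # exp(-beta * E) is the Boltzmann weight of remaining static, so
    # 1 - exp(-beta * E) activates high-energy (dynamic) primitives.
    return gamma + (1.0 - gamma) * (1.0 - np.exp(-beta * energy))

energy = np.array([0.0, 0.01, 0.5, 5.0])   # per-Gaussian deviation energies
mask = boltzmann_mask(energy)

delta_mu = np.ones((4, 3))                 # raw position deformations
masked = mask[:, None] * delta_mu          # Hadamard-style gating
```

Equilibrium primitives (zero energy) keep only the floor response $\gamma$, while far-from-equilibrium primitives deform almost fully, matching the behavior described above.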
Real-world dynamics are inherently dissipative due to friction and other non-conservative forces. This deviation from strict energy conservation can mislead the HNN into generating erroneous deformation fields. To mitigate this and strictly enforce physical plausibility, we impose explicit constraints tailored to the attribute dynamics:
Here, $\phi_{max}$ defines the maximum allowable rotation per timestamp. The symbol $\otimes$ represents the standard quaternion multiplication, and $\mathcal{N}(\cdot)$ denotes the normalization operation to ensure valid unit quaternions.
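The rotation constraint can be sketched as follows; this is a plausible reading under stated assumptions (axis-angle clamping of the predicted delta, with `quat_mul` as the Hamilton product $\otimes$ and renormalization as $\mathcal{N}(\cdot)$), not necessarily the paper's exact parameterization.

```python
import numpy as np

# Illustrative rigidity constraint: clamp the per-timestamp rotation
# delta to at most phi_max radians, then compose with the base rotation
# and renormalize to a valid unit quaternion. Conventions assumed:
# quaternions are (w, x, y, z) with w >= 0 for the delta.

def quat_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def clamp_rotation(delta_q, phi_max):
    """Clamp a unit quaternion's rotation angle to at most phi_max."""
    delta_q = delta_q / np.linalg.norm(delta_q)
    w = np.clip(delta_q[0], -1.0, 1.0)
    angle = 2.0 * np.arccos(abs(w))          # rotation angle in [0, pi]
    if angle <= phi_max:
        return delta_q
    axis = delta_q[1:] / (np.linalg.norm(delta_q[1:]) + 1e-12)
    half = phi_max / 2.0
    return np.concatenate([[np.cos(half)], np.sin(half) * axis])

phi_max = np.deg2rad(10.0)
# Predicted delta: 90 degrees about the x-axis, far beyond phi_max.
delta_q = np.array([np.cos(np.pi/4), np.sin(np.pi/4), 0.0, 0.0])
r = np.array([1.0, 0.0, 0.0, 0.0])           # base rotation (identity)
r_new = quat_mul(clamp_rotation(delta_q, phi_max), r)
r_new = r_new / np.linalg.norm(r_new)        # N(.): renormalize
```

With the identity base rotation, the composed result carries exactly the clamped angle $\phi_{max}$, which is the intended effect of the constraint.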
For efficient transmission and rendering across varying network bandwidths and device capabilities, we extend NeHaD with two strategic enhancements focusing on anti-aliasing and Level-of-Detail (LOD) management:
Here, $\rho$ represents the anisotropy ratio derived from scale extrema, $\boldsymbol{1}$ denotes the all-ones vector used to broadcast the scalar mean $\bar{L}$, and $N$ indicates the total number of LOD levels. This layered architecture allows the system to fine-tune transmission rates dynamically while preserving high-fidelity rendering details.
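One possible reading of this LOD assignment, sketched under explicit assumptions (the quantile-based quantization, the per-Gaussian mean log-scale $L$, and the function `assign_lod` are all our illustrative choices, not the paper's scheme): compute the anisotropy ratio $\rho$ from scale extrema, center each primitive's mean log-scale on the scene mean $\bar{L}$ via the all-ones broadcast, and bucket the deviations into $N$ discrete levels.

```python
import numpy as np

# Hedged sketch of scale-aware LOD bucketing (assumed scheme). Each
# Gaussian gets an anisotropy ratio rho from its scale extrema and an
# integer LOD level by quantizing the deviation of its mean log-scale
# L from the scene-wide scalar mean L_bar into N levels.

def assign_lod(scales, N=4):
    log_s = np.log(scales)                         # (G, 3) log-scales
    rho = scales.max(axis=1) / scales.min(axis=1)  # anisotropy ratio >= 1
    L = log_s.mean(axis=1)                         # mean log-scale per Gaussian
    L_bar = L.mean()                               # scene-wide scalar mean
    dev = L - L_bar * np.ones_like(L)              # broadcast via all-ones vector
    # Rank-quantize deviations into N levels so each bandwidth tier can
    # drop the finest-detail buckets first.
    edges = np.quantile(dev, np.linspace(0, 1, N + 1)[1:-1])
    level = np.digitize(dev, edges)                # integer level in [0, N-1]
    return rho, level

rng = np.random.default_rng(0)
scales = np.exp(rng.normal(size=(100, 3)))         # synthetic Gaussian scales
rho, level = assign_lod(scales, N=4)
```

Quantile edges keep the $N$ tiers balanced in population, which is one way such a layered representation could trade detail for transmission rate.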
We compare NeHaD with state-of-the-art models on three datasets: the synthetic D-NeRF, the monocular real-world HyperNeRF, and the multi-view real-world DyNeRF. Our method consistently outperforms baseline models in perceptual quality, delivering realistic renderings with coherent dynamic motion. The full-resolution demo video is available via this Google Drive link.
Below we visualize the qualitative comparisons.
@inproceedings{qin-nehad,
title = {Neural Hamiltonian Deformation Fields for Dynamic Scene Rendering},
author = {Qin, Hai-Long and Wang, Sixian and Lu, Guo and Dai, Jincheng},
booktitle = {SIGGRAPH Asia 2025 Conference Papers},
pages = {1--11},
year = {2025}
}