We reformulate dynamic Gaussian deformation as Hamiltonian dynamics, enabling neural networks to learn physical conservation laws directly from data.
In short, NeHaD achieves physically plausible motion with state-of-the-art rendering quality.
Representing dynamic scenes with realistic motion remains challenging, as existing methods often produce physically implausible deformations. We introduce NeHaD, a neural deformation field for dynamic Gaussian splatting governed by Hamiltonian mechanics. Our key innovation replaces MLP-based deformation with Hamiltonian neural networks that model Gaussians evolving along energy-conserving trajectories in phase space, ensuring natural dynamics. We further introduce a Boltzmann equilibrium decomposition for energy-aware static/dynamic separation, and employ symplectic integration with rigidity constraints to handle real-world dissipation. Additionally, we extend NeHaD to adaptive streaming through scale-aware mipmapping. Extensive experiments demonstrate that NeHaD achieves physically plausible dynamic rendering with favorable quality-efficiency trade-offs, representing the first application of Hamiltonian mechanics to Gaussian deformation.
To address the lack of physics-informed inductive biases in standard MLP decoders, we propose a Hamiltonian Neural Network (HNN) decoder. Explicitly defining position-momentum coordinates $(\boldsymbol{q}, \boldsymbol{p})$ for high-dimensional Gaussians is intractable; consequently, we substitute them with implicit latent representations $\boldsymbol{h}$ extracted from hex-planes as the input to the HNN. The core of simulating Hamiltonian dynamics then lies in learning the vector field $\boldsymbol{v}$ from these implicit features. However, directly learning this high-dimensional field is challenging due to potential mode collapse and the difficulty of enforcing conservation constraints. We therefore decompose the vector field and instead learn two scalar potentials, $F_1$ and $F_2$, which generate the dynamics via automatic differentiation. Specifically, the conservative field $\boldsymbol{v}_c$ (preserving energy) and solenoidal field $\boldsymbol{v}_s$ (preserving volume) are formulated as:
Here, $\boldsymbol{M}$ is the symplectic permutation matrix. The total vector field $\boldsymbol{v} = \boldsymbol{v}_c + \boldsymbol{v}_s$ is finally projected by lightweight adapters $\mathcal{A}$ to obtain deformations for Gaussian attributes ($\Delta \boldsymbol{\mu}, \Delta \boldsymbol{s}, \Delta \boldsymbol{r}$), e.g., $\Delta \boldsymbol{\mu} = \mathcal{A}_{\boldsymbol{\mu}}(\boldsymbol{v})$.
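The conservation properties invoked here can be checked numerically. The following minimal sketch is illustrative only (a toy quadratic potential on a 2-D phase space; `F`, `grad_F`, and the RK2 integrator are our assumptions, not the paper's architecture): a field of the form $\boldsymbol{M}\nabla F$ is divergence-free and conserves $F$ along trajectories.

```python
import numpy as np

# Toy illustration (not the paper's exact equations): a symplectic
# permutation matrix M pairs coordinates so that the field v = M @ grad(F)
# conserves the scalar F along trajectories and has zero divergence
# (volume preservation), since trace(M * Hessian) vanishes for M = -M^T.

def sym_matrix(n):
    """Symplectic permutation matrix M = [[0, I], [-I, 0]] for 2n dims."""
    I = np.eye(n)
    Z = np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def F(h):
    """Toy scalar potential F(h) = 0.5 * ||h||^2."""
    return 0.5 * np.dot(h, h)

def grad_F(h):
    """Analytic gradient of the toy potential."""
    return h

M = sym_matrix(1)          # 2-D phase space, like a single (q, p) pair
h = np.array([1.0, 0.0])   # initial state with F(h) = 0.5
dt = 1e-3
for _ in range(1000):      # explicit midpoint (RK2) integration
    k1 = M @ grad_F(h)
    k2 = M @ grad_F(h + 0.5 * dt * k1)
    h = h + dt * k2

drift = abs(F(h) - 0.5)    # energy drift stays near machine precision
```

The same check explains why the vector part of the field must come from a symplectic gradient: an ordinary gradient flow $\dot{\boldsymbol{h}} = \nabla F$ would change $F$ monotonically instead of conserving it.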
Indiscriminately deforming all primitives leads to computational inefficiency and potential artifacts. We introduce a statistical mechanics approach to adaptively mask Gaussians based on their deviation from learned equilibrium states. We employ distinct decomposition strategies tailored to the visual characteristics of specific attributes:
The soft activation masks are formulated using the Boltzmann distribution. Taking position dynamics as an example:
Here, $\beta$ represents the inverse temperature controlling the sharpness of the distribution, and $\gamma$ ensures a minimum responsiveness floor. The symbol $\odot$ denotes the Hadamard (element-wise) product. Consequently, primitives far from equilibrium (high energy) undergo significant deformation, while stable ones remain static.
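A hedged sketch of this gating (the function form and the definition of per-primitive energy are our assumptions, not the paper's exact formula): the Boltzmann weight $e^{-\beta E}$ is treated as the probability of staying static, $\gamma$ supplies the responsiveness floor, and the resulting mask gates deformations element-wise.

```python
import numpy as np

# Illustrative Boltzmann soft mask (assumed form, not the paper's exact
# equation). `energy` is each primitive's deviation from its learned
# equilibrium state; beta sharpens the distribution, gamma is the floor.

def boltzmann_mask(energy, beta=5.0, gamma=0.05):
    # exp(-beta * E) is the Boltzmann weight of remaining static, so
    # 1 - exp(-beta * E) activates high-energy (dynamic) primitives.
    return gamma + (1.0 - gamma) * (1.0 - np.exp(-beta * energy))

energy = np.array([0.0, 0.01, 0.5, 5.0])   # per-Gaussian deviation energies
mask = boltzmann_mask(energy)

delta_mu = np.ones((4, 3))                 # raw position deformations
masked = mask[:, None] * delta_mu          # Hadamard-style gating
```

Equilibrium primitives (zero energy) keep only the floor response $\gamma$, while far-from-equilibrium primitives deform almost fully, matching the behavior described above.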
Real-world dynamics are inherently dissipative due to friction and other non-conservative forces. This deviation from strict energy conservation can mislead the HNN into generating erroneous deformation fields. To mitigate this and strictly enforce physical plausibility, we impose explicit constraints tailored to the attribute dynamics:
Here, $\phi_{max}$ defines the maximum allowable rotation per timestamp. The symbol $\otimes$ represents the standard quaternion multiplication, and $\mathcal{N}(\cdot)$ denotes the normalization operation to ensure valid unit quaternions.
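The rotation constraint can be sketched as follows; this is a plausible reading under stated assumptions (axis-angle clamping of the predicted delta, with `quat_mul` as the Hamilton product $\otimes$ and renormalization as $\mathcal{N}(\cdot)$), not necessarily the paper's exact parameterization.

```python
import numpy as np

# Illustrative rigidity constraint: clamp the per-timestamp rotation
# delta to at most phi_max radians, then compose with the base rotation
# and renormalize to a valid unit quaternion. Conventions assumed:
# quaternions are (w, x, y, z) with w >= 0 for the delta.

def quat_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def clamp_rotation(delta_q, phi_max):
    """Clamp a unit quaternion's rotation angle to at most phi_max."""
    delta_q = delta_q / np.linalg.norm(delta_q)
    w = np.clip(delta_q[0], -1.0, 1.0)
    angle = 2.0 * np.arccos(abs(w))          # rotation angle in [0, pi]
    if angle <= phi_max:
        return delta_q
    axis = delta_q[1:] / (np.linalg.norm(delta_q[1:]) + 1e-12)
    half = phi_max / 2.0
    return np.concatenate([[np.cos(half)], np.sin(half) * axis])

phi_max = np.deg2rad(10.0)
# Predicted delta: 90 degrees about the x-axis, far beyond phi_max.
delta_q = np.array([np.cos(np.pi/4), np.sin(np.pi/4), 0.0, 0.0])
r = np.array([1.0, 0.0, 0.0, 0.0])           # base rotation (identity)
r_new = quat_mul(clamp_rotation(delta_q, phi_max), r)
r_new = r_new / np.linalg.norm(r_new)        # N(.): renormalize
```

With the identity base rotation, the composed result carries exactly the clamped angle $\phi_{max}$, which is the intended effect of the constraint.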
For efficient transmission and rendering across varying network bandwidths and device capabilities, we extend NeHaD with two strategic enhancements focusing on anti-aliasing and Level-of-Detail (LOD) management:
Here, $\rho$ represents the anisotropy ratio derived from scale extrema, $\boldsymbol{1}$ denotes the all-ones vector used to broadcast the scalar mean $\bar{L}$, and $N$ indicates the total number of LOD levels. This layered architecture allows the system to fine-tune transmission rates dynamically while preserving high-fidelity rendering details.
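One possible reading of this LOD assignment, sketched under explicit assumptions (the quantile-based quantization, the per-Gaussian mean log-scale $L$, and the function `assign_lod` are all our illustrative choices, not the paper's scheme): compute the anisotropy ratio $\rho$ from scale extrema, center each primitive's mean log-scale on the scene mean $\bar{L}$ via the all-ones broadcast, and bucket the deviations into $N$ discrete levels.

```python
import numpy as np

# Hedged sketch of scale-aware LOD bucketing (assumed scheme). Each
# Gaussian gets an anisotropy ratio rho from its scale extrema and an
# integer LOD level by quantizing the deviation of its mean log-scale
# L from the scene-wide scalar mean L_bar into N levels.

def assign_lod(scales, N=4):
    log_s = np.log(scales)                         # (G, 3) log-scales
    rho = scales.max(axis=1) / scales.min(axis=1)  # anisotropy ratio >= 1
    L = log_s.mean(axis=1)                         # mean log-scale per Gaussian
    L_bar = L.mean()                               # scene-wide scalar mean
    dev = L - L_bar * np.ones_like(L)              # broadcast via all-ones vector
    # Rank-quantize deviations into N levels so each bandwidth tier can
    # drop the finest-detail buckets first.
    edges = np.quantile(dev, np.linspace(0, 1, N + 1)[1:-1])
    level = np.digitize(dev, edges)                # integer level in [0, N-1]
    return rho, level

rng = np.random.default_rng(0)
scales = np.exp(rng.normal(size=(100, 3)))         # synthetic Gaussian scales
rho, level = assign_lod(scales, N=4)
```

Quantile edges keep the $N$ tiers balanced in population, which is one way such a layered representation could trade detail for transmission rate.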
We compare NeHaD with state-of-the-art models on three datasets: the synthetic D-NeRF, the monocular real-world HyperNeRF, and the multi-view real-world DyNeRF. Our method consistently outperforms baseline models in perceptual quality, delivering realistic renderings with coherent dynamic motion. The full-resolution demo video is available via this Google Drive link.
Below we visualize the qualitative comparisons.
@inproceedings{qin-nehad,
title = {Neural Hamiltonian Deformation Fields for Dynamic Scene Rendering},
author = {Qin, Hai-Long and Wang, Sixian and Lu, Guo and Dai, Jincheng},
booktitle = {SIGGRAPH Asia 2025 Conference Papers},
pages = {1--11},
year = {2025}
}