Sparse identification of nonlinear dynamics

Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for discovering the governing equations of a dynamical system from measurements.^[1] Given a series of snapshots of a dynamical system and their corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO) of the derivatives on a library of nonlinear candidate functions of the snapshots to find the governing equations. This procedure relies on the assumption that the dynamics of most physical systems are dictated by only a few dominant terms, given an appropriately selected coordinate system and high-quality training data.^[2]^[3] It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks.^[4]

Mathematical Overview

First, consider a dynamical system of the form

${\dot {\textbf {x}}}={\frac {d}{dt}}{\textbf {x}}(t)={\textbf {f}}({\textbf {x}}(t)),$

where ${\textbf {x}}(t)\in \mathbb {R} ^{n}$ is a state vector (snapshot) of the system at time $t$ and the function ${\textbf {f}}({\textbf {x}}(t))$ defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots.

With ${\textbf {x}}$ and ${\dot {\textbf {x}}}$ sampled at $m$ equidistant points in time ( $t_{1},t_{2},\cdots ,t_{m}$ ), these can be arranged into matrices of the form

$\mathbf{X} = \begin{bmatrix} \mathbf{x}^{T}(t_{1}) \\ \mathbf{x}^{T}(t_{2}) \\ \vdots \\ \mathbf{x}^{T}(t_{m}) \end{bmatrix} = \begin{bmatrix} x_{1}(t_{1}) & x_{2}(t_{1}) & \cdots & x_{n}(t_{1}) \\ x_{1}(t_{2}) & x_{2}(t_{2}) & \cdots & x_{n}(t_{2}) \\ \vdots & \vdots & \ddots & \vdots \\ x_{1}(t_{m}) & x_{2}(t_{m}) & \cdots & x_{n}(t_{m}) \end{bmatrix},$

and similarly for ${\dot {\textbf {X}}}$ .
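The construction of the two data matrices can be sketched in Python with NumPy. The damped linear oscillator, its coefficient matrix `A`, the sampling interval, and the variable names are all assumptions made for illustration; none of them come from the cited sources. The derivative matrix is approximated with finite differences and compared against the prescribed derivative.

```python
import numpy as np

# Hypothetical example system (an assumption, not from the source):
#   x1' = -0.1*x1 + 2*x2,   x2' = -2*x1 - 0.1*x2
A = np.array([[-0.1, 2.0],
              [-2.0, -0.1]])

m = 1001                        # number of snapshots
t = np.linspace(0.0, 10.0, m)   # equidistant sample times t_1, ..., t_m
dt = t[1] - t[0]

# Closed-form solution for x(0) = (2, 0), stacked so that
# row i of X is the snapshot x^T(t_i).
decay = 2.0 * np.exp(-0.1 * t)
X = np.column_stack([decay * np.cos(2.0 * t), -decay * np.sin(2.0 * t)])

# Numerically approximated derivatives: second-order accurate differences
# (central in the interior, one-sided at the two end points).
X_dot = np.gradient(X, dt, axis=0, edge_order=2)

# Prescribed derivatives f(x) = A x, evaluated on every snapshot.
X_dot_exact = X @ A.T

max_err = np.max(np.abs(X_dot - X_dot_exact))
print(X.shape, X_dot.shape, max_err)
```

With noisy measurements, plain finite differences amplify the noise, so some form of smoothing or a more robust differentiation scheme is usually needed before the regression step.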

Next, a library $\mathbf{\Theta}(\mathbf{X})$ of nonlinear candidate functions of the columns of $\mathbf{X}$ is constructed. The candidates may be constant, polynomial, or more exotic functions, such as trigonometric or rational terms:

$\mathbf{\Theta}(\mathbf{X}) = \begin{bmatrix} \vert & \vert & \vert & \vert & & \vert & \vert & \\ 1 & \mathbf{X} & \mathbf{X}^{2} & \mathbf{X}^{3} & \cdots & \sin(\mathbf{X}) & \cos(\mathbf{X}) & \cdots \\ \vert & \vert & \vert & \vert & & \vert & \vert & \end{bmatrix}$
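A minimal sketch of how such a library might be assembled is given below. The function name `build_library` and its parameters are hypothetical, and the polynomial blocks are taken here to include cross terms such as $x_{1}x_{2}$, which is an assumption about what $\mathbf{X}^{2}$ denotes.

```python
import numpy as np
from itertools import combinations_with_replacement


def build_library(X, degree=2, include_trig=False):
    """Evaluate candidate functions on every snapshot (row) of X.

    Returns the library matrix Theta (one column per candidate function)
    and a list of human-readable names for the columns.
    """
    m, n = X.shape
    columns = [np.ones(m)]          # constant term
    names = ["1"]

    # All monomials of total degree 1..degree, cross terms included.
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
            names.append("*".join(f"x{i + 1}" for i in idx))

    # Optional trigonometric candidates, applied to each state variable.
    if include_trig:
        for i in range(n):
            columns.append(np.sin(X[:, i]))
            names.append(f"sin(x{i + 1})")
            columns.append(np.cos(X[:, i]))
            names.append(f"cos(x{i + 1})")

    return np.column_stack(columns), names


X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
Theta_demo, names_demo = build_library(X_demo, degree=2)
print(names_demo)
print(Theta_demo)
```

The number of columns grows quickly with the state dimension and the polynomial degree, which is one reason the choice of candidate functions matters in practice.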

The number of possible model structures that can be built from this library is combinatorially large. ${\textbf {f}}({\textbf {x}}(t))$ is then approximated by the product of $\mathbf{\Theta}(\mathbf{X})$ and a matrix of coefficient vectors $\mathbf{\Xi} = \left[ \boldsymbol{\xi}_{1} \ \boldsymbol{\xi}_{2} \ \cdots \ \boldsymbol{\xi}_{n} \right]$, where the $k$-th column $\boldsymbol{\xi}_{k}$ determines which terms are active in the $k$-th component of ${\textbf {f}}({\textbf {x}}(t))$:

$\dot{\mathbf{X}} = \mathbf{\Theta}(\mathbf{X}) \mathbf{\Xi}$
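As a small worked illustration (a hypothetical system, not taken from the cited sources), consider $n=2$ with dynamics $\dot{x}_{1} = -2x_{1}$ and $\dot{x}_{2} = x_{1} - x_{2}$, and a library containing only the constant and linear terms. Writing $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$ for the two columns of $\mathbf{X}$, the relation above reads

$\dot{\mathbf{X}} = \begin{bmatrix} \mathbf{1} & \mathbf{x}_{1} & \mathbf{x}_{2} \end{bmatrix} \begin{bmatrix} 0 & 0 \\ -2 & 1 \\ 0 & -1 \end{bmatrix},$

so that $\boldsymbol{\xi}_{1} = (0, -2, 0)^{T}$ encodes the first equation and $\boldsymbol{\xi}_{2} = (0, 1, -1)^{T}$ the second. Only three of the six coefficients are nonzero, and a larger library would add further rows of zeros.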

Because only a few candidate terms are expected to be active in each component of ${\textbf {f}}({\textbf {x}}(t))$, the function is assumed to admit a sparse representation in $\mathbf{\Theta}(\mathbf{X})$. Finding the model then becomes an optimization problem: find a sparse $\mathbf{\Xi}$ that best reproduces $\dot{\mathbf{X}}$. In other words, a parsimonious model is obtained by performing least squares regression on the system above with sparsity-promoting ( $L_{1}$ ) regularization

$\boldsymbol{\xi}_{k} = \underset{\boldsymbol{\xi}'_{k}}{\arg\min} \left\| \dot{\mathbf{X}}_{k} - \mathbf{\Theta}(\mathbf{X}) \boldsymbol{\xi}'_{k} \right\|_{2} + \lambda \left\| \boldsymbol{\xi}'_{k} \right\|_{1},$

where $\dot{\mathbf{X}}_{k}$ is the $k$-th column of $\dot{\mathbf{X}}$ and $\lambda$ is a regularization parameter. Finally, the sparse coefficient vectors $\boldsymbol{\xi}_{k}$ can be used to reconstruct the dynamical system:

$\dot{x}_{k} = \mathbf{\Theta}(\mathbf{x}) \boldsymbol{\xi}_{k}$
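The regression and reconstruction steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than a reference implementation: the test system and its matrix `A` are hypothetical, the value of `lam` was picked by hand, and the solver is a plain proximal-gradient iteration (ISTA) applied to the conventional LASSO objective with a squared residual, $\tfrac{1}{2m}\|\dot{\mathbf{X}}_{k} - \mathbf{\Theta}\boldsymbol{\xi}_{k}\|_{2}^{2} + \lambda\|\boldsymbol{\xi}_{k}\|_{1}$, which differs in scaling from the unsquared norm written above.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def lasso_ista(Theta, Y, lam, n_iter=5000):
    """Minimise (1/(2m)) * ||Y - Theta @ Xi||_F^2 + lam * sum(|Xi|).

    The penalty is separable, so solving for all columns of Xi at once
    is the same as solving one LASSO problem per state variable.
    """
    m = Theta.shape[0]
    G = Theta.T @ Theta / m                   # Gram matrix
    B = Theta.T @ Y / m
    step = 1.0 / np.linalg.eigvalsh(G)[-1]    # 1 / Lipschitz constant
    Xi = np.zeros((Theta.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Xi = soft_threshold(Xi - step * (G @ Xi - B), step * lam)
    return Xi


# Hypothetical test system (an assumption, not from the source):
#   x1' = -0.1*x1 + 2*x2,   x2' = -2*x1 - 0.1*x2
A = np.array([[-0.1, 2.0],
              [-2.0, -0.1]])
t = np.linspace(0.0, 10.0, 1001)
decay = 2.0 * np.exp(-0.1 * t)
x1, x2 = decay * np.cos(2.0 * t), -decay * np.sin(2.0 * t)
X = np.column_stack([x1, x2])
X_dot = X @ A.T                               # prescribed derivatives

# Library of polynomials up to degree 2.
names = ["1", "x1", "x2", "x1^2", "x1*x2", "x2^2"]
Theta = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

Xi = lasso_ista(Theta, X_dot, lam=0.01)

# Read the identified equations off the nonzero entries of each column.
for k in range(Xi.shape[1]):
    terms = [f"{Xi[j, k]:+.3f}*{names[j]}"
             for j in range(len(names)) if Xi[j, k] != 0.0]
    print(f"x{k + 1}' =", " ".join(terms))
```

The nonzero coefficients returned by an $L_{1}$-penalized fit are shrunk slightly toward zero; refitting by ordinary least squares on the selected terms is a common way to remove that bias.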

References

[1] Brunton, Steven L.; Kutz, J. Nathan (2022-05-05). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Higher Education from Cambridge University Press. doi:10.1017/9781009089517. ISBN 9781009089517. Retrieved 2022-10-25.
[2] Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan (2016-04-12). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". Proceedings of the National Academy of Sciences. 113 (15): 3932–3937. arXiv:1509.03580. Bibcode:2016PNAS..113.3932B. doi:10.1073/pnas.1517384113. ISSN 0027-8424. PMC 4839439. PMID 27035946.
[3] Huang, Yunfei; et al. (2022). "Sparse inference and active learning of stochastic differential equations from data". Scientific Reports. 12 (1): 21691. doi:10.1038/s41598-022-25638-9. PMC 9755218. PMID 36522347.
[4] Mangan, Niall M.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan (2016-05-26). "Inferring biological networks by sparse identification of nonlinear dynamics". arXiv:1605.08368 [math.DS].
