datafold

Operator-theoretic models to identify dynamical systems and parametrize point cloud geometry

What is datafold?

datafold is an MIT-licensed Python package containing operator-theoretic, data-driven models to identify dynamical systems from time series data and to infer geometrical structures in point clouds.

The package includes:

Implementations and variants of the Dynamic Mode Decomposition (DMD) as data-driven methods to identify and analyze dynamical systems from time series collection data. This includes:
- DMDFull or DMDEco as standard methods of DMD
- OnlineDMD or StreamingDMD, which modify the DMD to handle streaming data
- DMDControl, which augments the DMD to handle additional control input
- EDMD - The Extended DMD, which allows setting up a highly flexible dictionary to decompose and embed time series data and thereby handle nonlinear dynamics within the Koopman operator framework. EDMD wraps an arbitrary DMD variant for the decomposition. The key advantage is that EDMD directly benefits from the functionality above, so it can be used in control or streaming settings. Furthermore, the dictionary can also be learned from the data, which corresponds to EDMD-DL.
An efficient implementation of the DiffusionMaps model to infer geometrically meaningful structures from (time series) data, such as the eigenfunctions of the Laplace-Beltrami operator. In contrast to other implementations, the model can handle a sparse kernel matrix and allows setting an arbitrary kernel, including the standard Gaussian kernel, the continuous k-nearest neighbor kernel, or the dynamics-adapted cone kernel.
Cross-validation. The method EDMDCV allows model parameters to be optimized with cross-validation splittings that account for the temporal order in time series data.
Methods to perform Model Predictive Control (MPC) with Koopman operator-based methods (mainly the EDMD).
Regression models for high-dimensional data, which are commonly used for out-of-sample extensions for the Diffusion Maps model. This includes the (auto-tuned) Laplacian Pyramids or Geometric Harmonics to interpolate general function values on a point cloud manifold.
A data structure TSCDataFrame to handle time series collection (TSC) data. It simplifies model inputs and outputs and makes it easier to describe various forms of time series data.
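
The EDMD idea from the list above (lift the states with a dictionary, then fit a linear operator on the lifted data) can be sketched in plain NumPy. This is only an illustration under stated assumptions and does not use or reproduce datafold's `EDMD` class: the toy system `step`, the dictionary `lift` and all variable names are hypothetical. The system is chosen so that the dictionary `[x1, x2, x1^2]` spans a Koopman-invariant subspace, which makes the lifted dynamics exactly linear.

```python
import numpy as np

A_COEF, B_COEF = 0.9, 0.5  # parameters of the toy system (assumption)


def step(X):
    """Advance states of shape (n, 2) by one step of a nonlinear toy map."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([A_COEF * x1, B_COEF * x2 + (A_COEF**2 - B_COEF) * x1**2])


def lift(X):
    """Hypothetical dictionary: map states (n, 2) to observables [x1, x2, x1^2]."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2])


# Snapshot pairs (x_k, x_{k+1}) sampled from random initial states
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = step(X)

# Least-squares fit of a linear operator in the lifted space: lift(X) @ K ~ lift(Y)
K, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
eigvals = np.sort(np.linalg.eigvals(K).real)

# Predict a trajectory by iterating K in the lifted space and reading off
# the first two observables, which are the original state coordinates.
state = np.array([[0.8, -0.3]])
lifted = lift(state)
predicted, truth = [state[0]], [state[0]]
for _ in range(5):
    lifted = lifted @ K
    state = step(state)
    predicted.append(lifted[0, :2])
    truth.append(state[0])
predicted, truth = np.array(predicted), np.array(truth)

print("Koopman eigenvalue estimates:", eigvals)
print("max prediction error:", np.abs(predicted - truth).max())
```

For a general nonlinear system no finite dictionary is exactly invariant, so the fitted operator is only an approximation and the choice of dictionary matters.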

See also this introduction page. For a mathematically thorough introduction, we refer to the scientific literature.

Note

The project is under active development in a research-driven environment.

Code quality varies from “experimental/early stage” to “well-tested”. Well-tested code is listed in the software documentation and is directly accessible through the highest module level (e.g. from datafold import ...). Experimental code is only accessible via “deep imports” (e.g. from datafold.dynfold.outofsample import ...) and may raise a warning when used.
The interfaces within datafold are not stable, and the software is not intended for production. Nevertheless, if we break something it is intentional, and we hope that such adaptations become less frequent over time.
There is no deprecation cycle. The software follows the semantic versioning policy [major].[minor].[patch], i.e.
- major - making incompatible changes in the (documented) API
- minor - adding functionality in a backwards-compatible manner
- patch - backwards-compatible bug fixes
We do not intend to indicate a feature complete milestone with version 1.0.
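
Under the versioning policy above, a downstream project can check whether an installed version should be API-compatible with the version it was written against. A minimal sketch using only the standard library; the helper names `parse_version` and `is_api_compatible` are hypothetical and not part of datafold, and the parser assumes exactly three numeric components:

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a '[major].[minor].[patch]' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_api_compatible(installed: str, required: str) -> bool:
    """Same major version, and at least the required minor/patch level."""
    inst, req = parse_version(installed), parse_version(required)
    return inst[0] == req[0] and inst >= req


print(is_api_compatible("2.0.2", "2.0.0"))  # same major, newer patch
print(is_api_compatible("2.0.2", "1.1.6"))  # major version changed
```

Because there is no deprecation cycle, pinning the major version in a requirements file is the safer choice when an exact version cannot be pinned.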

Cite

If you use datafold in your research, please cite this paper published in the Journal of Open Source Software (JOSS).

Lehmberg et al., (2020). datafold: data-driven models for point clouds and time series on manifolds. Journal of Open Source Software, 5(51), 2283, https://doi.org/10.21105/joss.02283

BibTeX:

@article{Lehmberg2020,
         doi       = {10.21105/joss.02283},
         url       = {https://doi.org/10.21105/joss.02283},
         year      = {2020},
         publisher = {The Open Journal},
         volume    = {5},
         number    = {51},
         pages     = {2283},
         author    = {Daniel Lehmberg and Felix Dietrich and Gerta K{\"o}ster and Hans-Joachim Bungartz},
         title     = {datafold: data-driven models for point clouds and time series on manifolds},
         journal   = {Journal of Open Source Software}}

How to get it?

Installation requires Python>=3.9 with pip and setuptools installed (both packages ship with a standard Python installation). The package dependencies install automatically. The main dependencies and their usage in datafold are listed in the section “Dependencies” below.

There are two ways to install datafold:

1. From PyPI

This is the standard way for users. The package is hosted on the official Python package index (PyPI) and installs the core package (excluding tutorials and tests). The tutorial files can be downloaded separately here.

To install the package and its dependencies with pip, run

python -m pip install datafold

Note

If you run Python in an Anaconda environment you can use pip from within conda. See also official instructions.

conda activate venv
conda install pip
pip install datafold

2. From source

This way is recommended if you want to access the latest (but potentially unstable) development state, run tests or wish to contribute (see section “Contributing” for details). Download or git-clone the source code repository.

Download the repository
1. If you wish to contribute code, it is required to have git installed. Clone the repository with
```
git clone https://gitlab.com/datafold-dev/datafold.git
```
2. If you only want access to the source code (current master branch), download one of the compressed file types (zip, tar.gz, tar.bz2, tar)
Install the package from the downloaded repository
```
python -m pip install .
```
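
After either installation route, the installed version can be checked from Python. A minimal sketch using only the standard library; the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "datafold"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("datafold is not installed in this environment")
else:
    print(f"datafold {found} is installed")
```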

Contributing

Any contribution (code/tutorials/documentation improvements), question or feedback is very welcome. Either use the issue tracker or email us. Instructions to set up datafold for development can be found here.

Dependencies

The dependencies of the core package are managed in the file requirements.txt and install with datafold. The tests, tutorials, documentation and code analysis require additional dependencies which are managed in requirements-dev.txt.

datafold integrates with common packages from the Python scientific computing stack:

NumPy

NumPy is used throughout datafold and is the default package for numerical data and algorithms.
pandas

datafold uses pandas’ DataFrame as a base class for TSCDataFrame to capture various forms of time series data. The data structure includes specific time series collection functionality and is mostly compatible with pandas’ rich functionality.
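
To illustrate what a time series collection looks like as a table, the following sketch builds one with plain pandas. It does not use TSCDataFrame itself; it assumes a two-level row index named `ID` and `time`, and the helper `make_collection` is hypothetical.

```python
import numpy as np
import pandas as pd


def make_collection(series_list, columns):
    """Stack several (times, values) pairs into one frame indexed by (ID, time)."""
    frames = []
    for ts_id, (times, values) in enumerate(series_list):
        index = pd.MultiIndex.from_product([[ts_id], times], names=["ID", "time"])
        frames.append(pd.DataFrame(values, index=index, columns=columns))
    return pd.concat(frames)


# Two time series with different lengths and different time values
t_a = np.arange(0, 4)
t_b = np.arange(10, 13)
tsc = make_collection(
    [
        (t_a, np.column_stack([np.sin(t_a), np.cos(t_a)])),
        (t_b, np.column_stack([np.sin(t_b), np.cos(t_b)])),
    ],
    columns=["sin", "cos"],
)

n_timeseries = tsc.index.get_level_values("ID").nunique()
lengths = tsc.groupby(level="ID").size()           # samples per time series
initial_states = tsc.groupby(level="ID").head(1)   # first sample of each series

print(tsc)
print("number of time series:", n_timeseries)
```

Keeping all series in one frame lets ordinary pandas operations (slicing, `groupby`, arithmetic) work per time series or across the whole collection.
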
scikit-learn

All datafold algorithms that are part of the “machine learning pipeline” align with the scikit-learn API. This is done by deriving the models from BaseEstimator and appropriate MixIns. datafold defines its own MixIns that align with the API in a duck-typing fashion to allow identifying dynamical systems from temporal data in TSCDataFrame.
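
As a rough illustration of this design, the sketch below derives a toy model from scikit-learn's `BaseEstimator` and gives it `fit`/`predict` methods in which one input sample (an initial condition) yields a whole time series. The class `LinearFlowModel` and its attributes are hypothetical; this is not one of datafold's MixIns or models, and it works on NumPy arrays rather than TSCDataFrame.

```python
import numpy as np
from sklearn.base import BaseEstimator


class LinearFlowModel(BaseEstimator):
    """Toy model: fit a linear map x_{k+1} = A x_k and roll it out in predict."""

    def __init__(self, n_steps=5):
        self.n_steps = n_steps  # hyperparameter, visible to get_params/set_params

    def fit(self, X, y=None):
        # X holds a single time series with shape (n_timesteps, n_features)
        X = np.asarray(X, dtype=float)
        coef, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
        self.transition_ = coef.T  # learned attribute, trailing underscore by convention
        return self

    def predict(self, X):
        # X holds initial conditions with shape (n_ic, n_features);
        # the result has shape (n_ic, n_steps + 1, n_features)
        states = np.atleast_2d(np.asarray(X, dtype=float))
        out = [states]
        for _ in range(self.n_steps):
            out.append(out[-1] @ self.transition_.T)
        return np.stack(out, axis=1)


# Training data: one trajectory of a known linear system
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
trajectory = [np.array([1.0, 1.0])]
for _ in range(10):
    trajectory.append(A_true @ trajectory[-1])
trajectory = np.array(trajectory)

model = LinearFlowModel(n_steps=3).fit(trajectory)
prediction = model.predict([[1.0, 0.0], [0.0, 1.0]])
print(model.get_params())
print(prediction.shape)
```

Deriving from `BaseEstimator` provides `get_params` and `set_params`, which is what allows such a model to be used with scikit-learn tools such as pipelines and parameter searches.
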
SciPy

The package is used for elementary numerical algorithms and data structures in conjunction with NumPy. This includes (sparse) linear least squares regression, (sparse) eigenpair solvers and sparse matrices as an optional data structure for kernel matrices.

How does it compare to other software?

Note: This list covers only Python packages.

scikit-learn

provides algorithms and models along the entire machine learning pipeline, with a strong focus on static data (i.e. without temporal context). datafold integrates with scikit-learn’s API, and all data-driven models are subclasses of BaseEstimator. An important contribution of datafold is the DiffusionMaps model, a popular framework for manifold learning that is not contained in scikit-learn’s set of algorithms. Furthermore, datafold includes dynamical systems as a new model class that is operable with scikit-learn; the attributes align with supervised learning tasks. The key differences are that a model processes data of type TSCDataFrame and that, instead of a one-to-one relation between the model’s input and output, the model can return arbitrarily many output samples (a time series) for a single input (an initial condition).
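
For readers unfamiliar with diffusion maps, the core computation can be sketched in a few lines of NumPy. This is a simplified illustration and not datafold's DiffusionMaps implementation: it uses a dense Gaussian kernel, assumes the kernel scale convention exp(-d^2 / (2 * epsilon)), and the function name `diffusion_maps` and its parameters are hypothetical.

```python
import numpy as np


def diffusion_maps(X, epsilon, n_eigenpairs=3, alpha=1.0):
    """Return the leading eigenpairs of a row-stochastic Gaussian kernel matrix."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * epsilon))

    # Density normalization; alpha=1 targets the Laplace-Beltrami operator
    q = kernel.sum(axis=1)
    kernel = kernel / np.outer(q, q) ** alpha

    # P = D^{-1} K is row-stochastic. Its symmetric conjugate
    # S = D^{-1/2} K D^{-1/2} has the same eigenvalues and can be passed to eigh.
    d = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(sym)

    order = np.argsort(evals)[::-1][:n_eigenpairs]
    # Map eigenvectors of S back to right eigenvectors of P
    return evals[order], evecs[:, order] / np.sqrt(d)[:, None]


# Point cloud: evenly spaced samples on the unit circle
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])

eigenvalues, eigenvectors = diffusion_maps(circle, epsilon=0.05)
print("leading eigenvalues:", eigenvalues)
```

The first eigenvector is constant and carries no geometric information; the next two recover the circle's angular coordinate up to a rotation.
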
PyDMD

provides many variants of the Dynamic Mode Decomposition (DMD). datafold provides a wrapper to make models of PyDMD accessible. However, a limitation of PyDMD is that it only processes single coherent time series, see PyDMD issue 86. The DMD models that are directly included in datafold utilize the functionality of the data structure TSCDataFrame and can therefore process time series collections - in an extreme case only containing snapshot pairs.
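
The difference between a single coherent time series and a time series collection can be made concrete with a small NumPy sketch. It fits a linear operator from snapshot pairs that are pooled over several short time series without forming pairs across series boundaries. This illustrates the idea only; it does not use datafold's or PyDMD's API, and the example system and all names are assumptions.

```python
import numpy as np

# Ground-truth linear system: a damped rotation (assumption for this example)
theta = 0.3
A_true = 0.95 * np.array(
    [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)

# Collection of five short time series, each started from a random state
rng = np.random.default_rng(1)
collection = []
for _ in range(5):
    states = [rng.normal(size=2)]
    for _ in range(3):
        states.append(A_true @ states[-1])
    collection.append(np.array(states))  # shape (4, 2)

# Pool snapshot pairs per time series; the last state of one series is never
# paired with the first state of the next one.
X = np.vstack([ts[:-1] for ts in collection])
Y = np.vstack([ts[1:] for ts in collection])

# Rows satisfy y_i = A x_i, i.e. Y = X @ A.T, solved in the least-squares sense
A_transposed, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_dmd = A_transposed.T
dmd_eigenvalues = np.linalg.eigvals(A_dmd)

print("recovered operator:\n", A_dmd)
print("eigenvalue magnitudes:", np.abs(dmd_eigenvalues))
```

With time series of length two, the same code runs on a collection consisting only of snapshot pairs.
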
PySINDy

specializes in sparse system identification of nonlinear dynamical systems to infer governing equations.

Release history

- 2.0.2 (this version): Mar 13, 2026
- 2.0.1: Oct 30, 2023
- 2.0.0: Jul 31, 2023
- 1.1.6: Dec 2, 2021
- 1.1.5: Jul 5, 2021
- 1.1.4: Apr 20, 2021
- 1.1.3: Mar 10, 2021
- 1.1.2: Sep 25, 2020
- 1.1.1: Aug 14, 2020
- 1.1.0 (yanked, reason: misconfigured dependency in setup.py): Aug 14, 2020
- 1.0.2: Jul 13, 2020
- 1.0.1: Jun 29, 2020
- 1.0.0: May 20, 2020
- 0.1.1: Apr 21, 2020
- 0.1: Oct 17, 2019

Download files

Source Distribution

datafold-2.0.2.tar.gz (299.9 kB)

Uploaded Mar 13, 2026 Source

Built Distribution

datafold-2.0.2-py3-none-any.whl (319.2 kB)

Uploaded Mar 13, 2026 Python 3

File details

Details for the file datafold-2.0.2.tar.gz.

File metadata

Download URL: datafold-2.0.2.tar.gz
Upload date: Mar 13, 2026
Size: 299.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datafold-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`1d8683a3c76aad4b46c90cafb5a89b03255f23ecb079b2fa6510d9a28fc1aabc`
MD5	`c015d92420fda12a4941eaacca42f347`
BLAKE2b-256	`d1fda90f6aeb062f51da66fcea4d1370c2ebc44fc0704aa172fa24f4de1460d1`
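
The digests listed above can be checked locally after downloading a file. A minimal sketch using only the standard library; the helper names `sha256_of` and `matches_digest` are hypothetical, and the file is assumed to be in the current working directory:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# File name and digest as listed in the table above
archive = Path("datafold-2.0.2.tar.gz")
expected = "1d8683a3c76aad4b46c90cafb5a89b03255f23ecb079b2fa6510d9a28fc1aabc"
if archive.exists():
    print("digest matches:", matches_digest(archive, expected))
else:
    print(f"{archive} not found, nothing to verify")
```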

File details

Details for the file datafold-2.0.2-py3-none-any.whl.

File metadata

Download URL: datafold-2.0.2-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 319.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datafold-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a54b424661a05944537aa0655c87dece6fef45fec9b8ed9a6a2a268ef647305b`
MD5	`802de6f01801223b05861a084325616c`
BLAKE2b-256	`29a8fd76bdaba4ef0218c7939ee9f885a478f09df1590a14a376af3464a3339f`
