Project description
Source repository | Contributing and feedback | PyPI | Documentation | Tutorials | Scientific literature
What is datafold?
datafold is a MIT-licensed Python package containing operator-theoretic, data-driven models to identify dynamical systems from time series data and to infer geometrical structures in point clouds.
The package includes:
Data structures to handle point clouds on manifolds (PCManifold) and time series collections (TSCDataFrame). The data structures are used both internally and for model inputs/outputs. In contrast to solutions found in other projects, which resort to lists of NumPy arrays, TSCDataFrame makes it much easier to describe collected time series data by storing the data in a single object.
An efficient implementation of the DiffusionMaps model to infer geometrically meaningful structures from data, such as the eigenfunctions of the Laplace-Beltrami operator. As a distinguishing factor from other implementations, the model can handle a sparse kernel matrix and allows setting an arbitrary kernel, including the standard Gaussian kernel, the continuous k-nearest neighbor kernel, or the dynamics-adapted cone kernel.
Out-of-sample extensions for the Diffusion Maps model, such as the (auto-tuned) Laplacian Pyramids or Geometric Harmonics, to interpolate general function values on a point cloud manifold.
An implementation of the (Extended) Dynamic Mode Decomposition (e.g. the models DMDFull or EDMD) as a data-driven model to identify dynamical systems from time series collection data. The EDMD model subclasses the flexible scikit-learn Pipeline, which allows setting up transformations of time series collection data into a more suitable feature state (cf. Koopman operator theory). Two interesting transformations are Diffusion Maps and time-delay embedding for phase space reconstruction.
EDMDCV allows model parameters to be optimized with cross-validation splittings that account for the temporal order in time series collections.
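The list above describes TSCDataFrame as a single object that holds a whole time series collection. The following plain-pandas sketch, which does not use datafold, illustrates that kind of layout. It assumes a row index with the levels `ID` and `time` and one column per feature; the series values and column names are made up.

```python
import numpy as np
import pandas as pd

# Two made-up time series (IDs 0 and 1) of different length,
# each with two spatial features x1 and x2.
series = {
    0: (np.array([0.0, 0.1, 0.2]), np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]])),
    1: (np.array([0.0, 0.1]), np.array([[0.0, 1.0], [0.1, 0.9]])),
}

frames = []
for ts_id, (time_values, values) in series.items():
    # Row index: (time series ID, time value); columns: spatial features.
    index = pd.MultiIndex.from_arrays(
        [np.full(len(time_values), ts_id), time_values], names=["ID", "time"]
    )
    frames.append(pd.DataFrame(values, index=index, columns=["x1", "x2"]))

# The whole collection lives in one object instead of a list of arrays.
collection = pd.concat(frames)

# Select one time series by its ID (the result is indexed by time only).
first_series = collection.loc[0]
n_series = collection.index.get_level_values("ID").nunique()
print(collection)
```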
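To make the DiffusionMaps item more concrete, here is a minimal dense NumPy/SciPy sketch of the textbook Diffusion Maps algorithm with a Gaussian kernel and the usual alpha-normalization. It is not datafold's implementation: it builds a dense kernel matrix, ignores the sparse and alternative kernels mentioned above, and the function name and parameters are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def diffusion_maps(X, epsilon, n_eigenpairs=3, alpha=1.0):
    """Dense Diffusion Maps sketch with a Gaussian kernel (no sparsity, no cut-off)."""
    # Gaussian kernel on all pairwise squared Euclidean distances.
    K = np.exp(-cdist(X, X, "sqeuclidean") / epsilon)
    # alpha-normalization; alpha=1 removes the influence of the sampling density.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q**alpha, q**alpha)
    # Symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2} of the Markov matrix P = D^{-1} K_alpha.
    d_sqrt = np.sqrt(K_alpha.sum(axis=1))
    S = K_alpha / np.outer(d_sqrt, d_sqrt)
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh sorts ascending; keep the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_eigenpairs]
    # Transform back to the (right) eigenvectors of P.
    psi = eigvecs[:, order] / d_sqrt[:, None]
    return eigvals[order], psi


# Usage: points sampled uniformly on the unit circle.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
eigvals, psi = diffusion_maps(X, epsilon=0.1)
print(eigvals)
```

The leading eigenvalue is 1 with a constant eigenvector, as expected for a Markov matrix; the following eigenvectors parametrize the circle.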
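Geometric Harmonics, one of the out-of-sample extensions listed above, can be sketched in three steps: eigendecompose a kernel matrix on the training points, project the function values onto the leading eigenvectors, and evaluate those eigenvectors at new points with the Nyström extension. This simplified illustration assumes a Gaussian kernel and a fixed number of eigenpairs; the helper names are hypothetical and it is not datafold's model class.

```python
import numpy as np
from scipy.spatial.distance import cdist


def gaussian_kernel(A, B, epsilon):
    return np.exp(-cdist(A, B, "sqeuclidean") / epsilon)


def geometric_harmonics_fit(X, f, epsilon, n_eigenpairs):
    """Eigendecompose the kernel matrix and project f onto the leading eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(gaussian_kernel(X, X, epsilon))
    order = np.argsort(eigvals)[::-1][:n_eigenpairs]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coeffs = eigvecs.T @ f  # eigh returns orthonormal eigenvectors
    return eigvals, eigvecs, coeffs


def geometric_harmonics_predict(X_train, X_new, epsilon, eigvals, eigvecs, coeffs):
    """Nyström extension: psi_j(x) = (1 / lambda_j) * sum_i k(x, x_i) psi_j(x_i)."""
    psi_new = gaussian_kernel(X_new, X_train, epsilon) @ eigvecs / eigvals
    return psi_new @ coeffs


# Usage: interpolate f(x) = sin(2 pi x) from a grid to the grid midpoints.
X = np.linspace(0.0, 1.0, 50)[:, None]
f = np.sin(2 * np.pi * X[:, 0])
model = geometric_harmonics_fit(X, f, epsilon=0.01, n_eigenpairs=20)
X_new = (X[:-1] + X[1:]) / 2
f_new = geometric_harmonics_predict(X, X_new, 0.01, *model)
max_error = np.max(np.abs(f_new - np.sin(2 * np.pi * X_new[:, 0])))
print(max_error)
```

Truncating to the leading eigenpairs avoids dividing by tiny eigenvalues in the Nyström step, which would amplify noise.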
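The EDMD item above describes lifting data to a feature state and identifying a linear model there. The NumPy sketch below shows the least-squares core of that idea. It assumes a hand-picked dictionary (both states plus the square of the first state) and a made-up toy map for which this dictionary is closed under the dynamics, so the fitted Koopman matrix is exact. It is not datafold's EDMD class, which builds on a scikit-learn Pipeline.

```python
import numpy as np


def feature_map(X):
    # Hypothetical dictionary: both states plus the square of the first state.
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2])


def edmd(X, Y, features):
    """Least-squares Koopman matrix K with features(Y).T ≈ K @ features(X).T."""
    Psi_X, Psi_Y = features(X), features(Y)
    K_T, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
    return K_T.T


# Toy nonlinear map whose Koopman operator is exactly linear on the dictionary above.
a, b, c = 0.9, 0.5, 0.3


def step(X):
    return np.column_stack([a * X[:, 0], b * X[:, 1] + c * X[:, 0] ** 2])


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))  # states at time k
Y = step(X)  # states at time k + 1
K = edmd(X, Y, feature_map)
print(np.round(K, 6))
```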
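EDMDCV is described above as using cross-validation splits that account for the temporal order in time series collections. One simple way to achieve this is to assign whole time series to folds so that no series is cut apart. The helper below is hypothetical, a sketch of that idea, and not one of datafold's splitting classes.

```python
import numpy as np


def kfold_by_series(ids, n_splits):
    """Yield (train_idx, test_idx) such that every time series lies in exactly one fold.

    Whole time series are assigned to folds, so no series is cut into pieces.
    Indices are returned in ascending order, which keeps the temporal order
    if the collection is sorted by time within each series.
    """
    ids = np.asarray(ids)
    unique_ids = np.unique(ids)
    for fold_ids in np.array_split(unique_ids, n_splits):
        test_mask = np.isin(ids, fold_ids)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Usage: time series ID of each sample in a collection with four series.
ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 3])
splits = list(kfold_by_series(ids, n_splits=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```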
See also this introduction page. For a mathematically thorough introduction, we refer to the scientific literature.
Cite
If you use datafold in your research, please cite this paper published in the Journal of Open Source Software (JOSS).
Lehmberg et al., (2020). datafold: data-driven models for point clouds and time series on manifolds. Journal of Open Source Software, 5(51), 2283, https://doi.org/10.21105/joss.02283
BibTeX:
@article{Lehmberg2020,
doi = {10.21105/joss.02283},
url = {https://doi.org/10.21105/joss.02283},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {51},
pages = {2283},
author = {Daniel Lehmberg and Felix Dietrich and Gerta K{\"o}ster and Hans-Joachim Bungartz},
title = {datafold: data-driven models for point clouds and time series on manifolds},
journal = {Journal of Open Source Software}}
How to get it?
Installation requires Python>=3.7 with pip and setuptools installed. Both packages usually ship with a standard Python installation. The package dependencies install automatically; the main dependencies are listed below in “Dependencies”.
There are two ways to install datafold:
1. From PyPI
This is the standard way for users. The package is hosted on the official Python package index (PyPI) and installs the core package (excluding tutorials and tests). The tutorial files can be downloaded separately here.
To install the package and its dependencies with pip, run
python -m pip install datafold
2. From source
This way is recommended if you want to access the latest (but potentially unstable) development, run tests or wish to contribute (see section “Contributing” for details). Download or git-clone the source code repository.
Download the repository
Install the package from the downloaded repository
python -m pip install .
Contributing
Any contribution (code/tutorials/documentation improvements), question or feedback is very welcome. Either use the issue tracker or email. Instructions to set up datafold for development can be found here.
Dependencies
The dependencies of the core package are managed in the file requirements.txt and are installed with datafold. The tests, tutorials, documentation and code analysis require additional dependencies, which are managed in requirements-dev.txt.
datafold integrates with common packages from the Python scientific computing stack:
- pandas
datafold uses pandas’ DataFrame as the base class for TSCDataFrame, which captures time series data and collections thereof. The data structure indexes time, time series ID and one or many spatial features. It includes specific time series collection functionality and is compatible with pandas’ rich functionality.
- scikit-learn
All datafold algorithms that are part of the “machine learning pipeline” align with the scikit-learn API. This is done by deriving the models from BaseEstimator and appropriate MixIns. datafold defines its own MixIns that align with the API in a duck-typing fashion to allow identifying dynamical systems from temporal data in TSCDataFrame.
- SciPy
SciPy is used for elementary numerical algorithms and data structures in conjunction with NumPy. This includes (sparse) linear least-squares regression, (sparse) eigenpair solvers and sparse matrices as an optional data structure for kernel matrices.
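The pandas entry above states that TSCDataFrame uses pandas’ DataFrame as its base class. pandas documents subclassing through the `_constructor` property, which makes operations return the subclass instead of a plain DataFrame. The sketch below applies that mechanism to a hypothetical `TimeSeriesCollection` class with an assumed (ID, time) row index; it is far simpler than TSCDataFrame and is not its actual implementation.

```python
import pandas as pd


class TimeSeriesCollection(pd.DataFrame):
    """Hypothetical minimal DataFrame subclass with an (ID, time) row index."""

    @property
    def _constructor(self):
        # pandas uses this to build results, so operations keep the subclass type.
        return TimeSeriesCollection

    @property
    def ids(self):
        # Unique time series IDs in the collection.
        return self.index.get_level_values("ID").unique()

    def time_series(self, ts_id):
        # One time series, indexed by time only.
        return self.loc[ts_id]


index = pd.MultiIndex.from_tuples(
    [(0, 0.0), (0, 0.1), (1, 0.0), (1, 0.1), (1, 0.2)], names=["ID", "time"]
)
tsc = TimeSeriesCollection({"x": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)
scaled = tsc * 2  # arithmetic returns a TimeSeriesCollection, not a plain DataFrame
print(type(scaled).__name__, list(tsc.ids))
```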
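Following the scikit-learn entry above, a model aligns with the scikit-learn API by deriving from `BaseEstimator` and a matching MixIn, storing hyperparameters unchanged in `__init__`, and implementing `fit` (and here `transform`). The hypothetical time-delay embedding transformer below works on a single time series given as a NumPy array and only illustrates these conventions; it is not datafold's delay-embedding class and does not accept TSCDataFrame input.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class DelayEmbedding(BaseEstimator, TransformerMixin):
    """Hypothetical time-delay embedding for a single time series (rows = time)."""

    def __init__(self, delays=2):
        # Only store hyperparameters; BaseEstimator derives get_params from __init__.
        self.delays = delays

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X)
        n = X.shape[0] - self.delays
        # Row t holds [x_{t+d}, x_{t+d-1}, ..., x_t]: current state first, then lagged states.
        blocks = [X[self.delays - k : self.delays - k + n] for k in range(self.delays + 1)]
        return np.hstack(blocks)


X = np.arange(5.0).reshape(-1, 1)
estimator = DelayEmbedding(delays=2)
embedded = estimator.fit_transform(X)  # fit_transform is provided by TransformerMixin
copy = clone(estimator)  # works because __init__ only stores parameters
print(embedded)
```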
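The SciPy entry above mentions sparse kernel matrices and sparse eigenpair solvers. A minimal sketch of that combination follows. It assumes a Gaussian kernel that is set to zero beyond a cut-off distance (here three kernel bandwidths, an arbitrary choice): only close pairs are stored in a SciPy sparse matrix, and `scipy.sparse.linalg.eigsh` computes the leading eigenpairs. The function name and parameter values are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh
from scipy.spatial import cKDTree


def sparse_gaussian_kernel(X, epsilon, cut_off):
    """Gaussian kernel matrix that stores only entries for pairs within cut_off."""
    n = X.shape[0]
    # All index pairs (i, j) with i < j and distance at most cut_off.
    pairs = cKDTree(X).query_pairs(r=cut_off, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    values = np.exp(-np.sum((X[i] - X[j]) ** 2, axis=1) / epsilon)
    # Mirror the pairs to obtain a symmetric matrix and add the diagonal k(x, x) = 1.
    rows = np.concatenate([i, j, np.arange(n)])
    cols = np.concatenate([j, i, np.arange(n)])
    data = np.concatenate([values, values, np.ones(n)])
    return csr_matrix((data, (rows, cols)), shape=(n, n))


rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
epsilon = 0.002
K = sparse_gaussian_kernel(X, epsilon, cut_off=3 * np.sqrt(epsilon))
# Iterative solver for the five largest (algebraic) eigenpairs of the symmetric sparse matrix;
# a larger Krylov subspace (ncv) helps when the leading eigenvalues are clustered.
eigvals, eigvecs = eigsh(K, k=5, which="LA", ncv=40)
print(K.nnz / K.shape[0] ** 2, np.sort(eigvals))
```

The printed fill ratio shows that only a small fraction of the 500 × 500 entries is stored.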
How does it compare to other software?
The selection only includes other Python packages.
- scikit-learn
provides algorithms and models along the entire machine learning pipeline, with a strong focus on static data (i.e. without temporal context). datafold integrates into scikit-learn’s API and all data-driven models are subclasses of BaseEstimator. An important contribution of datafold is the DiffusionMaps model as a popular framework for manifold learning, which is not contained in scikit-learn’s set of algorithms. Furthermore, datafold includes dynamical systems as a new model class that is operable with scikit-learn; the attributes align with supervised learning tasks. The key differences are that a model processes data of type TSCDataFrame and, instead of a one-to-one relation between the model’s input and output, the model can return arbitrarily many output samples (a time series) for a single input (an initial condition).
- PyDMD
provides many variants of the Dynamic Mode Decomposition (DMD). datafold provides a wrapper to make models of PyDMD accessible. However, a limitation of PyDMD is that it only processes single coherent time series, see PyDMD issue 86. The DMD models that are directly included in datafold utilize the functionality of the data structure TSCDataFrame and can therefore process time series collections, in an extreme case containing only snapshot pairs.
- PySINDy
specializes in sparse system identification of nonlinear dynamical systems to infer governing equations.
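The scikit-learn comparison above notes that a dynamical systems model can return a whole time series for a single initial condition. Assuming a model that is already a linear map `K` in state space (made up here), prediction amounts to repeatedly applying that map, as in this NumPy sketch; the function name is hypothetical and this is not datafold's predict method.

```python
import numpy as np


def predict_time_series(K, initial_condition, n_steps):
    """Iterate x_{k+1} = K x_k: one initial condition in, n_steps + 1 states out."""
    states = [np.asarray(initial_condition, dtype=float)]
    for _ in range(n_steps):
        states.append(K @ states[-1])
    return np.vstack(states)  # rows are time steps, columns are features


K = np.array([[0.5, 0.0], [0.0, 2.0]])  # made-up linear model
trajectory = predict_time_series(K, initial_condition=[1.0, 1.0], n_steps=3)
print(trajectory)
```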
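The PyDMD comparison above says that datafold's DMD models can process time series collections, in the extreme case consisting only of snapshot pairs. The sketch below shows why this is straightforward when the collection is stored in one indexed object: snapshot pairs are formed per time series, so that no pair mixes two series, and a linear operator is fitted by least squares as in exact DMD. It assumes a pandas DataFrame with an (ID, time) row index; the data are made up and the helpers are hypothetical, not datafold's DMDFull.

```python
import numpy as np
import pandas as pd


def snapshot_pairs(collection):
    """Build (x_k, x_{k+1}) pairs per time series so that no pair crosses two series."""
    X_parts, Y_parts = [], []
    for _, ts in collection.groupby(level="ID"):
        values = ts.sort_index(level="time").to_numpy()
        X_parts.append(values[:-1])
        Y_parts.append(values[1:])
    return np.vstack(X_parts), np.vstack(Y_parts)


def dmd_operator(X, Y):
    """Least-squares linear operator A with y ≈ A x for every snapshot pair (rows)."""
    A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_T.T


# Extreme case: 20 made-up time series that each consist of a single snapshot pair.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
rng = np.random.default_rng(1)
frames = []
for ts_id in range(20):
    x0 = rng.normal(size=2)
    index = pd.MultiIndex.from_arrays([[ts_id, ts_id], [0.0, 1.0]], names=["ID", "time"])
    frames.append(pd.DataFrame(np.vstack([x0, A_true @ x0]), index=index, columns=["x1", "x2"]))
collection = pd.concat(frames)

X, Y = snapshot_pairs(collection)
A = dmd_operator(X, Y)
dmd_eigenvalues = np.linalg.eigvals(A)  # DMD eigenvalues of the fitted operator
print(A, dmd_eigenvalues)
```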
Download files
Source Distribution
Built Distribution
File details
Details for the file datafold-1.1.5.tar.gz.
File metadata
- Download URL: datafold-1.1.5.tar.gz
- Size: 159.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.2 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7f4ac29ad9d07ec979fd2be5207e517fb5a9bbf76ffe2c81a78031b8e80b97ce
MD5 | ceb3896ef3ce33b76b29968b353bffad
BLAKE2b-256 | 6151e1cd645a04fa378990dca136278e0297033523ade4de7cbd0bc9cc36c9ec
File details
Details for the file datafold-1.1.5-py3-none-any.whl.
File metadata
- Download URL: datafold-1.1.5-py3-none-any.whl
- Size: 169.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.2 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4e896251444a6169d64400621ce877c758942665755f319bfe9cf1ac0744e310
MD5 | cf2edc4ccadba81a991f6a83c0f8e0c3
BLAKE2b-256 | 737e39c8a42f1278a4f531f658450e5b3b0a9560ecff31ae27f09b6e6f342aee