RoughPy

RoughPy is a package for working with streaming data as rough paths, and working with algebraic objects such as free tensors, shuffle tensors, and elements of the free Lie algebra.

This library is currently in an alpha stage, and as such many features are still incomplete or not fully implemented. Please bear this in mind when looking at the source code.

Installation

RoughPy can be installed from PyPI using pip on Windows, Linux, and macOS (Intel-based Macs only; Apple Silicon is not yet supported). Simply run

pip install roughpy

to get the latest version.

Alternatively, the wheel files can be downloaded from the Releases page.

Installing from source

RoughPy can be installed from source, although this is not the recommended way to install. The build system requires vcpkg in order to obtain the necessary dependencies (except for MKL on x86 platforms, which is installed via pip). You will need to make sure that vcpkg is available on your system before attempting to build RoughPy. The following commands should be sufficient to set up the environment for building RoughPy:

git clone https://github.com/Microsoft/vcpkg.git tools/vcpkg
tools/vcpkg/bootstrap-vcpkg.sh
export CMAKE_TOOLCHAIN_FILE=$(pwd)/tools/vcpkg/scripts/buildsystems/vcpkg.cmake

With this environment variable set, most of the dependencies will be installed automatically during the build process.

On macOS with Apple Silicon you will need to install libomp (for example, with Homebrew: brew install libomp). This is not necessary on Intel-based Macs, where the Intel iomp5 can be used instead. The build system will use brew --prefix libomp to try to locate this library. (The actual brew executable can be customised by setting the ROUGHPY_BREW_EXECUTABLE CMake variable or environment variable.)

You should now be able to pip install either using the PyPI source distribution (using the --no-binary roughpy flag), or directly from GitHub (recommended):

pip install git+https://github.com/datasig-ac-uk/RoughPy.git

It will take some time to build.
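Once the install finishes, it can be useful to confirm which version ended up in the environment. The snippet below is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of RoughPy.

```python
from importlib import metadata


def installed_version(package="roughpy"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if RoughPy is installed, otherwise None.
print(installed_version())
```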

Intervals in RoughPy

RoughPy is very careful in how it works with intervals. One design goal is that it should be able to handle jumps in the underlying signal that occur at particular times, including the beginning or end of the interval, and still guarantee that if you combine the signatures over adjacent intervals, you always get the signature over the entire interval. This implies that there has to be a decision about whether data at the exact beginning or exact end of the interval is included. The convention in RoughPy is to use clopen intervals: data at the beginning of the interval is seen, and data at the end of the interval is seen in the next interval.

A second design goal is that the code should be efficient, so the internal representation of a stream involves caching the signature over dyadic intervals of different resolutions. Recovering the signature over any interval using the cache has logarithmic complexity (using at most 2n tensor multiplications, where n is the internal resolution of the stream).

Resolution refers to the length of the finest granularity at which we will store information about the underlying data. Any event occurs within one of these finest-granularity intervals; multiple events that occur within the same interval resolve to a more complex log-signature, which correctly reflects the time sequence of the events within this grain of time. However, no query of the stream is allowed to see finer resolution than the internal resolution of the stream; it is only allowed to access the information over intervals that are a union of these finest-resolution granular intervals. For this reason, a query over any interval is replaced by a query over an interval whose endpoints have been shifted to be consistent with the granular resolution, obtained by rounding each endpoint to the contained end-point of the unique clopen granular interval containing it. In particular, if both the left-hand and right-hand ends of the interval are contained in the same clopen granular interval, we round the interval to the empty interval. Specifying a resolution of 32 or 64 equates to using integer arithmetic.
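The rounding rule and the dyadic decomposition can be illustrated in plain Python. This is only a sketch of the convention described above, not RoughPy's internal code: the function names round_query and dyadic_pieces are hypothetical, and it assumes that a resolution of n corresponds to granular intervals of length 2**-n aligned at 0.

```python
import math


def round_query(inf, sup, resolution):
    """Round [inf, sup) onto the dyadic grid with cells of length 2**-resolution.

    Each endpoint is moved to the left end of the clopen cell containing it.
    The result is a pair of integer cell indices (lo, hi) standing for the
    interval [lo * 2**-resolution, hi * 2**-resolution).
    """
    scale = 2.0 ** resolution
    return math.floor(inf * scale), math.floor(sup * scale)


def dyadic_pieces(lo, hi):
    """Split the cell range [lo, hi), with lo >= 0, into aligned dyadic blocks."""
    pieces = []
    while lo < hi:
        # Largest power of two dividing lo (every power of two divides 0) ...
        size = lo & -lo if lo else 1 << (hi - lo).bit_length()
        # ... shrunk until the block fits inside what is left of the range.
        while size > hi - lo:
            size >>= 1
        pieces.append((lo, lo + size))
        lo += size
    return pieces


print(round_query(0.3, 0.4, 1))  # (0, 0): both ends share a cell, so the query is empty
print(round_query(0.3, 1.3, 2))  # (1, 5): the interval [0.25, 1.25)
print(dyadic_pieces(1, 5))       # [(1, 2), (2, 4), (4, 5)]
```

Each block returned by dyadic_pieces is the kind of interval whose signature could be cached, and the number of blocks grows only logarithmically with the length of the query, which is the intuition behind the logarithmic cost quoted above.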

Usage

Following the NumPy (and related) convention, we import RoughPy under the alias rp as follows:

import roughpy as rp

The main object(s) that you will interact with are Stream objects or the family of factory classes such as LieIncrementStream. For example, we can create a LieIncrementStream using the following commands:

import numpy as np
stream = rp.LieIncrementStream.from_increments(np.array([[0, 1, 2], [3, 4, 5]], dtype=np.float64), depth=2)

This will create a stream whose (hidden) underlying data are the two increments [0., 1., 2.] and [3., 4., 5.], and whose algebra elements are truncated at maximum depth 2. To compute the log signature over an interval we use the log_signature method on the stream, for example

interval = rp.RealInterval(0., 1.)
lsig = stream.log_signature(interval)

Printing this new object lsig should give the following result

{ 1(2) 2(3) }

which is the first increment from the underlying data. (By default, the increments are assumed to occur at parameter values equal to their row index in the provided data.)

Similarly, the signature can be computed using the signature method on the stream object:

sig = stream.signature(interval)

Notice that the lsig and sig objects have types Lie and FreeTensor, respectively. They behave exactly as you would expect elements of these algebras to behave. Moreover, they will (usually) be convertible directly to a NumPy array (or TensorFlow, PyTorch, JAX tensor type in the future) of the underlying data, so you can interact with them as if they were simple arrays.
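To make the relationship between lsig and sig concrete: when the log signature over an interval is a single increment x with no higher-order terms, the signature is the tensor exponential of x, which at depth 2 is 1 + x + (x ⊗ x)/2. The sketch below computes this by hand in plain Python; it does not call RoughPy, and the function name signature_depth2 is hypothetical.

```python
def signature_depth2(increment):
    """Depth-2 truncated tensor exponential of a single level-1 increment.

    Returns (level0, level1, level2), where level2[i][j] is the coefficient
    of the tensor word (i+1, j+1).
    """
    d = len(increment)
    level0 = 1.0
    level1 = [float(v) for v in increment]
    level2 = [[0.5 * level1[i] * level1[j] for j in range(d)] for i in range(d)]
    return level0, level1, level2


level0, level1, level2 = signature_depth2([0.0, 1.0, 2.0])
print(level1)  # [0.0, 1.0, 2.0]
print(level2)  # [[0.0, 0.0, 0.0], [0.0, 0.5, 1.0], [0.0, 1.0, 2.0]]
```

For the first increment [0., 1., 2.] used above, these are the coefficients one would expect to find in sig, up to the ordering and printing conventions of FreeTensor.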

We can also construct streams by providing the raw data of Lie increments with higher order terms, by specifying the width using the same constructor as above. For example, if we take width 2 and depth 2 then a Lie element will have keys (1, 2, [1,2]). So if we provide the following data, we construct a stream whose underlying Lie increments have width 2 and depth 2:

stream = rp.LieIncrementStream.from_increments(np.array([[1, 2, 0.5], [0.2, -0.1, 0.2]], dtype=np.float64), width=2, depth=2)
print(stream.log_signature(rp.RealInterval(0, 0.5), 2))  # returns the first increment
# { 1(1) 2(2) 0.5([1,2]) }
print(stream.log_signature(rp.RealInterval(0, 1.5), 2))  # Campbell-Baker-Hausdorff product of both increments
# { 1.2(1) 1.9(2) 0.45([1,2]) }
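The second result can be checked by hand. At depth 2 the Campbell-Baker-Hausdorff product reduces to x + y + [x, y]/2, and for width 2 the bracket of the level-1 parts has the single coefficient x1*y2 - x2*y1 on the key [1,2]. The following plain-Python sketch reproduces the numbers; the function name cbh_depth2 is hypothetical and the code does not use RoughPy.

```python
def cbh_depth2(x, y):
    """Campbell-Baker-Hausdorff product of two width-2, depth-2 Lie elements.

    Each element is a triple of coefficients on the keys (1, 2, [1,2]).
    """
    x1, x2, x12 = x
    y1, y2, y12 = y
    # Coefficient of [1,2] in the bracket of the two level-1 parts.
    bracket = x1 * y2 - x2 * y1
    return (x1 + y1, x2 + y2, x12 + y12 + 0.5 * bracket)


result = cbh_depth2((1.0, 2.0, 0.5), (0.2, -0.1, 0.2))
# Approximately (1.2, 1.9, 0.45), up to floating-point rounding.
print(result)
```

Here the bracket coefficient is 1*(-0.1) - 2*0.2 = -0.5, so the [1,2] term is 0.5 + 0.2 - 0.25 = 0.45, matching the printed log signature.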

Support

If you have a specific problem, the best way to record this is to open an issue on GitHub. We welcome any feedback or bug reports.

Contributing

In the future, we will welcome pull requests to implement new features, fix bugs, add documentation or examples, or add tests to the project. However, at present, we do not have robust CI pipelines set up to rigorously test incoming changes, and therefore will not be accepting pull requests made from outside the current team.

Contributors

The full list of contributors is given in THANKS alongside this readme. The people mentioned in this document constitute The RoughPy Developers.

License

RoughPy is licensed under a BSD-3-Clause license. This was chosen specifically to match the license of NumPy.

Changelog

Version 0.1.1:

  • Fixed type promotions in scalar arithmetic - left hand types are now promoted when appropriate.
  • Added "tensor_functions" module for implementing additional functions on tensors. Currently only Log is implemented.
  • Fixed a few bugs in the build system

Version 0.1.0:

  • Added framework for integrating device support and redesigned scalars module to accommodate the changes.
  • Made changes to type deduction in constructors to avoid exceptions when providing lists of python ints/floats.
  • Changed the implementation of array for algebra types. A copy is always made, and the size of the array is always equal to the dimension of the chosen width/depth composition.
  • Changed the behaviour when no resolution is given. RoughPy now uses a heuristic to compute a suitable resolution for computing signatures if none is given.

Version 0.0.8:

  • Disabled linking to BLAS/LAPACK to reduce compile times whilst under development.
  • Greatly expanded the serialization support internally.
  • Many classes can now be pickled/unpickled for transport across process boundaries. (Note that this is not yet a safe means of storing and transmitting stream data.)
  • Overlay triplets are now forced for all builds. This should improve reproducibility.
  • Restructured the algebra module to improve build times.
  • Added CMake detection of headers to help builds with non-compliant compilers.
  • Fixed an error in the construction of PyLieKey in PyLieKeyIterator. #40

Version 0.0.7:

  • Overhaul the (internal) ScalarType API:
      ◦ the overloads of convert_copy have been removed in favour of the variant that takes two ScalarPointers;
      ◦ the single versions of add(_inplace) and friends have been replaced with more flexible add_into; batch compute methods and friends;
      ◦ replaced single value uminus with uminus into with similar signature to add_into and friends;
      ◦ removed single value copy method.
  • Added constructor for ScalarPointer from type_id and pointer.
  • Implementations of ScalarType methods that are essentially the same for all types are implemented in a common implementation layer.
  • Added threading support in platform
  • add_into and friends have threading support if available and enabled.
  • Added default implementation of type_id_of so that non-specialized types look for a ScalarType object.
  • Greatly simplified the design of ScalarMatrix - it now only supports full, dense matrices.
  • Redesigned the interface between the Scalar linear algebra and MKL/BLAS+LAPACK.
  • Added function to query ring characteristics of a ScalarType - currently unused.
  • Added KeyScalarStream for constructing streams from array-like data more easily.
  • Implemented the from_type_details function for scalar types. This fixes a bug when constructing objects using the dlpack protocol.
  • Overhaul constructor for LieIncrementStreams from increment data to reduce number of copies (if possible) and to handle non-contiguous or oddly shaped data correctly.
  • Change implementation of LieIncrementStream to allow adding the parameter channel during construction.
  • Change implementation of TickStream to allow adding parameter channel during construction.

Version 0.0.6:

  • Externally sourced streams (sound-file streams) now support setting channel types/schema in factory function.
  • Added option for customising file name macro in exception throws.
  • Made some improvements to the internal interface of ScalarType to allow more efficient implementations of vectorised operations.
  • Added fast "is_zero" method to Python algebra objects.

Version 0.0.5:

  • Added free functions for performing free-tensor, shuffle, half-shuffle multiplication between pairs of tensors (of either kind).
  • Added free function for applying the adjoint of left free tensor multiplication to arbitrary tensors.
  • Improved exception usage; messages now include filename, lineno, and function name to help locate C++ exceptions passed through to Python.
  • Basis objects in Python are now iterable.
  • Added split_n and to_index methods to Tensor key.

Version 0.0.4:

  • Overhauled the RPY_CHECK macro so it now gives much better contextual information.
  • Readme updated to reflect PyPI installation with wheels.
  • Antipode is now implemented in libalgebra_lite, and exposed to Python for Free tensors.
  • Streams now carry a support interval, outside of which the signature is trivial.
  • Build requirements fixed for non x86 platforms.
  • Expanded coverage of pytype stub file.

Version 0.0.3:

  • Added datetime interval support.
  • Integrated schema reparametrisation into stream signature methods.
  • Fixed bug in names of scalar types, names now display correctly.
  • Fixed bfloat16 given wrong scalar type code.
  • Added half precision float and bfloat16 to py module
  • Added real partition class and Python interface.
  • Implemented simplify method on Streams.
  • Added polynomial coefficients.
  • Started an examples folder.

Version 0.0.2-alpha:

  • Added datetime and timedelta object support to tick data parsing.
  • Expanded build system to include more versions of Python.
  • Stabilised compilation on all three platforms.
  • Fixed numerous bugs in the build system.

Version 0.0.1-alpha:

  • First alpha release.
