Skip to main content

Manifold Dimensional Expansion

Project description

Manifold Dimensional Expansion (MDE)


Manifold dimensional expansion is a causal discovery and dimensionality reduction technique designed to identify low dimensional maximally predictive observables of a dynamical system with multivariate observations.

The algorithm is based on a greedy implementation of the generalized Takens embedding theorem. However, instead of using time delays for dimensionality expansion, observables that improve the forecast skill of a target variable are added until no further improvement can be achieved. The default predictor is the simplex function in pyEDM providing a fully nonlinear predictor from Empirical Dynamic Modeling (EDM).

Specifically, given a target observable, scan all other observables to find the best 1-D predictor of the target, ensuring the predictor has causal inference with the target. With this 1-D vector scan all remaining observables to find the 2-D embedding with best predictability and causal inference. This greedy algorithm is iterated up to the point that no further prediction skill improvement can be produced.

Causal inference is performed by default with Convergent Cross Mapping (CCM) ensuring the added observable is part of the dynamical system of the interrogated time series. The embedding dimension needed for CCM is automatically determined if parameter E=0, the default. Otherwise the specifed value of E is used. To account for unobserved variables time delay vectors of the top observables can be added.

Output is a DataFrame with a ranked list of observation vectors and predictive skill satisfying MDE criteria for the target variable.


Installation

python -m pip install dimx


Usage

MDE is an object-oriented class implementation with command line interface (CLI) support. CLI parameters are configured through command line arguments, MDE class arguments through the MDE class constuctor.

MDE can be imported as a module and executed with dimx.Run() or from the command line with theManifoldDimExpand.py executable as shown below.

CLI example:

./ManifoldDimExpand.py -d ../data/Fly80XY_norm_1061.csv 
-rc index FWD Left_Right -D 10 -t FWD -l 1 300 -p 301 600
-C 10 -ccs 0.01 -emin 0.5 -P -title "MDE FWD" -v

MDE class constructor API example:

from dimx import MDE
from pandas import read_csv

df = read_csv( './data/Fly80XY_norm_1061.csv' )

mde = MDE( df, target = 'FWD', 
           removeColumns = ['index','FWD','Left_Right'], 
           D = 10, lib = [1,300], pred = [301,600], ccmSeed = 12345,
           cores = 10, plot = True, title = "MDE FWD" )

mde.Run()

mde.MDEOut
  variables       rho
0      TS33  0.652844
1       TS4  0.792290
2      TS17  0.823024
3      TS71  0.840094
4      TS44  0.840958
5      TS37  0.845765
6       TS9  0.846601
7      TS30  0.859614
8      TS47  0.860541
9      TS67  0.860230

Real World Example

This example finds optimal observables and estimates the dimension of neural data from Drosophila melanogaster.

A fly expressing the calcium indicator GCaMP6f as a measure of neuronal activity was recorded walking on a Styrofoam ball. Neuronal activity across the brain was spatially segmented by independent component analysis (ICA) yielding 80 time series of neural activity from the component brain areas. Two behavioral variables were simultaneously recorded: forward speed (FWD) and left/right turning speed (Left_Right) Aimon, S. et al. 2019. A Jupyter Lab notebook is available at MDE_Fly_Example.

Import MDE and Evaluate classes

from dimx import MDE, Evaluate

Load data

from pandas import read_csv
df = read_csv( './data/Fly80XY_norm_1061.csv' )

Instantiate and Run MDE class objects for FWD & Left_Right targets

We use the first 300 time series rows to create the EDM library, and perform out-of-sample prediction on time series rows 301-600. These indices are Not zero offset.

ccmSlope is the minimum slope of a linear fit to the CCM rho(L) curve to validate a causal driver. L is the vector of CCM library sizes at which CCM is evaluated. Default values for L are percentiles [10,15,85,90] of the number of observations (rows).

Fly_FWD = MDE( df,                     # Pandas DataFrame of observables
               target = 'FWD',         # target behavior variable
               removeColumns = ['index','FWD','Left_Right'], # variables to ignore
               D    = 12,              # Max number of dimensions
               lib  = [1,300],         # EDM library start,stop indices
               pred = [301,600],       # EDM prediction start,stop indices
               ccmSlope = 0.01,        # CCM convergence criteria
               embedDimRhoMin = 0.65,  # Minimum rho for CCM embedding dimension
               crossMapRhoMin = 0.5,   # Minumum rho for cross map of target : variables
               cores = 10,             # Number of cores in CrossMapColumns()
               chunksize = 30,
               plot = False )
Fly_FWD.Run()
Fly_LR = MDE( df,
              target = 'Left_Right',
              removeColumns = ['index','FWD','Left_Right'], 
              D    = 12,
              lib  = [1,600],
              pred = [801,1000], 
              ccmSlope = 0.01,
              embedDimRhoMin = 0.2,
              crossMapRhoMin = 0.05,
              cores = 10,
              chunksize = 30,
              plot = False )
Fly_LR.Run()

The FWD behavior suggests a dimension of D=5 observables is an appropriate low-dimensional set of obervables to predict FWD movement. MDE Fly


Evaluate MDE components & compare to PCA & Diffusion Map

Here we compare out-of-sample prediction of FWD behavior with the 5 MDE identified observables as well as 5 component PCA and Diffusion Map.

Fly_FWD_Eval = Evaluate( df, 
                         columns_range = [1,81], # 0-offset range of columns for PCA, DMap
                         mde_columns = ['TS33', 'TS4', 'TS8', 'TS9', 'TS32'],
                         predictVar = 'FWD',
                         library    = [1, 300],   # index start,stop of observations for library
                         prediction = [301, 600], # index start,stop of predictions 
                         components = 5,          # Number of PCA & DMap components
                         dmap_k     = 15,         # diffusion_map k nearest neighbors
                         figsize    = (8,6) )
Fly_FWD_Eval.Run()
Fly_FWD_Eval.Plot()

The MDE prediction has the lowest CAE (cumulative absolute error) to the out-of-sample observations. The diffusion map compoents are latent (not observable) and do not correspond in an obvious way to observed neural dynamics. The PCA prediction lumps the majority of the variance into a single component based on a linear decomposition. Both PCA and diffusion map predict activity during times when no FWD movement is observed, while MDE does not. Crucially, MDE predictions are not latent, but actual observables of the system.

MDE Evaluate


Parameters

MDE parameters are defined in the MDE constructor.

Parameter Default Description
dataFrame None Pandas DataFrame : column observation vectors, row observations
dataFile None Data file name to load
dataName None Data name in .npz archive
removeTime False Remove first column from dataFrame
noTime False First column of dataFrame is not time vector
columnNames [] dataFrame columns to process
initDataColumns [] If reading .npz omit these leading columns
removeColumns [] Columns to ignore
D 3 Maximum number of MDE dimensions
target None Target variable
lib [] EDM library indices. Default to all rows
pred [] EDM prediction indices. Default to all rows
Tp 1 Prediction time interval
tau -1 CCM embedding time delay
exclusionRadius 0 CCM library temporal exlcusion radius
sample 20 CCM random library samples to average
pLibSizes [10, 15, 85, 90] Percentiles of CCM library sizes
noCCM False Disable CCM
ccmSlope 0.01 Slope of CCM(LibSizes) convergence
ccmSeed None CCM random seed
E 0 CCM embedding dimension. If 0 compute automatically
crossMapRhoMin 0.5 Minimum rho for cross map acceptance
embedDimRhoMin 0.5 Minimum rho for CCM embedding dimension
maxE 15 Maximum embedding dimension for CCM
firstEMax False CCM embedding dimension is first local peak in rho(E)
timeDelay 0 Add N=timeDelay time delays
cores 5 number of multiprocessing CPU in CrossMapColumns()
mpMethod None multiprocessing context method in CrossMapColumns()
chunksize 1 multiprocessing chunksize in CrossMapColumns()
outDir None Output file directory
outFile None MDE object pickle file
outCSV None CSV of MDE output
logFile None Log file
consoleOut True Echo output to console
verbose False Verbose mode
debug False Debug mode
plot False Plot MDE result
title None Plot title
args None ArgumentParser object from CLI_Parser

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dimx-1.0.5.tar.gz (627.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dimx-1.0.5-py3-none-any.whl (629.2 kB view details)

Uploaded Python 3

File details

Details for the file dimx-1.0.5.tar.gz.

File metadata

  • Download URL: dimx-1.0.5.tar.gz
  • Upload date:
  • Size: 627.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.6

File hashes

Hashes for dimx-1.0.5.tar.gz
Algorithm Hash digest
SHA256 133270a059f51f839259c26cd69aec4c5f60c750090d6e17228e59b2e7b632bb
MD5 33e47b43f49ffa049b6aa1fb62caada1
BLAKE2b-256 0b07e6079b2e24ac28775726255a53746bbcf9eaf44274c946375efd9bc72e1c

See more details on using hashes here.

File details

Details for the file dimx-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: dimx-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 629.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.6

File hashes

Hashes for dimx-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 5f44bf82a4fbba8696056a2b63cd8b4b8764414a5d91a62f3d5d99a071a0c763
MD5 40e9f32e931357a15e15dc71ad12de5b
BLAKE2b-256 4e0558b14e2e83867ab925b83fe554cb3aa9a3edb891197f0342416fc43b6f6f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page