Annotated multivariate observation of timestamped data

🗂 VData

VData is used for storing and manipulating multivariate observations of timestamped data.

The VData structure

VData extends the AnnData object by adding a time dimension.

Example: a VData object can efficiently store information about cells (observations) whose gene expression (variables) is measured over multiple time points. It is built around layers (.layers). Each layer is a 3D matrix of obs x var x time points. Around those layers, DataFrames describe variables and time points, while custom TemporalDataFrames describe observations.

The uns dictionary is used to store additional unstructured data.

More generally, VData objects can be used to store any timestamped dataset where annotation of observations and variables is required.
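
To make the layout described above concrete, here is a minimal sketch using only the Python standard library. It is not the vdata API: the class ToyTimedDataset, its methods and the choice to put time as the outermost axis are hypothetical and only illustrate how obs, var, time points, layers and uns relate to each other.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTimedDataset:
    """Hypothetical stand-in for the layout described above; NOT the vdata API."""

    obs: list         # observation (e.g. cell) names
    var: list         # variable (e.g. gene) names
    timepoints: list  # time point labels
    # layers[name][t][i][j] = value at time point t, observation i, variable j
    layers: dict = field(default_factory=dict)
    # free-form, unstructured extras
    uns: dict = field(default_factory=dict)

    def add_layer(self, name, values):
        """Store a layer after checking its time points x obs x var shape."""
        if len(values) != len(self.timepoints):
            raise ValueError("expected one block per time point")
        for block in values:
            if len(block) != len(self.obs):
                raise ValueError("expected one row per observation")
            if any(len(row) != len(self.var) for row in block):
                raise ValueError("expected one value per variable")
        self.layers[name] = values

    def value(self, layer, obs, var, timepoint):
        """Look up a single measurement by its three labels."""
        t = self.timepoints.index(timepoint)
        i = self.obs.index(obs)
        j = self.var.index(var)
        return self.layers[layer][t][i][j]


dataset = ToyTimedDataset(
    obs=["cell_1", "cell_2"],
    var=["gene_A", "gene_B", "gene_C"],
    timepoints=["0h", "5h"],
)
dataset.add_layer(
    "counts",
    [
        [[1, 2, 3], [4, 5, 6]],     # measurements at 0h
        [[7, 8, 9], [10, 11, 12]],  # measurements at 5h
    ],
)
dataset.uns["experiment"] = "toy example"
print(dataset.value("counts", "cell_2", "gene_C", "5h"))
```

In the real library the layers are TemporalDataFrames rather than nested lists, so this toy class only mirrors the shape of the data, not its behaviour.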

🌟 Features

  • complete Python reimplementation based on h5py
  • very fast loading of any dataset
  • memory-efficient data manipulation (<1GB) even for datasets of hundreds of GB
  • explicit handling of timestamped data, especially suited for simulated single-cell datasets
  • complete compatibility with the scverse ecosystem

👁 Overview

General

The vdata library exposes the VData object along with the TemporalDataFrame object, which extends the common pandas.DataFrame with a third axis for time.

VData objects can be created from in-RAM objects such as AnnData, TemporalDataFrame, pandas.DataFrame or mappings of <layer name>:DataFrame.

It is also possible to load data from a VData or an AnnData saved as an hdf5 file or in csv format.

🔵 Note An important distinction from AnnData is that when a VData is backed on (read from) an hdf5 file, the object is loaded only on demand and in small chunks of data. As a result, VData objects consume small amounts of RAM and are very fast to read.
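
The idea of on-demand, chunked loading can be illustrated with a small standard-library sketch. This is only an analogy: VData relies on hdf5 files through h5py, whereas the code below uses a flat binary file, and the names write_rows, read_rows and ROW_FORMAT are hypothetical. The point it shows is that a slice of rows can be read by seeking to it, without loading the rest of the file.

```python
import os
import struct
import tempfile

# Assumption for this sketch: every row holds 4 little-endian float64 values.
ROW_FORMAT = "<4d"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # bytes per row


def write_rows(path, rows):
    """Write all rows to disk as fixed-size binary records."""
    with open(path, "wb") as handle:
        for row in rows:
            handle.write(struct.pack(ROW_FORMAT, *row))


def read_rows(path, start, stop):
    """Read rows [start, stop) only, by seeking straight to them."""
    with open(path, "rb") as handle:
        handle.seek(start * ROW_SIZE)
        raw = handle.read((stop - start) * ROW_SIZE)
    return [
        list(struct.unpack_from(ROW_FORMAT, raw, offset))
        for offset in range(0, len(raw), ROW_SIZE)
    ]


# 1000 rows on disk, but only 2 of them are ever brought into memory.
all_rows = [[float(i * 4 + j) for j in range(4)] for i in range(1000)]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "layer.bin")
    write_rows(path, all_rows)
    chunk = read_rows(path, 10, 12)

print(chunk)
```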

Layers and data annotation

The bulk of the data is stored in TemporalDataFrames, themselves stacked up in the layers dictionary. Data is thus represented as observations x variables x time points dataframes. Observation indices can either be unique at each time point or strictly the same (e.g. to store simulated data where a single cell can be recorded multiple times).

[Figure: two TemporalDataFrames, one with unique observations and one with identical observations at all time points]

Three additional dataframes are used for annotating the observations (obs), variables (var) and timepoints (timepoints).
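
The two indexing modes for observations described above can be sketched with plain Python. The function index_mode and its return labels are hypothetical and not part of vdata. It assumes that "strictly the same" means the same observation names in the same order at every time point, and that "unique" means no name appears at more than one time point.

```python
def index_mode(obs_by_timepoint):
    """Classify a mapping of time point -> list of observation names."""
    id_lists = list(obs_by_timepoint.values())
    # Same names, same order, at every time point (e.g. simulated cells
    # that are recorded repeatedly).
    if all(ids == id_lists[0] for ids in id_lists):
        return "identical"
    # No name is reused anywhere (e.g. different cells measured at each
    # time point).
    all_ids = [name for ids in id_lists for name in ids]
    if len(all_ids) == len(set(all_ids)):
        return "unique"
    # Partial overlap fits neither of the two modes.
    return "invalid"


simulated = {"0h": ["cell_1", "cell_2"], "5h": ["cell_1", "cell_2"]}
measured = {"0h": ["cell_1", "cell_2"], "5h": ["cell_3", "cell_4"]}
overlapping = {"0h": ["cell_1", "cell_2"], "5h": ["cell_2", "cell_3"]}

print(index_mode(simulated))
print(index_mode(measured))
print(index_mode(overlapping))
```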

Multi-dimension annotation

There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that require more than one column). These are the obsm and varm mappings, which contain TemporalDataFrames and pandas DataFrames, respectively.

🟢 Example You can store PCA or UMAP coordinates in obsm.

Pairwise annotation

The last two mappings (obsp and varp) contain pairwise annotations: data in square matrices of obs x obs or var x var.

🟢 Example You can store distance values between observations in obsp.
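
As a sketch of what such a pairwise annotation looks like, the code below computes an obs x obs matrix of Euclidean distances from 2D coordinates (the kind of multi-dimensional annotation that could live in obsm). It uses only the standard library; the function pairwise_distances and the nested-dict representation are hypothetical and do not reflect how vdata stores obsp.

```python
import math


def pairwise_distances(coordinates):
    """Return a square obs x obs mapping of Euclidean distances."""
    names = list(coordinates)
    return {
        first: {
            second: math.dist(coordinates[first], coordinates[second])
            for second in names
        }
        for first in names
    }


# Hypothetical 2D embedding, one point per observation.
embedding = {
    "cell_1": (0.0, 0.0),
    "cell_2": (3.0, 4.0),
    "cell_3": (6.0, 8.0),
}
distances = pairwise_distances(embedding)
print(distances["cell_1"]["cell_2"])
```

The result is square and symmetric, with zeros on the diagonal, which is the shape expected of an obs x obs annotation.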

📀 Installation

VData requires Python 3.9+

pip installation (stable)

pip install vdata

using git (latest)

git clone git@github.com:Vidium/vdata.git
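
After installing, the Python version requirement and the installed package version can be checked with the standard library alone. This is a small sketch: the helper installed_version is hypothetical, and it returns None instead of failing when the package is absent.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# VData requires Python 3.9 or newer.
python_ok = sys.version_info >= (3, 9)
print("Python 3.9+:", python_ok)
print("vdata version:", installed_version("vdata"))
```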

📑 Documentation

See the complete documentation at [INCOMING].

Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297

🖋 Citation

You can cite the VData pre-print as:

VData: Temporally annotated data manipulation and storage

Matteo Bouvier, Arnaud Bonnaffoux

bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297
