
Annotated multivariate observation of timestamped data


🗂 VData

VData is used for storing and manipulating multivariate observations of timestamped data.

The VData structure

VData extends the AnnData object by adding a time dimension.

Example: A VData object can efficiently store information about cells (observations) whose gene expression (variables) is measured over multiple time points. It is built around layers (.layers). Each layer is a 3D matrix of obs x var x time points. Around those layers, DataFrames describe variables and time points, while custom TemporalDataFrames describe observations.

The uns dictionary is used to store additional unstructured data.

More generally, VData objects can be used to store any timestamped datasets where annotation of observations and variables is required.
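To make the layout concrete, here is a minimal conceptual sketch that uses only NumPy and pandas. It does not use the vdata API: all names and values are hypothetical toy data, and the observation annotations are a plain pandas.DataFrame here, whereas VData uses a TemporalDataFrame for them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dimensions: 4 cells, 3 genes, 2 time points.
obs_names = ["cell_0", "cell_1", "cell_2", "cell_3"]
var_names = ["gene_a", "gene_b", "gene_c"]
timepoints = ["0h", "5h"]

# One layer: a 3D block of shape (obs, var, time points).
rng = np.random.default_rng(0)
layer = rng.poisson(lam=5.0, size=(len(obs_names), len(var_names), len(timepoints)))

# Annotation tables: one row per entry along the axis they describe.
# (Simplification: VData describes observations with a TemporalDataFrame.)
obs = pd.DataFrame({"cell_type": ["A", "A", "B", "B"]}, index=obs_names)
var = pd.DataFrame({"is_marker": [True, False, True]}, index=var_names)
time_annot = pd.DataFrame({"hours": [0.0, 5.0]}, index=timepoints)

# Free-form, unstructured metadata.
uns = {"experiment": "toy example"}

# Expression of every gene in cell_1 at the 5h time point.
cell_1_at_5h = pd.Series(
    layer[obs_names.index("cell_1"), :, timepoints.index("5h")],
    index=var_names,
)
```

Each annotation table has exactly as many rows as the layer has entries along the matching axis, which is what keeps data and annotations aligned.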

🌟 Features

  • complete Python reimplementation based on h5py
  • very fast loading of any dataset
  • memory-efficient data manipulation (<1GB) even for datasets of hundreds of GB
  • explicit handling of timestamped data, especially suited for simulated single-cell datasets
  • complete compatibility with the scverse ecosystem

👁 Overview

General

The vdata library exposes the VData object along with the TemporalDataFrame object, which extends the common pandas.DataFrame with a third (time) axis.

VData objects can be created from in-RAM objects such as AnnData, TemporalDataFrame, pandas.DataFrame or mappings of <layer name>:DataFrame.

It is also possible to load data from a VData or an AnnData saved as an hdf5 file or in csv format.

🔵 Note An important distinction with AnnData is that when a VData is backed on (read from) an hdf5 file, its data is loaded only on demand and in small chunks. As a result, VData objects consume little RAM and are very fast to read.
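The general mechanism behind on-demand loading can be illustrated directly with h5py, on which VData is based. This is a sketch of the underlying idea only, not of how VData reads its own files: the file name, dataset name "X" and chunk shape are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.h5")  # hypothetical file name

    # Write a toy 1000 x 50 dataset, stored on disk in chunks of 100 rows.
    with h5py.File(path, "w") as f:
        data = np.arange(1000 * 50, dtype=np.float32).reshape(1000, 50)
        f.create_dataset("X", data=data, chunks=(100, 50))

    # Re-open read-only: getting the dataset handle does not load its contents.
    with h5py.File(path, "r") as f:
        dset = f["X"]            # lazy handle to the on-disk dataset
        shape = dset.shape       # metadata only
        first_rows = dset[:10]   # slicing reads just the requested rows into a NumPy array
```

Only the slice that is explicitly requested ends up in memory, so the RAM footprint depends on what is accessed rather than on the size of the file.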

Layers and data annotation

The bulk of the data is stored in TemporalDataFrames, themselves stacked up in the layers dictionary. Data is thus represented as observations x variables x time points dataframes. Observation indices can either be unique at each time point or strictly the same (e.g. to store simulated data where a single cell can be recorded multiple times).

[Figure: two TemporalDataFrames, one with unique observations at each time point and one with identical observations at all time points]
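The two indexing schemes can be mimicked with a pandas MultiIndex of (timepoint, obs). This is only a stand-in for the idea, not the TemporalDataFrame API; the labels, values and the obs_per_timepoint helper are hypothetical.

```python
import pandas as pd

# Case 1: observations are unique at each time point
# (each cell is measured once, at a single time point).
unique_obs = pd.DataFrame(
    {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [0.5, 0.1, 0.9, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [("0h", "cell_0"), ("0h", "cell_1"), ("5h", "cell_2"), ("5h", "cell_3")],
        names=["timepoint", "obs"],
    ),
)

# Case 2: the same observations are recorded at every time point
# (e.g. simulated cells followed through time).
repeated_obs = pd.DataFrame(
    {"gene_a": [1.0, 2.0, 1.5, 2.5], "gene_b": [0.5, 0.1, 0.6, 0.2]},
    index=pd.MultiIndex.from_product(
        [["0h", "5h"], ["cell_0", "cell_1"]],
        names=["timepoint", "obs"],
    ),
)


def obs_per_timepoint(df: pd.DataFrame) -> dict:
    """Map each time point to the set of observation labels recorded at it."""
    return {
        tp: set(group.index.get_level_values("obs"))
        for tp, group in df.groupby(level="timepoint")
    }


unique_sets = obs_per_timepoint(unique_obs)
repeated_sets = obs_per_timepoint(repeated_obs)

# Unique scheme: no label is shared between time points.
unique_is_disjoint = unique_sets["0h"].isdisjoint(unique_sets["5h"])
# Repeated scheme: every time point has exactly the same labels.
repeated_is_identical = repeated_sets["0h"] == repeated_sets["5h"]
```

With identical observations, selecting one cell across time points yields its trajectory; with unique observations, each label appears at a single time point only.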

Three additional dataframes are used for annotating the observations (obs), variables (var) and timepoints (timepoints).

Multi-dimension annotation

There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that require more than one column). These are the obsm and varm mappings, which contain TemporalDataFrames and pandas DataFrames respectively.

🟢 Example You can store PCA or UMAP coordinates in obsm.
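As an illustration of what such an annotation looks like, the sketch below computes 2D PCA coordinates with NumPy and stores them in a plain dict standing in for obsm. It is restricted to a single time point and uses a pandas.DataFrame, whereas VData's obsm holds TemporalDataFrames. The data is random and the "X_pca" key follows the scanpy naming convention; neither comes from the vdata documentation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obs_names = [f"cell_{i}" for i in range(6)]
X = rng.normal(size=(6, 4))  # toy observations x variables matrix, one time point

# PCA via SVD of the column-centered matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project the observations onto the first two principal axes.
coords = X_centered @ Vt[:2].T

# Plain dict standing in for obsm: one multi-column table, one row per observation.
obsm = {"X_pca": pd.DataFrame(coords, index=obs_names, columns=["PC1", "PC2"])}
```

The annotation needs two columns per observation, which is why it belongs in obsm rather than in a single obs column.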

Pairwise annotation

The last two mappings (obsp and varp) contain pairwise annotations: data in square matrices of obs x obs or var x var.

🟢 Example You can store distance values between observations in obsp.
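A pairwise annotation of this kind could be built as follows. This is a conceptual sketch with NumPy and pandas, not the vdata API: the coordinates are toy values and the "distances" key is a hypothetical name.

```python
import numpy as np
import pandas as pd

obs_names = ["cell_0", "cell_1", "cell_2"]
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # toy coordinates, one row per observation

# Pairwise Euclidean distances via broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d).
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Plain dict standing in for obsp: a square obs x obs table.
obsp = {"distances": pd.DataFrame(dist, index=obs_names, columns=obs_names)}
```

The resulting table is symmetric with a zero diagonal, and both its rows and its columns are labelled by observation.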

📀 Installation

VData requires Python 3.9+

pip installation (stable)

pip install vdata

using git (latest)

git clone git@github.com:Vidium/vdata.git

📑 Documentation

See the complete documentation at [INCOMING].

Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297

🖋 Citation

You can cite the VData pre-print as:

VData: Temporally annotated data manipulation and storage

Matteo Bouvier, Arnaud Bonnaffoux

bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297
