Annotated multivariate observation of timestamped data

🗂 VData

VData is used for storing and manipulating multivariate observations of timestamped data.

The VData structure

VData extends the AnnData object by adding a time dimension.

Example: a VData object can efficiently store information about cells (observations) whose gene expression (variables) is measured over multiple time points. It is built around layers (.layers). Each layer is a 3D matrix of obs x var x time points. Around those layers, DataFrames describe variables and time points, while custom TemporalDataFrames describe observations.

The uns dictionary is used to store additional unstructured data.

More generally, VData objects can be used to store any timestamped datasets where annotation of observations and variables is required.
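
To make the layout concrete, here is a minimal sketch of the obs x var x time points structure built with plain NumPy and pandas. It does not use the vdata API: the names `layer`, `obs`, `var`, `timepoints` and `uns` are illustrative stand-ins for the corresponding VData attributes, and all values are made up.

```python
import numpy as np
import pandas as pd

# Toy dimensions: 4 cells (observations), 3 genes (variables), 2 time points.
n_obs, n_var, n_time = 4, 3, 2

# One layer: a 3D array indexed as [observation, variable, time point].
layer = np.arange(n_obs * n_var * n_time, dtype=float).reshape(n_obs, n_var, n_time)

# Annotation tables: one row per observation, variable and time point.
obs = pd.DataFrame(
    {"cell_type": ["A", "A", "B", "B"]},
    index=[f"cell_{i}" for i in range(n_obs)],
)
var = pd.DataFrame(
    {"is_marker": [True, False, True]},
    index=[f"gene_{j}" for j in range(n_var)],
)
timepoints = pd.DataFrame({"value": ["0h", "5h"]})

# Unstructured extras live in a plain dictionary.
uns = {"experiment": "toy example"}

# Expression of every gene in type-B cells at the second time point.
mask = (obs["cell_type"] == "B").to_numpy()
subset = layer[mask, :, 1]
print(subset.shape)  # (2, 3): two cells, three genes
```

The annotations are what make the selection readable: the boolean mask comes from `obs`, while the array itself only knows positions.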

🌟 Features

  • complete Python reimplementation based on h5py
  • very fast loading of any dataset
  • memory-efficient data manipulation (<1GB) even for datasets of hundreds of GB
  • explicit handling of timestamped data, especially suited for simulated single-cell datasets
  • complete compatibility with the scverse ecosystem

👁 Overview

General

The vdata library exposes the VData object alongside the TemporalDataFrame object, which extends the common pandas.DataFrame with a third axis for time.

VData objects can be created from in-RAM objects such as AnnData, TemporalDataFrame, pandas.DataFrame or mappings of <layer name>:DataFrame.

It is also possible to load data from a VData or an AnnData saved as an HDF5 file or in CSV format.

🔵 Note An important distinction from AnnData is that when a VData is backed on (read from) an HDF5 file, the object is loaded only on demand and in small chunks. As a result, VData objects consume only small amounts of RAM and are very fast to read.
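
The on-demand access pattern described in the note can be sketched as follows. This is not VData's mechanism: VData relies on h5py and HDF5, whereas this example swaps in a NumPy memory-mapped `.npy` file purely as a stand-in to show that opening a backed array and reading one slice does not require loading the whole dataset. The file name and array sizes are arbitrary.

```python
import os
import tempfile

import numpy as np

# Write a toy obs x var x time points array to disk.
path = os.path.join(tempfile.mkdtemp(), "layer.npy")
full = np.arange(1000 * 20 * 5, dtype=np.float32).reshape(1000, 20, 5)
np.save(path, full)

# Open the file in memory-mapped mode: the data stays on disk.
backed = np.load(path, mmap_mode="r")

# Slicing and converting materialises only the requested block in RAM:
# observations 100 to 109, all variables, first time point.
chunk = np.asarray(backed[100:110, :, 0])
print(chunk.shape)  # (10, 20)
```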

Layers and data annotation

The bulk of the data is stored in TemporalDataFrames, themselves stacked up in the layers dictionary. Data is thus represented as observations x variables x time points dataframes. Observation indices can either be unique at each time point or strictly the same at all time points (e.g. to store simulated data where a single cell can be recorded multiple times).

Figure: two TemporalDataFrames, one with unique observations and one with identical observations at all time points.
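
The two indexing modes can be illustrated with ordinary pandas DataFrames in long format. This is a sketch only: `index_mode`, the `timepoint` column and the extra `"mixed"` label are hypothetical names introduced here, not part of the vdata API.

```python
import pandas as pd


def index_mode(frame: pd.DataFrame, time_col: str = "timepoint") -> str:
    """Classify how observation indices relate across time points."""
    per_time = [set(group.index) for _, group in frame.groupby(time_col)]
    if all(names == per_time[0] for names in per_time):
        return "identical"  # the same observations at every time point
    if sum(len(names) for names in per_time) == len(set().union(*per_time)):
        return "unique"  # no observation appears at two time points
    return "mixed"  # partial overlap, neither of the two modes


# Every observation appears at exactly one time point.
snapshots = pd.DataFrame(
    {"timepoint": ["0h", "0h", "5h", "5h"], "gene_a": [1.0, 2.0, 3.0, 4.0]},
    index=["c0", "c1", "c2", "c3"],
)

# Simulated data: the same two cells are recorded at both time points.
simulated = pd.DataFrame(
    {"timepoint": ["0h", "0h", "5h", "5h"], "gene_a": [1.0, 2.0, 1.5, 2.5]},
    index=["c0", "c1", "c0", "c1"],
)

print(index_mode(snapshots), index_mode(simulated))
```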

Three additional dataframes are used for annotating the observations (obs), variables (var) and timepoints (timepoints).

Multi-dimension annotation

There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that need more than one column). These are the obsm and varm mappings, which contain TemporalDataFrames and pandas DataFrames respectively.

🟢 Example You can store PCA or UMAP coordinates in obsm.
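
As a rough illustration of why such coordinates need a multi-column container, the sketch below computes a two-component PCA with NumPy and stores it in a plain dictionary standing in for obsm. The key `"X_pca"`, the column names and the random data are assumptions made for the example, and a regular pandas DataFrame is used where VData would hold a TemporalDataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obs_names = [f"cell_{i}" for i in range(6)]
expression = pd.DataFrame(
    rng.normal(size=(6, 4)),
    index=obs_names,
    columns=[f"gene_{j}" for j in range(4)],
)

# PCA via SVD of the column-centred matrix; keep the first two components.
values = expression.to_numpy()
centred = values - values.mean(axis=0)
u, s, _ = np.linalg.svd(centred, full_matrices=False)
coords = u[:, :2] * s[:2]

# Two columns per observation, indexed by the same observation names.
obsm = {"X_pca": pd.DataFrame(coords, index=obs_names, columns=["PC1", "PC2"])}
print(obsm["X_pca"].shape)  # (6, 2)
```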

Pairwise annotation

The last two mappings (obsp and varp) contain pairwise annotations: data in square matrices of obs x obs or var x var.

🟢 Example You can store distance values between observations in obsp.
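
A pairwise annotation of this kind might be built as follows. The sketch uses NumPy broadcasting to compute Euclidean distances and a plain dictionary as a stand-in for obsp; the key `"distances"` and the coordinates are illustrative assumptions, not vdata API.

```python
import numpy as np
import pandas as pd

obs_names = ["cell_0", "cell_1", "cell_2"]
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Euclidean distance between every pair of observations via broadcasting:
# diff[i, j] holds coords[i] - coords[j].
diff = coords[:, None, :] - coords[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

# Square obs x obs matrix, labelled on both axes with the observation names.
obsp = {"distances": pd.DataFrame(distances, index=obs_names, columns=obs_names)}
print(obsp["distances"].loc["cell_0", "cell_1"])  # 5.0
```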

📀 Installation

VData requires Python 3.9+

pip installation (stable)

pip install vdata

using git (latest)

git clone git@github.com:Vidium/vdata.git
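
After installing, a quick sanity check of the environment can be useful. The helper below is a hypothetical convenience function, not part of vdata; it only uses the standard library to test the Python 3.9+ requirement stated above and to look up the installed distribution, if any.

```python
import sys
from importlib import metadata


def check_environment(package: str = "vdata", min_python: tuple = (3, 9)) -> dict:
    """Report whether the interpreter is recent enough and the package is installed."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "installed_version": version,
    }


report = check_environment()
print(report)
```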

📑 Documentation

See the complete documentation at [INCOMING].

Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297

🖋 Citation

You can cite the VData preprint as:

VData: Temporally annotated data manipulation and storage

Matteo Bouvier, Arnaud Bonnaffoux

bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297
