Annotated multivariate observation of timestamped data
🗂 VData
VData is used for storing and manipulating multivariate observations of timestamped data.
It extends the AnnData object by adding the time dimension.
Example: the VData object can efficiently store information about cells (observations) whose gene
expression (variables) is measured over multiple time points. It is built around layers (.layers). Each layer
is a 3D matrix of obs x var x time points. Around those layers, DataFrames describe variables and
time points, while custom TemporalDataFrames describe observations.
The uns dictionary is used to store additional unstructured data.
More generally, VData objects can be used to store any timestamped datasets where annotation of observations and variables is required.
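To make the obs x var x time points layout concrete, here is a minimal sketch using only numpy and pandas. It does not use the vdata API: the names (`layer`, `var`, `timepoints`, the cell and gene labels) are hypothetical, and it assumes the same cells are observed at every time point so that a dense 3D array is enough.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: 4 cells, 3 genes, 2 time points.
obs_names = ["cell_0", "cell_1", "cell_2", "cell_3"]
var_names = ["gene_a", "gene_b", "gene_c"]
time_points = ["0h", "5h"]

# One layer: a 3D array indexed as (obs, var, time point).
rng = np.random.default_rng(0)
layer = rng.poisson(
    lam=5.0, size=(len(obs_names), len(var_names), len(time_points))
)

# Annotations for variables and time points live in ordinary DataFrames.
var = pd.DataFrame({"is_tf": [True, False, False]}, index=var_names)
timepoints = pd.DataFrame({"value": [0.0, 5.0], "unit": ["h", "h"]}, index=time_points)

# Expression of gene_b in every cell at time point 5h.
gene_idx = var.index.get_loc("gene_b")
tp_idx = timepoints.index.get_loc("5h")
gene_b_at_5h = pd.Series(layer[:, gene_idx, tp_idx], index=obs_names, name="gene_b@5h")
print(gene_b_at_5h)
```

The annotation DataFrames act as labelled axes: selecting by name on `var` or `timepoints` gives the integer position to slice the layer with.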
🌟 Features
- complete Python reimplementation based on h5py
- very fast loading of any dataset
- memory-efficient data manipulation (<1GB) even for datasets of hundreds of GB.
- explicit handling of timestamped data, especially suited for simulated single-cell datasets
- complete compatibility with the scverse ecosystem
👁 Overview
General
The vdata library exposes the VData object alongside the TemporalDataFrame object, which extends
the common pandas.DataFrame with a third (time) axis.
VData objects can be created from in-RAM objects such as AnnData, TemporalDataFrame, pandas.DataFrame or
mappings of <layer name>:DataFrame.
It is also possible to load data from a VData or an AnnData saved as an hdf5 file or in csv format.
🔵 Note An important distinction with AnnData is that when a VData is backed on (read from) an hdf5 file, the object is only loaded on demand, in small chunks of data. As a result, backed VData objects consume small amounts of RAM and are very fast to read.
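The on-demand behaviour described in this note can be illustrated with h5py directly. This is a sketch of the general mechanism, not of VData's internals: the file layout (`layers/data`), the chunk shape and the variable names are assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a toy chunked dataset to a temporary HDF5 file (hypothetical layout).
path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    f.create_dataset(
        "layers/data",
        data=np.arange(1000 * 50, dtype=np.float32).reshape(1000, 50),
        chunks=(100, 50),  # stored on disk as blocks of 100 rows
    )

# Opening the file and getting the dataset handle reads metadata only.
with h5py.File(path, "r") as f:
    dset = f["layers/data"]  # an h5py.Dataset handle, not an in-RAM array
    shape = dset.shape       # known without reading any values
    first_rows = dset[:10]   # reads only the chunk covering rows 0 to 9

print(shape, first_rows.shape)
```

Slicing the handle returns a regular numpy array holding only the requested rows, which is why memory use stays small regardless of the size of the file.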
Layers and data annotation
The bulk of the data is stored in TemporalDataFrames, themselves stacked up in the layers dictionary. Data is
thus represented as observations x variables x time points dataframes. Observation indices can either be unique
at each time point or strictly the same (e.g. to store simulated data where a single cell can be recorded multiple
times).
Three additional dataframes are used for annotating the observations (obs), variables (var) and timepoints (timepoints).
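The two indexing modes for observations (unique at each time point, or the same at every time point) can be pictured with a plain pandas MultiIndex. This is only an illustration of the idea under that assumption; it is not how TemporalDataFrame is implemented, and all names below are hypothetical.

```python
import pandas as pd

# Case 1: different cells are measured at each time point (unique indices).
unique = pd.DataFrame(
    {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [0.0, 1.0, 0.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("0h", "cell_0"), ("0h", "cell_1"), ("5h", "cell_2"), ("5h", "cell_3")],
        names=["time_point", "obs"],
    ),
)

# Case 2: the same cells are recorded at every time point (e.g. simulations).
repeated = pd.DataFrame(
    {"gene_a": [1.0, 2.0, 1.5, 2.5], "gene_b": [0.0, 1.0, 0.5, 1.5]},
    index=pd.MultiIndex.from_product(
        [["0h", "5h"], ["cell_0", "cell_1"]], names=["time_point", "obs"]
    ),
)

# All observations at a single time point.
at_5h = repeated.xs("5h", level="time_point")

# Trajectory of a single cell across time points (only meaningful in case 2).
cell_0_trajectory = repeated.xs("cell_0", level="obs")
print(cell_0_trajectory)
```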
Multi-dimension annotation
There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that require more than
one column to be stored). These are the obsm and varm mappings, which respectively contain TemporalDataFrames and
pandas DataFrames.
🟢 Example You can store PCA or UMAP coordinates in obsm.
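As a rough sketch of what such a multi-column annotation looks like, the snippet below computes two PCA coordinates with numpy and stores them in a plain dict standing in for obsm. The dict, the `X_pca` key and the use of a pandas DataFrame instead of a TemporalDataFrame are assumptions for illustration; the data covers a single time point.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obs_names = [f"cell_{i}" for i in range(6)]
X = rng.normal(size=(6, 4))  # toy obs x var matrix for one time point

# PCA via SVD of the column-centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = U[:, :2] * S[:2]  # projection onto the first two principal components

# One annotation, two columns per observation: this is what obsm is for.
obsm = {"X_pca": pd.DataFrame(coords, index=obs_names, columns=["PC1", "PC2"])}
print(obsm["X_pca"].head())
```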
Pairwise annotation
The last two mappings (obsp and varp) contain pairwise annotations: data in square matrices of obs x obs or
var x var.
🟢 Example You can store distance values between observations in obsp.
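A pairwise annotation of this kind is simply a square matrix labelled by observations on both axes. The sketch below builds Euclidean distances with numpy and stores them in a plain dict standing in for obsp; the dict and the `distances` key are hypothetical and not taken from the vdata API.

```python
import numpy as np
import pandas as pd

obs_names = ["cell_0", "cell_1", "cell_2"]
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # toy obs x var matrix

# Euclidean distance between every pair of observations, via broadcasting.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# obs x obs square matrix, labelled on both axes.
obsp = {"distances": pd.DataFrame(dist, index=obs_names, columns=obs_names)}
print(obsp["distances"])
```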
📀 Installation
VData requires Python 3.9+
pip installation (stable)
pip install vdata
using git (latest)
git clone git@github.com:Vidium/vdata.git
📑 Documentation
See the complete documentation at [INCOMING].
Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297
🖋 Citation
You can cite the VData pre-print as :
VData: Temporally annotated data manipulation and storage
Matteo Bouvier, Arnaud Bonnaffoux
bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297