V(D)J sequencing data analysis
Project description
pyVDJ
V(D)J sequencing data analysis
This package adds 10x Genomics V(D)J sequencing data to an AnnData object's .uns
part, and also makes annotation columns in .obs
.
This enables plotting various V(D)J properties and handling mRNA and V(D)J sequencing data together.
Install
pip3 install pyvdj
Usage
import pyvdj
adata = pyvdj.load_vdj(paths, samples, adata)
adata = pyvdj.vdj_add_obs(adata, obs=['is_clone'])
For a detailed description, see the tutorial.
Details
The package has 3 main functions that
- read
metrics_summary.csv
files into a pandas dataframe. - load
filtered_contig_annotations.csv
files into an AnnData object. - create annotations in the AnnData object.
Read metrics
The read10xsummary
function requires a list of paths to metrics_summary.csv
files, and optionally a dictionary of path:samplename. It returns a dataframe of the metrics.
Load V(D)J data
The load_vdj
function loads 10x V(D)J sequencing data (filtered_contig_annotations.csv
files) into an AnnData object's .uns['pyvdj']
slot, and returns the object. The adata.uns['pyvdj']
slot is a dictionary with the following elements:
'df'
: a dataframe containing V(D)J data'obs_col'
: theanndata.obs
columname of matching cellnames.'samples'
: a dictionary of filename:samplename
More entries will be added in the future. If an anndata object is not supplied, the function returns the dictionary.
Arguments:
paths
: list of paths to filtered_contig_annotations.csv files.samples
: a dictionary of path:samplename.adata
: the AnnData object.add_obs
: whether to add some default .obs metadata columns.
Add annotations
The adata.uns['pyvdj']['df']
is a pandas dataframe of the V(D)J data, with two additional columns that contain unique cell barcode and clonotype labels. These are generated using the user-supplied sample names: cellbarcode + '_' + samplename
and clonotype + '_' + samplename
.
These unique cell names are used to match the V(D)J cells to the AnnData .X
cells, using adata.obs['vdj_obs']
. The user has to prepare this column using the cell barcodes and the sample names.
The add_obs
function can add the following annotations:
'has_vdjdata'
: does the cell have V(D)J sequencing data?'clonotype'
: add clonotype name'is_clone'
: does it have a clone?'is_productive'
: are all chains productive?'chains'
: adds a boolean column for each chain'genes'
: adds a boolean column for each constant gene
More will be added in the future.
Dependencies
The package has been developed on data prepared with Cell Ranger v2.1.1 (Chemistry: Single Cell V(D)J; V(D)J Reference: GRCh38-alts-ensembl) and has been tested with the following Python package versions:
pandas 0.24.2
anndata 0.6.21
scanpy 1.4.3
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.