Python library for diversity analysis of immune repertoires.
PyDAIR
PyDAIR is a Python package for studying immunoglobulin heavy (IGH) chain diversity from repertoire-sequencing (Rep-Seq) data generated with high-throughput sequencing technologies. PyDAIR identifies the germline variable (V), diversity (D), and joining (J) genes used by each IGH sequence, using BLAST to align each sequence against databases of known germline V, D, and J genes. All features are supported as long as the two motifs located at the end of the V gene and at the start of the J gene are known. PyDAIR is available under the terms of the GNU license.
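To illustrate what "assigning" a germline gene from BLAST alignments can mean, the sketch below reads BLAST+ tabular output (`-outfmt 6`, where column 12 is the bit score) and keeps the highest-scoring germline hit for each query. It is a simplified, hypothetical example: the function name `best_hits` and the gene names are placeholders, and PyDAIR's own assignment logic may use different criteria.

```python
# Hypothetical sketch, not PyDAIR's code: pick the best germline hit per query
# from BLAST+ tabular output (-outfmt 6).


def best_hits(outfmt6_lines):
    """Return {query_id: subject_id}, keeping the hit with the highest bit score.

    Assumes the default -outfmt 6 column order: column 1 is the query ID,
    column 2 the subject (germline gene) ID and column 12 the bit score.
    """
    best = {}
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Strict '>' keeps the first-seen hit when bit scores tie.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits for two reads against V genes.
example_hits = [
    "read1\tIGHV1-1\t98.5\t290\t4\t0\t1\t290\t1\t290\t1e-150\t520\n",
    "read1\tIGHV1-2\t95.0\t290\t14\t0\t1\t290\t1\t290\t1e-130\t470\n",
    "read2\tIGHV3-1\t99.0\t280\t3\t0\t5\t284\t1\t280\t1e-148\t510\n",
]
print(best_hits(example_hits))  # {'read1': 'IGHV1-1', 'read2': 'IGHV3-1'}
```

Taking only the top bit score is the simplest possible rule; real pipelines often also consider ties, alignment length, or identity thresholds.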
INSTALLATION
PyDAIR requires Python 2.7 together with the NumPy, Pandas, matplotlib, and BioPython packages. It also requires NCBI BLAST+ for aligning IGH sequences to the germline databases. PyDAIR is available on PyPI and can be installed like any other Python package with pip:
pip install numpy --user
pip install pandas --user
pip install matplotlib --user
pip install biopython --user
pip install pydair --user
Installation instructions for NCBI BLAST+ are available on the NCBI website; follow them to install BLAST+ before running PyDAIR.
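After installing BLAST+, one quick check is whether its executables can be found on the PATH. This minimal sketch (Python 3, using `shutil.which`) looks for `blastn` and `makeblastdb`; that PyDAIR invokes exactly these two programs is an assumption, and the helper name `missing_blast_tools` is hypothetical.

```python
import shutil

# BLAST+ executables to look for (assumed set; adjust as needed).
BLAST_TOOLS = ("blastn", "makeblastdb")


def missing_blast_tools(tools=BLAST_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_blast_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("BLAST+ executables found.")
```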
Usage
PyDAIR provides two main commands, pydair-parseseq and pydair-analysis.
Command | Function
---|---
pydair-parseseq | Identification of the V, D, and J genes used by each IGH sequence.
pydair-analysis | Aggregation of V, D, and J gene usage frequencies, and extraction of CDR-H3 sequences.
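CDR-H3 extraction relies on the two motifs mentioned above, one at the end of the V gene and one at the start of the J gene. The sketch below shows the general idea on a translated (amino-acid) sequence, taking the CDR-H3 as the residues strictly between the motifs. The motifs YYC and WGxG (regex `WG.G`) and the function name `extract_cdrh3` are illustrative assumptions, not PyDAIR's code; PyDAIR's motifs and boundary definition may differ.

```python
import re

# Illustrative motifs (amino-acid regular expressions); replace them with the
# motifs appropriate for the species being analysed.
V_END_MOTIF = "YYC"    # assumed motif near the end of the V gene
J_START_MOTIF = "WG.G"  # assumed motif at the start of the J gene


def extract_cdrh3(protein, v_motif=V_END_MOTIF, j_motif=J_START_MOTIF):
    """Return the residues between the V-end and J-start motifs, or None.

    Uses the last V-motif occurrence and the first J-motif occurrence after
    it; returns None if either motif is missing or they are out of order.
    """
    v_hits = list(re.finditer(v_motif, protein))
    if not v_hits:
        return None
    start = v_hits[-1].end()  # first residue after the V-end motif
    j_hit = re.compile(j_motif).search(protein, start)
    if j_hit is None:
        return None
    return protein[start:j_hit.start()]


print(extract_cdrh3("EVQLVESGGAVYYCARDRGYSSGWFDYWGQGTLVTVSS"))  # ARDRGYSSGWFDY
```

Whether the flanking cysteine and tryptophan belong to the CDR-H3 is a matter of convention; this sketch excludes both motifs.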
pydair-parseseq identifies the V, D, and J genes of each IGH sequence by aligning it to the germline V, D, and J databases with NCBI BLAST+. It requires the IGH sequences, the germline sequences, BLAST databases of the germline sequences, and BLAST parameters. All sequences must be given in FASTA format.
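The germline BLAST databases passed with --v-blastdb, --d-blastdb, and --j-blastdb can presumably be built from the germline FASTA files with the makeblastdb tool from NCBI BLAST+. This sketch assumes nucleotide databases (`-dbtype nucl`) named as in the example below; the helper names are hypothetical. By default it only builds the commands, so nothing is executed until `dry_run=False`.

```python
import subprocess

# Germline FASTA file -> BLAST database name (names taken from the example
# command; adjust to your own files).
GERMLINE_DBS = {"v.fa": "blastdb_v", "d.fa": "blastdb_d", "j.fa": "blastdb_j"}


def makeblastdb_command(fasta, db_name):
    """Build a makeblastdb call for a nucleotide database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def build_germline_dbs(germline=GERMLINE_DBS, dry_run=True):
    """Return the makeblastdb commands; run them only when dry_run is False."""
    commands = [makeblastdb_command(fasta, db) for fasta, db in germline.items()]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # raises if makeblastdb fails
    return commands
```

Calling `build_germline_dbs(dry_run=False)` in the directory containing v.fa, d.fa, and j.fa would create the three databases.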
pydair-parseseq -q input_igh_sequences.fa \
-v v.fa \
-d d.fa \
-j j.fa \
--v-blastdb blastdb_v \
--d-blastdb blastdb_d \
--j-blastdb blastdb_j \
-o output1
PyDAIR generates several files that store intermediate results, such as the BLAST results and the regions that could not be aligned to V or J genes. The final result is saved to output1.pydair. If there are several samples, run pydair-parseseq once for each sample.
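Running one pydair-parseseq job per sample can be scripted. The sketch below reuses only the options from the example above; the sample table (`SAMPLES`), the FASTA file names, and the helper names are hypothetical, and it assumes each run writes `<prefix>.pydair`, following the output1 to output1.pydair pattern.

```python
import subprocess

# Hypothetical sample table: sample name -> FASTA file of IGH sequences.
SAMPLES = {"Fugu1": "fugu1.fa", "Fugu2": "fugu2.fa", "Fugu3": "fugu3.fa"}

# Germline inputs, identical for every sample (as in the example command).
GERMLINE_ARGS = [
    "-v", "v.fa", "-d", "d.fa", "-j", "j.fa",
    "--v-blastdb", "blastdb_v",
    "--d-blastdb", "blastdb_d",
    "--j-blastdb", "blastdb_j",
]


def parseseq_commands(samples=SAMPLES):
    """Return one pydair-parseseq command per sample (sample name as -o prefix)."""
    return [["pydair-parseseq", "-q", fasta] + GERMLINE_ARGS + ["-o", name]
            for name, fasta in samples.items()]


def run_all(samples=SAMPLES):
    """Run pydair-parseseq for each sample; return the expected .pydair paths."""
    for command in parseseq_commands(samples):
        subprocess.run(command, check=True)
    return [name + ".pydair" for name in samples]
```

The returned paths, together with the sample names, are what pydair-analysis takes as input.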
Statistical summaries across samples are calculated with the pydair-analysis command, which takes the .pydair files (-i) and a name for each sample (-n).
pydair-analysis -i output1.pydair output2.pydair output3.pydair \
-n Fugu1 Fugu2 Fugu3 \
-o stats_result \
--contain_ambiguous_D
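Conceptually, gene usage frequency is the fraction of sequences assigned to each germline gene, computed separately for V, D, and J. The sketch below shows that calculation on hypothetical (V, D, J) assignments; it is not how pydair-analysis is implemented, and it simply skips segments marked `None` (unassigned), which is a design choice of the sketch, not a description of the --contain_ambiguous_D option.

```python
from collections import Counter


def gene_usage_frequencies(assignments):
    """Compute per-segment gene usage frequencies.

    `assignments` is an iterable of (v, d, j) gene-name tuples, one per IGH
    sequence; None marks a segment that could not be assigned and is skipped.
    """
    counts = {"V": Counter(), "D": Counter(), "J": Counter()}
    for genes in assignments:
        for segment, gene in zip(("V", "D", "J"), genes):
            if gene is not None:
                counts[segment][gene] += 1
    frequencies = {}
    for segment, counter in counts.items():
        total = sum(counter.values())
        frequencies[segment] = (
            {gene: n / total for gene, n in counter.items()} if total else {}
        )
    return frequencies


# Hypothetical assignments for four sequences.
reads = [
    ("IGHV1", "IGHD1", "IGHJ1"),
    ("IGHV1", None, "IGHJ2"),
    ("IGHV2", "IGHD1", "IGHJ1"),
    ("IGHV1", "IGHD2", "IGHJ1"),
]
print(gene_usage_frequencies(reads)["V"])  # {'IGHV1': 0.75, 'IGHV2': 0.25}
```

Because the unassigned D segment is dropped, the D frequencies here are computed over three sequences rather than four.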
Download files
Built Distribution
Hashes for PyDAIR-0.1.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fccd2cc9a92067fb150022c2293200a886147e4d85f8929c54c73871888b099a
MD5 | 7307b9451360cb98dc7c43c25a0b56d5
BLAKE2b-256 | f13700f3bfe996056d9a484e7f0d4a25bd09e02fdb44845a83b94efe8dfb8dc9
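To check a downloaded wheel against these digests, the file can be hashed locally with Python's standard `hashlib` module. This is a generic sketch (Python 3); the helper name `compute_digest` is hypothetical, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib

# Hash constructors keyed by the algorithm names used in the table.
ALGORITHMS = {
    "SHA256": hashlib.sha256,
    "MD5": hashlib.md5,
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def compute_digest(path, algorithm="SHA256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = ALGORITHMS[algorithm]()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SHA256 = "fccd2cc9a92067fb150022c2293200a886147e4d85f8929c54c73871888b099a"
# Example: compute_digest("PyDAIR-0.1.2-py2.py3-none-any.whl") == EXPECTED_SHA256
```

A mismatch with the published value means the local file is corrupted or is not the published wheel.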