Python package to manipulate and run IGoR data files
Pygor3
Pygor3 is a Python 3 framework to analyze, visualize, generate and infer IGoR's V(D)J recombination models. Pygor3 provides a Python interface to execute IGoR and encapsulate its inputs and outputs in a single SQLite 3 database file that contains the input sequences, alignments, model parameters, conditional probabilities of the model's Bayesian network, best scenarios and generation probabilities. Pygor3 also has command-line utilities to import/export IGoR-generated files to the AIRR standard format.
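Because the database is an ordinary SQLite 3 file, it can also be opened without pygor3. The following is a minimal sketch using Python's built-in sqlite3 module; it assumes nothing about pygor3's schema and simply reports each table with its row count. The function names are illustrative and are not part of pygor3.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict


def summarize_connection(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in an open database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        # Quote the identifier (doubling embedded quotes) before interpolating it.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def summarize_db(db_path: str) -> Dict[str, int]:
    """Open a SQLite file (for example a pygor3 .db output) and summarize it."""
    if not Path(db_path).is_file():
        # sqlite3.connect would otherwise silently create a new, empty database.
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        return summarize_connection(conn)
```

For example, summarize_db("new_hs_hb.db") would list the tables stored in the database written by igor-infer.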
Installation
- First install IGoR on your system if you don't have it already. Pygor3 uses IGoR's default path to execute it.
- (Optional) Install conda or Anaconda and create (or reuse) a virtual environment:

  ```bash
  $ conda create --name statbiophys python=3.7
  $ conda activate statbiophys
  ```

- Use the package manager pip to install pygor3:

  ```bash
  (statbiophys) $ pip install pygor3
  ```
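Before running the commands below, it can help to confirm that both pieces are in place. A minimal sketch, assuming the IGoR executable is installed under the name "igor" and is on your PATH (change IGOR_EXECUTABLE if your installation differs):

```python
import importlib.util
import shutil
from typing import Dict

# Assumption: the IGoR binary is installed as "igor" somewhere on PATH.
IGOR_EXECUTABLE = "igor"


def check_environment(igor_executable: str = IGOR_EXECUTABLE) -> Dict[str, object]:
    """Report where IGoR lives and whether pygor3 is importable."""
    return {
        # Absolute path of the IGoR executable, or None if it is not on PATH.
        "igor_path": shutil.which(igor_executable),
        # True when "import pygor3" would find the package in this interpreter.
        "pygor3_installed": importlib.util.find_spec("pygor3") is not None,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```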
Command Line Usage
Quickstart
New Model
To create a model from scratch, download the gene templates and anchors from the IMGT website. The list of species available for download from IMGT can be queried with the imgt-get-genomes command and the --info option.
```bash
$ pygor imgt-get-genomes --info
--------------------------------
http://www.imgt.org
Downloading data from ...
List of IMGT available species:
Gallus+gallus
Cercocebus+atys
Mustela+putorius+furo
Macaca+nemestrina
Vicugna+pacos
Mus+cookii
Bos+taurus
Canis+lupus+familiaris
Ornithorhynchus+anatinus
Macaca+mulatta
Rattus+rattus
Mus+minutoides
Danio+rerio
Oncorhynchus+mykiss
Tursiops+truncatus
Felis+catus
Homo+sapiens
Salmo+salar
Macaca+fascicularis
Mus+musculus
Mus+saxicola
Capra+hircus
Sus+scrofa
Mus+pahari
Ovis+aries
Equus+caballus
Camelus+dromedarius
Oryctolagus+cuniculus
Papio+anubis+anubis
Mus+spretus
Rattus+norvegicus
For more details access:
http://www.imgt.org/download/GENE-DB/IMGTGENEDB-GeneList
```
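If you want to pick a species programmatically, the listing above can be parsed. A minimal sketch, assuming the output keeps the layout shown (one species per line between "List of IMGT available species:" and "For more details access:") and is printed to standard output; the helper names are hypothetical:

```python
import subprocess
from typing import List


def parse_imgt_species(info_output: str) -> List[str]:
    """Extract species names from the text printed by imgt-get-genomes --info."""
    species: List[str] = []
    in_list = False
    for raw_line in info_output.splitlines():
        line = raw_line.strip()
        if line.startswith("List of IMGT available species"):
            in_list = True  # species names start on the next line
        elif line.startswith("For more details"):
            break  # end of the species block
        elif in_list and line:
            species.append(line)
    return species


def query_imgt_species() -> List[str]:
    """Run the pygor CLI (must be on PATH) and parse its species listing."""
    result = subprocess.run(
        ["pygor", "imgt-get-genomes", "--info"],
        capture_output=True, text=True, check=True,
    )
    return parse_imgt_species(result.stdout)
```

The returned names keep IMGT's "+" separators (e.g. Homo+sapiens), so they can be passed unchanged to --imgt-species.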
- Download the genomic templates, using -t VJ or -t VDJ according to the chain type:

  ```bash
  $ pygor imgt-get-genomes -t VDJ --imgt-species Homo+sapiens --imgt-chain TRB
  ```

  This creates a models directory with the following structure:

  ```
  models/
  └── Homo+sapiens
      └── TRB
          ├── models
          └── ref_genome
              ├── genomicDs.fasta
              ├── genomicDs__imgt.fasta
              ├── genomicDs__imgt.fasta_short
              ├── genomicJs.fasta
              ├── genomicJs__imgt.fasta
              ├── genomicJs__imgt.fasta_short
              ├── genomicJs__imgt.fasta_trim
              ├── genomicVs.fasta
              ├── genomicVs__imgt.fasta
              ├── genomicVs__imgt.fasta_short
              ├── genomicVs__imgt.fasta_trim
              ├── J_gene_CDR3_anchors.csv
              ├── J_gene_CDR3_anchors__imgt.csv
              ├── J_gene_CDR3_anchors__imgt.csv_short
              ├── V_gene_CDR3_anchors.csv
              ├── V_gene_CDR3_anchors__imgt.csv
              └── V_gene_CDR3_anchors__imgt.csv_short
  ```
- Create a new default initial model with uniform distributions for the conditional probabilities of the Bayesian network (the "model_marginals.txt" file). Note that IGoR calls this file "marginals", but it does not contain the marginal probabilities of the recombination events.

  ```bash
  $ pygor model-create -M models/Homo+sapiens/TRB/ -t VDJ
  --------------------------------
  igortask.igor_model_dir_path: models/Homo+sapiens/TRB/
  Writing model parms in file models/Homo+sapiens/TRB//models/model_parms.txt
  Writing model marginals in file models/Homo+sapiens/TRB//models/model_marginals.txt
  ```

  The uniform model is written to the files model_parms.txt and model_marginals.txt in the models subdirectory:

  ```
  models/
  └── Homo+sapiens
      └── TRB
          ├── models
          │   ├── model_marginals.txt
          │   └── model_parms.txt
          └── ref_genome
              ├── genomicDs.fasta
              ├── genomicDs__imgt.fasta
              ├── genomicDs__imgt.fasta_short
              ├── genomicJs.fasta
              ├── genomicJs__imgt.fasta
              ├── genomicJs__imgt.fasta_short
              ├── genomicJs__imgt.fasta_trim
              ├── genomicVs.fasta
              ├── genomicVs__imgt.fasta
              ├── genomicVs__imgt.fasta_short
              ├── genomicVs__imgt.fasta_trim
              ├── J_gene_CDR3_anchors.csv
              ├── J_gene_CDR3_anchors__imgt.csv
              ├── J_gene_CDR3_anchors__imgt.csv_short
              ├── V_gene_CDR3_anchors.csv
              ├── V_gene_CDR3_anchors__imgt.csv
              └── V_gene_CDR3_anchors__imgt.csv_short
  ```
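Before creating or inferring a model, you may want to check that the download produced the reference files. A minimal sketch; it assumes that the un-suffixed files in ref_genome (genomicVs.fasta, the anchor CSVs, and so on) are the ones a model is built from, and it does not check the __imgt, _short or _trim variants. The helper name is hypothetical:

```python
from pathlib import Path
from typing import List

# Un-suffixed reference files from the ref_genome listing (assumed to be the
# ones a model is built from).
REFERENCE_FILES = (
    "genomicVs.fasta",
    "genomicDs.fasta",
    "genomicJs.fasta",
    "V_gene_CDR3_anchors.csv",
    "J_gene_CDR3_anchors.csv",
)


def missing_reference_files(model_dir: str, chain_type: str = "VDJ") -> List[str]:
    """List the expected ref_genome files that are absent from model_dir."""
    ref_dir = Path(model_dir) / "ref_genome"
    expected = [
        name for name in REFERENCE_FILES
        # VJ chains have no D gene templates, so skip genomicDs for them.
        if chain_type == "VDJ" or not name.startswith("genomicDs")
    ]
    return [name for name in expected if not (ref_dir / name).is_file()]
```

After a successful download, missing_reference_files("models/Homo+sapiens/TRB") should return an empty list.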
At this point you can use a set of non-productive sequences to infer a model, either with IGoR directly or with the pygor command:

```bash
$ pygor igor-infer -M models/Homo+sapiens/TRB/ -i sample_realizations.csv -o new_hs_hb
```

This outputs the following files:

- new_hs_hb.db: a database encapsulating the new model and the data IGoR used to infer it.
- new_hs_hb_BN.pdf: a plot of the Bayesian network (BN) of the inferred model.
- new_hs_hb_RM.pdf: plots of the real marginals of the events in the BN.
- new_hs_hb_parms.txt and new_hs_hb_marginals.txt: the inferred model in IGoR's format.
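The quickstart commands above (imgt-get-genomes, model-create and igor-infer) can also be chained from a Python script. A minimal sketch using subprocess, assuming the pygor executable is on your PATH; it only reuses flags shown in this README, and the helper names are hypothetical:

```python
import subprocess
from typing import List


def quickstart_commands(species: str, chain: str, chain_type: str,
                        sequences_csv: str, output_prefix: str) -> List[List[str]]:
    """Build argv lists for the download, model-create and infer steps.

    The model directory follows the models/<species>/<chain>/ layout
    created by imgt-get-genomes.
    """
    model_dir = f"models/{species}/{chain}/"
    return [
        ["pygor", "imgt-get-genomes", "-t", chain_type,
         "--imgt-species", species, "--imgt-chain", chain],
        ["pygor", "model-create", "-M", model_dir, "-t", chain_type],
        ["pygor", "igor-infer", "-M", model_dir,
         "-i", sequences_csv, "-o", output_prefix],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, run_commands(quickstart_commands("Homo+sapiens", "TRB", "VDJ", "sample_realizations.csv", "new_hs_hb")) reproduces the steps above. Passing argument lists rather than a shell string avoids quoting issues with file names.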
Model evaluation
With an inferred model we can evaluate the generation probability (pgen) of a particular sequence, obtain the most probable recombination scenarios for that sequence, or generate synthetic sequences.
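For reference, these quantities are related by standard Bayesian-network and Bayes'-rule identities. The sketch below uses generic notation (E for a recombination scenario made of events e_i, s for an observed sequence) rather than anything taken from the IGoR source; the last line assumes no sequencing errors:

```latex
\begin{aligned}
P(E) &= \prod_{i} P\big(e_i \mid \mathrm{parents}(e_i)\big), \\
P(s) &= \sum_{E} P(E)\, P(s \mid E), \qquad
P(E \mid s) = \frac{P(E)\, P(s \mid E)}{P(s)}, \\
P_{\mathrm{gen}}(s) &= \sum_{E \,\to\, s} P(E)
\quad \text{(error-free case: } P(s \mid E) = 1 \text{ if } E \text{ produces } s \text{, else } 0\text{)}.
\end{aligned}
```

The factors P(e_i | parents(e_i)) are the conditional probabilities stored in the "marginals" file, and the scenario_proba_cond_seq and pgen columns of the evaluation output appear to correspond to P(E | s) and P_gen(s) respectively.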
IGoR ships with some default models, which can be loaded with the options --species (-s) and --chain (-c). A model can also be loaded from a model directory (-M) or from a pygor3 database (-D). For example, to plot a model:

```bash
$ pygor model-plot -s human -c beta -o defau
```

or

```bash
$ pygor model-plot -M models/Homo+sapiens/TRB/ -o new_model
```

or

```bash
$ pygor model-plot -D new_hs_hb.db -o new_model
```
Each of these commands outputs two PDF files with the marginal probabilities and the conditional probabilities of the events.
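To evaluate input sequences with a model, use igor-evaluate:

```bash
$ pygor igor-evaluate -M -i input_sequence -o output
```

This creates a TSV file in the AIRR standard format with the rearrangements, for example: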
```
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junction junction_aa v_cigar d_cigar j_cigar v_score v_identity v_support v_sequence_start v_sequence_end v_germline_start v_germline_end v_alignment_start v_alignment_end d_score d_identity d_support d_sequence_start d_sequence_end d_germline_start d_germline_end d_alignment_start d_alignment_end j_score j_identity j_support j_sequence_start j_sequence_end j_germline_start j_germline_end j_alignment_start j_alignment_end sequence_aa vj_in_frame stop_codon complete_vdj locus sequence_alignment_aa n1_length np1 np1_aa np1_length n2_length np2 np2_aa np2_length p3v_length p5d_length p3d_length p5j_length scenario_rank scenario_proba_cond_seq pgen quality quality_alignment
0 CAGTCTCCCAGGTACAAAGTCACAAAGAGGGGACAGGATGTAACTCTCAGGTGTGATCCAATTTCGAGTCATGCAACCCTTTATTGGTATCAACAGGCCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCAATTATGAAGCTCAACCAGACAAATCAGGGCTGCCCAGTGATCGGTTCTCTGCAGAGAGGCCTGAGGGATCCATCTCCACTCTGACGATTCAGCGCACAGAGCAGCGGGACTCAGCCATGTATCGCTGTGCTAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG F TRBV7-7*01 TRBD2*02 TRBJ2-3*01 GGTGCTGGAGTCTCCCAGTCTCCCAGGTACAAAGTCACAAAGAGGGGACAGGATGTAACTCTCAGGTGTGATCCAATTTCGAGTCATGCAACCCTTTATTGGTATCAACAGGCCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCAATTATGAAGCTCAACCAGACAAATCAGGGCTGCCCAGTGATCGGTTCTCTGCAGAGAGGCCTGAGGGATCCATCTCCACTCTGACGATTCAGCGCACAGAGCAGCGGGACTCAGCCATGTATCGCTGTGCCAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG TGTGCTAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTT 285M 4M 45M 1425 2 285 16 283 20 290 292 10 13 225 7 50 6 50 6 ATTCCT 6 4 CTGT 4 0 0 0 0 1 0.0272909 1.34834e-19
0 CAGTCTCCCAGGTACAAAGTCACAAAGAGGGGACAGGATGTAACTCTCAGGTGTGATCCAATTTCGAGTCATGCAACCCTTTATTGGTATCAACAGGCCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCAATTATGAAGCTCAACCAGACAAATCAGGGCTGCCCAGTGATCGGTTCTCTGCAGAGAGGCCTGAGGGATCCATCTCCACTCTGACGATTCAGCGCACAGAGCAGCGGGACTCAGCCATGTATCGCTGTGCTAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG F TRBV7-7*01 TRBD2*01 TRBJ2-3*01 GGTGCTGGAGTCTCCCAGTCTCCCAGGTACAAAGTCACAAAGAGGGGACAGGATGTAACTCTCAGGTGTGATCCAATTTCGAGTCATGCAACCCTTTATTGGTATCAACAGGCCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCAATTATGAAGCTCAACCAGACAAATCAGGGCTGCCCAGTGATCGGTTCTCTGCAGAGAGGCCTGAGGGATCCATCTCCACTCTGACGATTCAGCGCACAGAGCAGCGGGACTCAGCCATGTATCGCTGTGCCAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG TGTGCTAGCAGCATTCCTCGGGCTGTCAGATACGCAGTATTTT 285M 4M 45M 1425 2 285 16 283 20 290 292 10 13 225 7 50 6 50 6 ATTCCT 6 4 CTGT 4 0 0 0 0 2 0.0272909 1.34834e-19
```
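The rearrangement file can be post-processed with the standard library. A minimal sketch, assuming the file is tab-separated with a header row that uses the column names shown above; it keeps the row with the lowest scenario_rank (assumed to be the most probable scenario) for each sequence_id and reports the gene calls and pgen. The function names are hypothetical:

```python
import csv
from typing import Dict, Iterable, List, Tuple


def best_scenarios(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Keep the lowest-scenario_rank row for each sequence_id."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        seq_id = row["sequence_id"]
        rank = int(row["scenario_rank"])
        if seq_id not in best or rank < int(best[seq_id]["scenario_rank"]):
            best[seq_id] = row
    return best


def gene_calls_with_pgen(tsv_path: str) -> List[Tuple[str, str, str, str, float]]:
    """Return (sequence_id, v_call, d_call, j_call, pgen) for each sequence."""
    with open(tsv_path, newline="") as handle:
        best = best_scenarios(handle)
    return [
        (seq_id, row["v_call"], row["d_call"], row["j_call"], float(row["pgen"]))
        for seq_id, row in best.items()
    ]
```

best_scenarios accepts any iterable of lines, so it works equally on an open file or on an in-memory list of strings.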
Documentation
All the command-line functionality can also be used from a Python environment, such as a Jupyter notebook, by importing the pygor3 package:

```python
import pygor3 as p3

# Load an IGoR model from its parameters file and its "marginals" file
mdl = p3.IgorModel(model_parms_file="model_parms.txt", model_marginals_file="model_marginals.txt")
```
For further details, check out the documentation and the notebooks directory.
Download files
File details
Details for the file pygor3-0.0.3.tar.gz.
File metadata
- Download URL: pygor3-0.0.3.tar.gz
- Upload date:
- Size: 123.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2.post20201201 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 748df8615f3ebe8a21fe38d41913d54fe42ef657446c4ea01911f68740be4de6 |
| MD5 | 7131c2600634b2999349ecfaeaff4b0f |
| BLAKE2b-256 | d882a37d5d745093bc1e0f7ce17e3f87d3504b809e21501ccddee168de5c5e38 |
File details
Details for the file pygor3-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pygor3-0.0.3-py3-none-any.whl
- Upload date:
- Size: 1.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2.post20201201 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eee09e2ef648414e40cab0390c461a47814299f7e7aad42c2e2777ee36d61f81 |
| MD5 | 93a627b5ccb62178aaa5181fd06701c8 |
| BLAKE2b-256 | 0a31ccd8f36c601ed71682b90befbc77ee65c806b73a20769155354e105152f9 |