Python package for genomic variant analysis
How to install?
pip install variant
How to use?
🧬 variant motif
This subcommand fetches the motif sequence around a given site.
Usage: variant motif [OPTIONS]
Fetch genomic motif.
╭─ Options ──────────────────────────────────────────────────────────────╮
│   --input       -i TEXT Input position file.                           │
│   --output      -o TEXT Output annotation file.                        │
│ * --fasta       -f TEXT reference fasta file. [required]               │
│   --npad        -n TEXT Number of padding base to call motif. If you   │
│                         want to set different left and right pads,     │
│                         use comma to separate them. (eg. 2,3)          │
│   --with-header -H      With header line in input file.                │
│   --columns     -c TEXT Sets columns for site info.                    │
│                         (Chrom,Pos,Strand)                             │
│                         [default: 1,2,3]                               │
│   --to-upper    -u      Convert motif to upper case.                   │
│   --wrap-site   -w      Wrap motif site.                               │
│   --help        -h      Show this message and exit.                    │
╰────────────────────────────────────────────────────────────────────────╯
demo:
Suppose you want the 2 bases before and the 3 bases after each given site, with the site itself wrapped in brackets and the strand information taken into account.
Use -n 2,3 -w
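The padding and wrapping behaviour described in this demo can be illustrated with a small standalone sketch. It is not the package's implementation: the function names (`parse_npad`, `fetch_motif`) are hypothetical, and it assumes 1-based positions, square brackets for wrapping, and reverse-complementing for sites on the minus strand.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the actual code behind `variant motif`.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_npad(npad):
    """Parse '2' into (2, 2) and '2,3' into (2, 3)."""
    parts = [int(p) for p in str(npad).split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]
    before, after = parts
    return before, after


def fetch_motif(chrom_seq, pos, strand="+", npad="2,3", wrap=False, to_upper=False):
    """Return the bases around a 1-based position (assumed convention)."""
    before, after = parse_npad(npad)
    if strand == "-":
        # On the minus strand, "before the site" lies to the right on the reference.
        before, after = after, before
    idx = pos - 1
    left = chrom_seq[max(idx - before, 0):idx]
    site = chrom_seq[idx]
    right = chrom_seq[idx + 1:idx + 1 + after]
    if strand == "-":
        left, site, right = (
            reverse_complement(right),
            reverse_complement(site),
            reverse_complement(left),
        )
    if to_upper:
        left, site, right = left.upper(), site.upper(), right.upper()
    if wrap:
        site = f"[{site}]"  # bracket style is an assumption
    return left + site + right


if __name__ == "__main__":
    toy_reference = "AACCGGTTAA"  # made-up sequence for illustration
    print(fetch_motif(toy_reference, 5, "+", "2,3", wrap=True))
    print(fetch_motif(toy_reference, 5, "-", "2,3", wrap=True))
```

In this sketch the same reference position yields different motifs on the two strands, because the "before" and "after" pads are counted in the direction of the strand.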
🧫 variant effect
This subcommand infers the effect of a mutation.
Usage: variant effect [OPTIONS]
Annotation genomic variant effect.
╭─ Options ──────────────────────────────────────────────────────────────╮
│   --input                -i TEXT    Input position file.               │
│   --output               -o TEXT    Output annotation file             │
│   --reference            -r TEXT    reference species                  │
│   --reference-gtf           TEXT    Customized reference gtf file.     │
│   --reference-transcript    TEXT    Customized reference transcript    │
│                                     fasta file.                        │
│   --reference-protein       TEXT    Customized reference protein fasta │
│                                     file.                              │
│   --release              -e INTEGER ensembl release                    │
│   --strandness           -s         Use strand infomation or not?      │
│   --pU-mode              -u         Make rRNA, tRNA, snoRNA into top   │
│                                     priority.                          │
│   --npad                 -n INTEGER Number of padding base to call     │
│                                     motif.                             │
│   --all-effects          -a         Output all effects.                │
│   --with-header          -H         With header line in input file.    │
│   --columns              -c TEXT    Sets columns for site info.        │
│                                     (Chrom,Pos,Strand,Ref,Alt)         │
│                                     [default: 1,2,3,4,5]               │
│   --help                 -h         Show this message and exit.        │
╰────────────────────────────────────────────────────────────────────────╯
demo:
Store the following table in a file (sites.tsv).
Chrom | Position | Strand | Ref | Alt |
---|---|---|---|---|
chr1 | 230703034 | - | C | T |
chr12 | 69353439 | + | A | T |
chr14 | 23645352 | + | G | T |
chr2 | 215361150 | - | A | T |
chr2 | 84906537 | + | C | T |
chr22 | 39319077 | - | T | A |
chr22 | 39319095 | - | T | A |
chr22 | 39319098 | - | T | A |
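The table above is rendered with pipes, but the `.tsv` extension suggests a tab-separated file. As a convenience, the sketch below writes the same rows to `sites.tsv` with Python's standard `csv` module. The helper name `write_sites` is hypothetical, and tab separation is an assumption based on the file extension.

```python
import csv
from pathlib import Path

HEADER = ["Chrom", "Position", "Strand", "Ref", "Alt"]

# Rows copied from the demo table above.
SITES = [
    ("chr1", 230703034, "-", "C", "T"),
    ("chr12", 69353439, "+", "A", "T"),
    ("chr14", 23645352, "+", "G", "T"),
    ("chr2", 215361150, "-", "A", "T"),
    ("chr2", 84906537, "+", "C", "T"),
    ("chr22", 39319077, "-", "T", "A"),
    ("chr22", 39319095, "-", "T", "A"),
    ("chr22", 39319098, "-", "T", "A"),
]


def write_sites(path="sites.tsv"):
    """Write the demo sites as a tab-separated file with a header line."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(SITES)
    return path


if __name__ == "__main__":
    print(f"wrote {write_sites()}")
```

Because the file starts with a header line, the command needs the -H flag so that the first row is skipped.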
Run command:
variant-effect -i sites.tsv -H -r human -e 108 -t RNA -c 1,2,3
- -i specifies the input file.
- -H means the input file has a header line, so the first row is skipped.
- -r selects the reference genome; the default is human.
- -e specifies the Ensembl release version.
- -c restricts which columns of the input file are used; by default the first 5 columns are used.
You will get this output:
Chrom | Position | Strand | Ref | Alt | mut_type | gene_type | gene_name | gene_pos | transcript_name | transcript_pos | transcript_motif | coding_pos | codon_ref | aa_pos | aa_ref | distance2splice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 230703034 | - | C | T | ThreePrimeUTR | protein_coding | ENSG00000135744(AGT) | 42543 | ENST00000680041(AGT-208) | 1753 | TGTGTCACCCCCAGTCTCCCA | None | None | None | None | 295 |
chr12 | 69353439 | + | A | T | ThreePrimeUTR | protein_coding | ENSG00000090382(LYZ) | 5059 | ENST00000261267(LYZ-201) | 695 | TAGAACTAATACTGGTGAAAA | None | None | None | None | 286 |
chr14 | 23645352 | + | G | T | ThreePrimeUTR | protein_coding | ENSG00000100867(DHRS2) | 15238 | ENST00000344777(DHRS2-202) | 1391 | CTGCCATTCTGCCAGACTAGC | None | None | None | None | 210 |
chr2 | 215361150 | - | A | T | ThreePrimeUTR | protein_coding | ENSG00000115414(FN1) | 74924 | ENST00000323926(FN1-201) | 8012 | GGCCCGCAATACTGTAGGAAC | None | None | None | None | 476 |
chr2 | 84906537 | + | C | T | ThreePrimeUTR | protein_coding | ENSG00000034510(TMSB10) | 882 | ENST00000233143(TMSB10-201) | 327 | CCTGGGCACTCCGCGCCGATG | None | None | None | None | 148 |
chr22 | 39319077 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1313 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
chr22 | 39319095 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1295 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
chr22 | 39319098 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1292 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
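In the output above, the gene_name and transcript_name columns combine an Ensembl ID with a symbol, for example ENSG00000135744(AGT). If you post-process the table, a small helper like the hypothetical `split_id_name` below can separate the two parts. This is a sketch that assumes the `ID(symbol)` layout shown in the demo rows; it is not part of the package.

```python
import re

# Matches 'ID(symbol)' as seen in the demo output, e.g. 'ENSG00000135744(AGT)'.
FIELD = re.compile(r"^(?P<id>[^()]+)\((?P<name>[^()]*)\)$")


def split_id_name(field):
    """Split 'ID(symbol)' into (ID, symbol); return (field, None) if it does not match."""
    text = field.strip()
    match = FIELD.match(text)
    if match is None:
        return text, None
    return match["id"], match["name"]


if __name__ == "__main__":
    print(split_id_name("ENSG00000135744(AGT)"))
    print(split_id_name("ENST00000680041(AGT-208)"))
```

Fields that do not follow the pattern, such as the literal None shown for intronic sites, are returned unchanged with no symbol.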
🧫 variant coordinate
This subcommand maps chromosome names and positions between different reference coordinate systems.
Usage: variant coordinate [OPTIONS]
Fetch genomic motif.
╭─ Options ──────────────────────────────────────────────────────────────────╮
│   --input             -i TEXT Input position file.                         │
│   --output            -o TEXT Output annotation file.                      │
│   --reference-mapping -m TEXT Mapping file for chrom name, first column is │
│                               chrom in the input, second column is chrom   │
│                               in the reference db (sep by tab)             │
│   --buildin-mapping   -M TEXT Build-in mapping for chrom name: U2E (UCSC   │
│                               to Ensembl), E2U (Ensembl to UCSC)           │
│   --with-header       -H      With header line in input file.              │
│   --columns           -c TEXT Sets columns for site info. (Chrom)          │
│                               [default: 1]                                 │
│   --help              -h      Show this message and exit.                  │
╰────────────────────────────────────────────────────────────────────────────╯
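To make the two mapping options more concrete, here is a rough sketch of what chromosome-name mapping involves. It is not the package's code: the function names are hypothetical, and the built-in conversion shown covers only the common convention for primary chromosomes (UCSC `chr1` and `chrM` versus Ensembl `1` and `MT`). The real U2E and E2U mappings may treat scaffolds and patches differently.

```python
def ucsc_to_ensembl(chrom):
    """Simplified U2E-style rename: 'chr1' -> '1', 'chrM' -> 'MT' (assumed convention)."""
    if chrom == "chrM":
        return "MT"
    return chrom[3:] if chrom.startswith("chr") else chrom


def ensembl_to_ucsc(chrom):
    """Simplified E2U-style rename: '1' -> 'chr1', 'MT' -> 'chrM' (assumed convention)."""
    if chrom == "MT":
        return "chrM"
    return chrom if chrom.startswith("chr") else "chr" + chrom


def load_mapping(text):
    """Parse a two-column, tab-separated mapping: input chrom, then reference chrom."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, target = line.split("\t")[:2]
        mapping[source] = target
    return mapping


def rename(chrom, mapping):
    """Look up a chrom name in a custom mapping, leaving unknown names unchanged."""
    return mapping.get(chrom, chrom)


if __name__ == "__main__":
    custom = load_mapping("chr1\t1\nchrM\tMT\n")
    print(rename("chr1", custom), ucsc_to_ensembl("chr22"), ensembl_to_ucsc("MT"))
```

A custom mapping file is the safer choice when the input uses names that a simple prefix rule cannot translate.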
⏳⏳⏳ More functions will be supported in the future.
TODO: