Python package for genomic variant analysis

How to install?

pip install variant

How to use?

🧬 The variant motif subcommand fetches the motif sequence around a given site.

 Usage: variant motif [OPTIONS]

 Fetch genomic motif.

╭─ Options ─────────────────────────────────────────────────────────────────╮
│    --input        -i  TEXT  Input position file.                          │
│    --output       -o  TEXT  Output annotation file.                       │
│ *  --fasta        -f  TEXT  reference fasta file. [required]              │
│    --npad         -n  TEXT  Number of padding base to call motif. If you  │
│                             want to set different left and right pads,    │
│                             use comma to separate them. (eg. 2,3)         │
│    --with-header  -H        With header line in input file.               │
│    --columns      -c  TEXT  Sets columns for site info.                   │
│                             (Chrom,Pos,Strand)                            │
│                             [default: 1,2,3]                              │
│    --to-upper     -u        Convert motif to upper case.                  │
│    --wrap-site    -w        Wrap motif site.                              │
│    --help         -h        Show this message and exit.                   │
╰───────────────────────────────────────────────────────────────────────────╯

demo:

Suppose you want the 2 bases before and the 3 bases after each given site, with the site itself wrapped in brackets and the strand information taken into account.

Use -n 2,3 -w.
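To make the effect of -n 2,3 -w concrete, here is a minimal Python sketch of strand-aware motif extraction. It is an illustration under stated assumptions, not the package's own implementation: the function names (reverse_complement, fetch_motif) are hypothetical, positions are assumed to be 1-based, and square brackets are assumed for the wrapped site (the tool's actual bracket style may differ).

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the variant package's actual code.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_motif(chrom_seq: str, pos: int, strand: str,
                left: int = 2, right: int = 3, wrap: bool = True) -> str:
    """Return `left` bases before and `right` bases after a 1-based site.

    "Before" and "after" are relative to the given strand, so on the
    minus strand the window is mirrored and reverse-complemented.
    """
    i = pos - 1  # convert the assumed 1-based position to a 0-based index
    if strand == "-":
        start, end = i - right, i + left + 1
    else:
        start, end = i - left, i + right + 1
    if start < 0 or end > len(chrom_seq):
        raise ValueError("padding runs past the end of the sequence")
    window = chrom_seq[start:end]
    if strand == "-":
        window = reverse_complement(window)
    if wrap:
        # The site always sits `left` bases into the oriented window.
        window = f"{window[:left]}[{window[left]}]{window[left + 1:]}"
    return window


print(fetch_motif("AACCGGTTAC", 5, "+"))  # CC[G]GTT
print(fetch_motif("AACCGGTTAC", 5, "-"))  # AC[C]GGT
```

On the minus strand the site base is reported as its complement, and the two "before" bases come from the right-hand side of the forward-strand sequence.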

🧫 The variant effect subcommand infers the effect of a mutation.

 Usage: variant effect [OPTIONS]

 Annotation genomic variant effect.

╭─ Options ─────────────────────────────────────────────────────────────────╮
│ --input                 -i  TEXT     Input position file.                 │
│ --output                -o  TEXT     Output annotation file               │
│ --reference             -r  TEXT     reference species                    │
│ --reference-gtf             TEXT     Customized reference gtf file.       │
│ --reference-transcript      TEXT     Customized reference transcript      │
│                                      fasta file.                          │
│ --reference-protein         TEXT     Customized reference protein fasta   │
│                                      file.                                │
│ --release               -e  INTEGER  ensembl release                      │
│ --strandness            -s           Use strand infomation or not?        │
│ --pU-mode               -u           Make rRNA, tRNA, snoRNA into top     │
│                                      priority.                            │
│ --npad                  -n  INTEGER  Number of padding base to call       │
│                                      motif.                               │
│ --all-effects           -a           Output all effects.                  │
│ --with-header           -H           With header line in input file.      │
│ --columns               -c  TEXT     Sets columns for site info.          │
│                                      (Chrom,Pos,Strand,Ref,Alt)           │
│                                      [default: 1,2,3,4,5]                 │
│ --help                  -h           Show this message and exit.          │
╰───────────────────────────────────────────────────────────────────────────╯

demo:

Save the following table to a file (sites.tsv).

Chrom Position Strand Ref Alt
chr1 230703034 - C T
chr12 69353439 + A T
chr14 23645352 + G T
chr2 215361150 - A T
chr2 84906537 + C T
chr22 39319077 - T A
chr22 39319095 - T A
chr22 39319098 - T A

Run command:

variant effect -i sites.tsv -H -r human -e 108 -c 1,2,3
  • -i specifies the input file.
  • -H indicates that the input file has a header line, so the first row is skipped.
  • -r selects the reference genome (default: human).
  • -e specifies the Ensembl release version.
  • -c selects which columns of the input file to use; by default the first 5 columns are used.
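The -H and -c options can be pictured with a short Python sketch. This is only an illustration of the idea, assuming a tab-separated input file and 1-based column numbers (which the default of 1,2,3,4,5 suggests); the helper name read_sites is hypothetical and is not part of the package.

```python
import csv
import io


def read_sites(handle, columns="1,2,3", with_header=False):
    """Read site records, keeping only the requested 1-based columns."""
    indices = [int(c) - 1 for c in columns.split(",")]
    reader = csv.reader(handle, delimiter="\t")
    if with_header:
        next(reader, None)  # skip the header row, as -H would
    return [tuple(row[i] for i in indices) for row in reader if row]


# Two rows of the demo table, written out tab-separated.
demo = io.StringIO(
    "Chrom\tPosition\tStrand\tRef\tAlt\n"
    "chr1\t230703034\t-\tC\tT\n"
    "chr12\t69353439\t+\tA\tT\n"
)
sites = read_sites(demo, columns="1,2,3", with_header=True)
print(sites)
```

Here only the Chrom, Position and Strand fields are kept for each row, and the header line is dropped before any parsing.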

This produces the following output:

Chrom Position Strand Ref Alt mut_type gene_type gene_name gene_pos transcript_name transcript_pos transcript_motif coding_pos codon_ref aa_pos aa_ref distance2splice
chr1 230703034 - C T ThreePrimeUTR protein_coding ENSG00000135744(AGT) 42543 ENST00000680041(AGT-208) 1753 TGTGTCACCCCCAGTCTCCCA None None None None 295
chr12 69353439 + A T ThreePrimeUTR protein_coding ENSG00000090382(LYZ) 5059 ENST00000261267(LYZ-201) 695 TAGAACTAATACTGGTGAAAA None None None None 286
chr14 23645352 + G T ThreePrimeUTR protein_coding ENSG00000100867(DHRS2) 15238 ENST00000344777(DHRS2-202) 1391 CTGCCATTCTGCCAGACTAGC None None None None 210
chr2 215361150 - A T ThreePrimeUTR protein_coding ENSG00000115414(FN1) 74924 ENST00000323926(FN1-201) 8012 GGCCCGCAATACTGTAGGAAC None None None None 476
chr2 84906537 + C T ThreePrimeUTR protein_coding ENSG00000034510(TMSB10) 882 ENST00000233143(TMSB10-201) 327 CCTGGGCACTCCGCGCCGATG None None None None 148
chr22 39319077 - T A Intronic protein_coding ENSG00000100316(RPL3) 1313 ENST00000216146(RPL3-201) None None None None None None None
chr22 39319095 - T A Intronic protein_coding ENSG00000100316(RPL3) 1295 ENST00000216146(RPL3-201) None None None None None None None
chr22 39319098 - T A Intronic protein_coding ENSG00000100316(RPL3) 1292 ENST00000216146(RPL3-201) None None None None None None None
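For downstream analysis, the annotation output can be loaded with the Python standard library. The sketch below is a hedged example rather than an official workflow: it assumes the output file is tab-separated with the header shown above, uses only the first 8 columns of the demo output as an excerpt, and treats the literal string None as a missing value. The helper name load_effects is hypothetical.

```python
import csv
import io
from collections import Counter

# First 8 columns of the demo output above, whitespace-separated for readability.
EXCERPT = """
Chrom Position Strand Ref Alt mut_type gene_type gene_name
chr1 230703034 - C T ThreePrimeUTR protein_coding ENSG00000135744(AGT)
chr12 69353439 + A T ThreePrimeUTR protein_coding ENSG00000090382(LYZ)
chr14 23645352 + G T ThreePrimeUTR protein_coding ENSG00000100867(DHRS2)
chr2 215361150 - A T ThreePrimeUTR protein_coding ENSG00000115414(FN1)
chr2 84906537 + C T ThreePrimeUTR protein_coding ENSG00000034510(TMSB10)
chr22 39319077 - T A Intronic protein_coding ENSG00000100316(RPL3)
chr22 39319095 - T A Intronic protein_coding ENSG00000100316(RPL3)
chr22 39319098 - T A Intronic protein_coding ENSG00000100316(RPL3)
"""

# Rebuild tab-separated text, the layout assumed for the real output file.
tsv_text = "\n".join("\t".join(line.split()) for line in EXCERPT.strip().splitlines())


def load_effects(handle):
    """Parse annotation rows into dicts, mapping the string 'None' to None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{k: (None if v == "None" else v) for k, v in row.items()} for row in reader]


rows = load_effects(io.StringIO(tsv_text))
counts = Counter(row["mut_type"] for row in rows)
print(counts)  # Counter({'ThreePrimeUTR': 5, 'Intronic': 3})
```

With a real output file, replace the io.StringIO wrapper with an opened file handle.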

🧫 The variant coordinate subcommand maps chromosome names and positions between different reference coordinate systems.

 Usage: variant coordinate [OPTIONS]

 Fetch genomic motif.

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --input              -i  TEXT  Input position file.                         │
│ --output             -o  TEXT  Output annotation file.                      │
│ --reference-mapping  -m  TEXT  Mapping file for chrom name, first column is │
│                                chrom in the input, second column is chrom   │
│                                in the reference db (sep by tab)             │
│ --buildin-mapping    -M  TEXT  Build-in mapping for chrom name: U2E (UCSC   │
│                                to Ensembl), E2U (Ensembl to UCSC)           │
│ --with-header        -H        With header line in input file.              │
│ --columns            -c  TEXT  Sets columns for site info. (Chrom)          │
│                                [default: 1]                                 │
│ --help               -h        Show this message and exit.                  │
╰─────────────────────────────────────────────────────────────────────────────╯
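The chromosome-name mapping can be sketched in a few lines of Python. This is a simplified illustration, not the package's built-in U2E/E2U tables: it covers only the common convention for primary chromosomes (UCSC chr1 and chrM versus Ensembl 1 and MT) and ignores scaffolds and alternate contigs, whose names differ in less regular ways. All function names are hypothetical.

```python
import io


def ucsc_to_ensembl(chrom: str) -> str:
    """Map a UCSC-style name (chr1, chrX, chrM) to Ensembl style (1, X, MT)."""
    if chrom == "chrM":
        return "MT"
    return chrom[3:] if chrom.startswith("chr") else chrom


def ensembl_to_ucsc(chrom: str) -> str:
    """Map an Ensembl-style name (1, X, MT) to UCSC style (chr1, chrX, chrM)."""
    if chrom == "MT":
        return "chrM"
    return chrom if chrom.startswith("chr") else f"chr{chrom}"


def load_mapping(handle) -> dict:
    """Read a two-column, tab-separated mapping: input name, reference name."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if line:
            source, target = line.split("\t")[:2]
            mapping[source] = target
    return mapping


print(ucsc_to_ensembl("chr22"))  # 22
print(ensembl_to_ucsc("MT"))     # chrM
custom = load_mapping(io.StringIO("chr1\t1\nchrM\tMT\n"))
print(custom["chrM"])            # MT
```

The load_mapping helper mirrors the file layout described for --reference-mapping: first column is the name in the input, second column is the name in the reference database.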

⏳⏳⏳ More functions will be supported in the future.

TODO:
