
Package to examine de novo clustering


Denovonear

This code assesses whether de novo single-nucleotide variants are closer together within the coding sequence of a gene than expected by chance. We use local sequence-based mutation rates to account for the differential mutability of regions. The default rates are per-trinucleotide (see Nature Genetics 46:944–950), but you can supply your own rates, including ones based on longer sequence contexts such as 5-mers or 7-mers.
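
The idea behind the test can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not denovonear's implementation: it assumes the clustering statistic is the geometric mean of pairwise distances between de novo sites in CDS coordinates, and it draws simulated sites with probability proportional to hypothetical per-site mutation rates. The function names are hypothetical, and denovonear's own statistic and sampling scheme may differ.

```python
import math
import random
from itertools import combinations

def geometric_mean_distance(positions):
    """Geometric mean of all pairwise distances between positions.
    Zero distances are treated as 1 so the log is defined (an assumption)."""
    logs = [math.log(max(abs(a - b), 1)) for a, b in combinations(positions, 2)]
    return math.exp(sum(logs) / len(logs))

def clustering_p_value(observed, site_rates, iterations=10000, seed=0):
    """Empirical p-value: how often do rate-weighted random draws of the
    same number of sites cluster at least as tightly as the observed sites?

    observed   -- CDS positions of the de novo variants (at least 2)
    site_rates -- dict mapping CDS position -> mutation rate at that site
    """
    rng = random.Random(seed)
    sites = list(site_rates)
    weights = [site_rates[s] for s in sites]
    obs_stat = geometric_mean_distance(observed)
    hits = 0
    for _ in range(iterations):
        # sample with replacement, weighting each site by its mutation rate
        sim = rng.choices(sites, weights=weights, k=len(observed))
        if geometric_mean_distance(sim) <= obs_stat:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (iterations + 1)

# hypothetical example: uniform rates over a 1,000 bp coding sequence
uniform = {pos: 1e-8 for pos in range(1000)}
print(clustering_p_value([100, 102, 105], uniform, iterations=2000))
```

Tightly grouped sites give a small p-value, while sites spread across the whole sequence give a p-value near 1. Weighting the draws by per-site rates is what lets highly mutable regions produce closer spacing by chance without that being counted as clustering.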

Install

pip install denovonear

Usage

Analyse de novo mutations with the CLI tool:

denovonear cluster \
   --in data/example.grch38.dnms.txt \
   --gencode data/example.grch38.gtf \
   --fasta data/example.grch38.fa \
   --out output.txt

Explanation of options:

  • --in: path to a tab-separated table of de novo mutations. See the example table below for the required columns, or example.grch38.dnms.txt in the data folder.
  • --gencode: path to GENCODE annotations in GTF format for transcripts and exons, e.g. an example release. Can be gzipped or uncompressed.
  • --fasta: path to the genome FASTA, matching the genome build of the GENCODE file.

If the --gencode or --fasta options are omitted (e.g. denovonear cluster --in INFILE --out OUTFILE), gene annotations are retrieved via the Ensembl web service. In that case you may need to specify --genome-build grch38 so that the gene coordinates match your de novo mutation coordinates.

Other options are:

  • --rates PATHS_TO_RATES_FILES
  • --rates-format context OR genome
  • --cache-folder PATH_TO_CACHE_DIR
  • --genome-build "grch37" or "grch38" (default=grch37)

The --rates option works in two ways. The first (which requires --rates-format to be "context") is to pass one path to a tab-separated file with three columns: 'from', 'to', and 'mu_snp'. The 'from' column contains a DNA sequence of odd length, with the base to change at the central position. The 'to' column contains the same sequence with the central base changed. The 'mu_snp' column contains the probability of that change (per site per generation).

The second way to use the --rates option is to pass multiple paths to VCFs containing mutation rates for every genome position. This requires --rates-format to be "genome". Currently the only supported rates files are those from Roulette (https://www.biorxiv.org/content/10.1101/2022.08.20.504670v1), which can be downloaded from http://genetics.bwh.harvard.edu/downloads/Vova/Roulette/. Both the VCFs and their index files are required.
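
To illustrate the "context" rates format described above, the sketch below parses a small rates table with the 'from', 'to' and 'mu_snp' columns, then looks up the rate for changing one base of a sequence using the k-mer centred on that base. It works for any odd context length (3-mers, 5-mers, 7-mers). The helper names and the rate values are hypothetical, for illustration only; denovonear parses the file itself.

```python
import csv
import io

def load_context_rates(handle):
    """Read a tab-separated table with 'from', 'to' and 'mu_snp' columns
    into a dict keyed by (from_context, to_context)."""
    rates = {}
    for row in csv.DictReader(handle, delimiter='\t'):
        ref, alt = row['from'], row['to']
        # contexts need an odd length so there is a single central base
        if len(ref) % 2 == 0 or len(ref) != len(alt):
            raise ValueError(f'bad context pair: {ref} -> {alt}')
        rates[(ref, alt)] = float(row['mu_snp'])
    return rates

def site_rate(rates, sequence, index, alt_base, k=3):
    """Rate for changing sequence[index] to alt_base, using the k-mer
    centred on that base (k must be odd). Unlisted changes return 0.0."""
    half = k // 2
    if index < half or index + half >= len(sequence):
        raise IndexError('not enough flanking sequence for this context')
    ref = sequence[index - half:index + half + 1]
    alt = ref[:half] + alt_base + ref[half + 1:]
    return rates.get((ref, alt), 0.0)

# hypothetical rates, not real estimates
table = io.StringIO('from\tto\tmu_snp\nACG\tATG\t1.2e-07\nACA\tAAA\t5.0e-09\n')
rates = load_context_rates(table)
print(site_rate(rates, 'TTACGTT', 3, 'T'))  # context ACG -> ATG
```

Returning 0.0 for a change missing from the table is a design choice for the sketch; a stricter version could raise instead, so that an incomplete rates file is noticed.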

The cache folder defaults to a folder named "cache" within the current working directory. The genome build indicates which genome build the coordinates of the de novo variants are based on, and defaults to GRCh37.

Example de novo table

gene_name chr pos consequence snp_or_indel
OR4F5 chr1 69500 missense_variant DENOVO-SNP
OR4F5 chr1 69450 missense_variant DENOVO-SNP
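
To move from a table like this to the Python API, rows need to be grouped by gene and consequence into the dictionary shape used in the Python usage example ({'missense': [...], 'nonsense': [...]}). The sketch below assumes the file is tab-separated with the column names shown above, and uses a hypothetical mapping from consequence terms to the two categories; denovonear's own loading code may classify consequences differently.

```python
import csv
import io
from collections import defaultdict

# hypothetical mapping from consequence terms to the two categories
CATEGORIES = {
    'missense_variant': 'missense',
    'stop_gained': 'nonsense',
}

def group_de_novos(handle):
    """Return {gene: {'missense': [...], 'nonsense': [...]}} from a
    tab-separated de novo table, keeping only SNVs."""
    genes = defaultdict(lambda: {'missense': [], 'nonsense': []})
    for row in csv.DictReader(handle, delimiter='\t'):
        if row['snp_or_indel'] != 'DENOVO-SNP':
            continue  # the clustering test is for single-nucleotide variants
        category = CATEGORIES.get(row['consequence'])
        if category is None:
            continue  # skip consequences outside the two categories
        genes[row['gene_name']][category].append(int(row['pos']))
    return dict(genes)

table = io.StringIO(
    'gene_name\tchr\tpos\tconsequence\tsnp_or_indel\n'
    'OR4F5\tchr1\t69500\tmissense_variant\tDENOVO-SNP\n'
    'OR4F5\tchr1\t69450\tmissense_variant\tDENOVO-SNP\n'
)
print(group_de_novos(table))
# {'OR4F5': {'missense': [69500, 69450], 'nonsense': []}}
```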

Python usage

from denovonear.gencode import Gencode
from denovonear.cluster_test import cluster_de_novos
from denovonear.mutation_rates import load_mutation_rates

gencode = Gencode('./data/example.grch38.gtf', './data/example.grch38.fa')
symbol = 'OR4F5'
de_novos = {'missense': [69500, 69450, 69400], 'nonsense': []}
rates = load_mutation_rates()
p_values = cluster_de_novos(de_novos, gencode[symbol], rates, iterations=1000000)

To pull out site-specific rates, create a Transcript object, then get the rates for each consequence at each site:

import asyncio

from denovonear.rate_limiter import RateLimiter
from denovonear.load_mutation_rates import load_mutation_rates
from denovonear.load_gene import construct_gene_object
from denovonear.site_specific_rates import SiteRates

async def get_transcript(transcript_id):
    # extract transcript coordinates and sequence from Ensembl,
    # limiting requests to 15 per second
    async with RateLimiter(per_second=15) as ensembl:
        return await construct_gene_object(ensembl, transcript_id)

# `async with` and `await` need an event loop when run as a script
transcript = asyncio.run(get_transcript('ENST00000346085'))

mut_rates = load_mutation_rates()
rates = SiteRates(transcript, mut_rates)

# rates are stored by consequence, but you can iterate through to find all
# possible sites in and around the CDS:
for cq in ['missense', 'nonsense', 'splice_lof', 'synonymous']:
    for site in rates[cq]:
        site['pos'] = transcript.get_position_on_chrom(site['pos'], site['offset'])

# or if you just want the summed rate
rates['missense'].get_summed_rate()

Identify transcripts containing de novo events

You can identify transcripts containing de novo events with the denovonear transcripts command. This either identifies all transcripts for a gene with one or more de novo events, or identifies the minimal set of transcripts that contains all de novos (where transcripts are prioritised on the basis of the number of de novo events and the length of the coding sequence). Transcripts can be identified with:

    denovonear transcripts \
        --de-novos data/example_de_novos.txt \
        --out output.txt \
        --all-transcripts

Other options are:

  • --minimise-transcripts in place of --all-transcripts, to find the minimal set of transcripts
  • --genome-build "grch37" or "grch38" (default=grch37)
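
Finding the minimal set of transcripts is a set-cover problem. The sketch below shows one greedy way to approach it: repeatedly pick the transcript containing the most not-yet-covered de novos, breaking ties in favour of the longer coding sequence. The tie-break direction, the data layout and the function name are assumptions for illustration, not denovonear's code. Greedy set cover is not guaranteed to find the smallest possible set.

```python
def minimise_transcripts(transcripts, de_novos):
    """Greedily choose transcripts until every coverable de novo is included.

    transcripts -- dict: transcript ID -> (set of de novo positions it contains,
                   CDS length)
    de_novos    -- iterable of de novo positions
    """
    remaining = set(de_novos)
    chosen = []
    while remaining:
        # score = (uncovered de novos contained, CDS length); highest wins
        best_id, best_score = None, (0, 0)
        for tx_id, (positions, cds_length) in transcripts.items():
            score = (len(positions & remaining), cds_length)
            if score[0] > 0 and score > best_score:
                best_id, best_score = tx_id, score
        if best_id is None:
            break  # the remaining de novos fall in no transcript
        chosen.append(best_id)
        remaining -= transcripts[best_id][0]
    return chosen

# hypothetical transcripts and positions
transcripts = {
    'TX1': ({100, 200}, 900),
    'TX2': ({100, 200, 300}, 1200),
    'TX3': ({400}, 600),
    'TX4': ({300, 400}, 1500),
}
print(minimise_transcripts(transcripts, [100, 200, 300, 400]))  # ['TX2', 'TX4']
```

In the second round TX3 and TX4 each add one uncovered de novo, so the longer coding sequence (TX4) wins the tie.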

Gene or transcript based mutation rates

You can generate mutation rates either for the union of alternative transcripts of a gene, or for a specific Ensembl transcript ID, with the denovonear rates command. LoF and missense mutation rates can be generated with:

denovonear rates \
    --genes data/example_gene_ids.txt \
    --out output.txt

The tab-separated output file will contain one row per gene/transcript, with each line containing a transcript ID or gene symbol, a log10 transformed missense mutation rate, a log10 transformed nonsense mutation rate, and a log10 transformed synonymous mutation rate.
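
To make the gene-level output concrete, the sketch below combines per-site rates from several alternative transcripts, keying each site on (position, alternate base) so that sites shared between transcripts are counted once, then sums and log10-transforms the rate for each consequence. The per-site data layout and the values are hypothetical, and the way denovonear merges transcripts may differ; for example, a site can fall into different consequence classes in different transcripts, and this sketch simply lets each class keep it.

```python
import math

def union_log10_rates(transcripts):
    """Merge per-site rates across transcripts and return, for each
    consequence, log10 of the summed rate.

    transcripts -- list of dicts: consequence -> {(position, alt_base): rate}
    """
    merged = {}
    for per_consequence in transcripts:
        for consequence, sites in per_consequence.items():
            # keying on (position, alt_base) counts a site shared by
            # several transcripts only once
            merged.setdefault(consequence, {}).update(sites)
    result = {}
    for consequence, sites in merged.items():
        total = sum(sites.values())
        # log10(0) is undefined, so report -inf when the summed rate is zero
        result[consequence] = math.log10(total) if total > 0 else float('-inf')
    return result

# hypothetical per-site rates for two overlapping transcripts
tx_a = {'missense': {(100, 'A'): 1e-8, (101, 'T'): 2e-8}}
tx_b = {'missense': {(101, 'T'): 2e-8, (200, 'G'): 7e-8}}
rates = union_log10_rates([tx_a, tx_b])
print(round(rates['missense'], 3))  # -7.0, i.e. log10(1e-8 + 2e-8 + 7e-8)
```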


Download files

Source Distribution

  • denovonear-0.10.2.tar.gz (188.7 kB): Source

Built Distributions

  • denovonear-0.10.2-cp312-cp312-win_amd64.whl (134.7 kB): CPython 3.12, Windows x86-64
  • denovonear-0.10.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • denovonear-0.10.2-cp312-cp312-macosx_10_9_x86_64.whl (161.2 kB): CPython 3.12, macOS 10.9+ x86-64
  • denovonear-0.10.2-cp311-cp311-win_amd64.whl (134.1 kB): CPython 3.11, Windows x86-64
  • denovonear-0.10.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • denovonear-0.10.2-cp311-cp311-macosx_10_9_x86_64.whl (160.4 kB): CPython 3.11, macOS 10.9+ x86-64
  • denovonear-0.10.2-cp310-cp310-win_amd64.whl (134.1 kB): CPython 3.10, Windows x86-64
  • denovonear-0.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • denovonear-0.10.2-cp310-cp310-macosx_10_9_x86_64.whl (160.0 kB): CPython 3.10, macOS 10.9+ x86-64
  • denovonear-0.10.2-cp39-cp39-win_amd64.whl (135.0 kB): CPython 3.9, Windows x86-64
  • denovonear-0.10.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • denovonear-0.10.2-cp39-cp39-macosx_10_9_x86_64.whl (161.1 kB): CPython 3.9, macOS 10.9+ x86-64
  • denovonear-0.10.2-cp38-cp38-win_amd64.whl (135.2 kB): CPython 3.8, Windows x86-64
  • denovonear-0.10.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
  • denovonear-0.10.2-cp38-cp38-macosx_10_9_x86_64.whl (161.8 kB): CPython 3.8, macOS 10.9+ x86-64

File details

Details for the file denovonear-0.10.2.tar.gz.

File metadata

  • Download URL: denovonear-0.10.2.tar.gz
  • Size: 188.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for denovonear-0.10.2.tar.gz
Algorithm Hash digest
SHA256 5eb1c8f6921dba19426a6e79e4aeb829a07beae958e7fd21d47a3dfa626e4964
MD5 f5ab16e1fecbec4e16a2953fbf0ebe85
BLAKE2b-256 855c56efff0dc3b50958c606b635f196a0ed9c5a32f50f7803771839c9ab9ebb

File details

Details for the file denovonear-0.10.2-cp312-cp312-win_amd64.whl.

File hashes

Hashes for denovonear-0.10.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 d296fcce7b2bca24e6ef7d9ad6306a897aa994fade4b895ff32d798ea472729f
MD5 5e3ed39bacd9a24e4bae9d9de2c718de
BLAKE2b-256 29a83a7bacd477acf25713187e741d25cf8081b5ff3286b6465f23074e1e09af

File details

Details for the file denovonear-0.10.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 474ee83a5a0afbd691ca4a4cc9d070776b2ae043c71fda36f079664933ebc8ec
MD5 00cfe90edff6e81794945066c04a5493
BLAKE2b-256 9be930e96668d5d8564bb9965d7818b754f46d1d8bcf19f9f53ba67f7d8bdbd7

File details

Details for the file denovonear-0.10.2-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c41690221c159a8d5e1f3a22f36adb8f9db6667dd893b6e78c966afc0869067c
MD5 a81b72361570829874c8124cb51233da
BLAKE2b-256 1be2885fcb91e2d8b71024393c340ebd2f8e7193b667220f8ef0b00eaf4a622d

File details

Details for the file denovonear-0.10.2-cp311-cp311-win_amd64.whl.

File hashes

Hashes for denovonear-0.10.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5c98df9503bf5354fc8b5f7991e8d3f44257a62e2666bc56583992577ecda4d4
MD5 cd8402d957ccb208ff64fd06ad4e662a
BLAKE2b-256 fcce3f671fcd03036fae758e883914e446079af5d814570a25b4075d3fa4933d

File details

Details for the file denovonear-0.10.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d216c145f4bf13ed0ff52a05bba440e79e44cc947d40871b6a0eaa4378305b6d
MD5 1711ffc52eebc143841057f42291c7c4
BLAKE2b-256 a0de7320bf54a9ce6b15c1f1b6a19e898666b479426060ca88c7c2beaee66e21

File details

Details for the file denovonear-0.10.2-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 7f69d6d3d4dd7f9d42626024d02a8913ce279c137a92f5773f39c014256c080f
MD5 2460466abd510569f115d13ccd8106d8
BLAKE2b-256 a6bd421ba3ce78953b230b5ff17579ddf714808f8d85c243734771d5d1e3e053

File details

Details for the file denovonear-0.10.2-cp310-cp310-win_amd64.whl.

File hashes

Hashes for denovonear-0.10.2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 e7427499f64c9ea6744845c9a95cc5704fe1b992195aa5b3386b7fe262a3d656
MD5 b3e3f7db230122510c5887cf1b2ac0e2
BLAKE2b-256 5777fcb224dcd2aeac0d1fbee001d4a890e94f559b207d6574eea147e6469223

File details

Details for the file denovonear-0.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1faba69ace2294481309a7ab4bcc02e019bc645a03e0b552725d9ae36ac69211
MD5 d788d5067a799bcefbfd1909ddf8e49b
BLAKE2b-256 e9519aa3a0d3cbc0fecb67c907039610a2d422eb99ee015035fb31b9a8c3bee4

File details

Details for the file denovonear-0.10.2-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1afc1409eec4d270bb3843093bbb2134842cf59c58918ec79b2cb3460f7084a0
MD5 660156ebd5c715c0e62eef533358ba70
BLAKE2b-256 153d710b065768c01f8a4328ba4061f025504678ac4bb04c2c0efa54aba6cfc2

File details

Details for the file denovonear-0.10.2-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: denovonear-0.10.2-cp39-cp39-win_amd64.whl
  • Size: 135.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for denovonear-0.10.2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 b62afc8082de9f373ca37ae51ce20a09c170381aac96dbd27afce86a2433b5ea
MD5 2b57de68b6cd594a2811745ca9296062
BLAKE2b-256 26721ca2f93cff58e3a0e1b76a8f6cf0e565e26ab8f750ee9858a45d0bb8574c

File details

Details for the file denovonear-0.10.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c579c8d1bd791e79774d5ea99bba7fe10b8e1ab2e06b61f3bfb9ea9781b1559e
MD5 bc165edcfdc44a18fd6b1740db91334e
BLAKE2b-256 d703a9d9e00f3ffab0ea463096313a51340035ceadb6efa54c183f01e50f0cc0

File details

Details for the file denovonear-0.10.2-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 8a20e913627c3f22be87784428da1c63661bc01b3e209edeae003705e440665f
MD5 0ab4b662c6faaf289ef1c18a478cc2f5
BLAKE2b-256 18efe295b224ef7f8a1ea894d4481051d02ab5e37188d56c6532a33e841ce814

File details

Details for the file denovonear-0.10.2-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: denovonear-0.10.2-cp38-cp38-win_amd64.whl
  • Size: 135.2 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for denovonear-0.10.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 7e2ef80075869289407f250767f7422c930a190ac1c4249d1c8b586706647a0f
MD5 1c82fcfac62178fdb6a60f1628140539
BLAKE2b-256 b2123658706c4bacba38d716f4b2f00ded1232a849b2db85eb93008b1330dd21

File details

Details for the file denovonear-0.10.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d6c3db45967eacfc4f203caf57f6525e5ce16a4c4e96693638f8f3ad5df1897c
MD5 50ab793fb567ec40412bcca17b4e723f
BLAKE2b-256 20366477ab0d60cb353ad376db1842a3df566458197b0b948492d63f53428a09

File details

Details for the file denovonear-0.10.2-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for denovonear-0.10.2-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b3639db27a29a6434893a1828dcc733144de3097692f63274331e5af50a4fb2f
MD5 b6ca8682ea91f92c18231b7ec44f53ed
BLAKE2b-256 f32715b987d0492b65b57e6240f9bc614d28877c4670ddcef995d411cbe4871b
