denovonear

Package to examine de novo clustering

Denovonear

This code assesses whether de novo single-nucleotide variants (SNVs) are closer together within the coding sequence of a gene than expected by chance, or whether the amino acids affected by those SNVs are closer together in the protein structure than expected by chance. We use local-sequence based mutation rates to account for differential mutability of regions. The default rates are per-trinucleotide (see Nature Genetics 46:944–950), but you can use your own rates, or even longer sequence contexts, such as 5-mers or 7-mers.
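
To make "closer together than expected by chance" concrete, here is a minimal sketch of a rate-weighted simulation test. It is not the package's implementation: the choice of the geometric mean of pairwise distances as the test statistic, the function names, and the toy per-site rates are all assumptions made for illustration.

```python
import itertools
import math
import random

def geomean_distance(positions):
    """Geometric mean of pairwise distances (needs at least two positions).

    One is added to each distance so that repeated sites are allowed.
    """
    pairs = list(itertools.combinations(positions, 2))
    log_sum = sum(math.log(abs(a - b) + 1) for a, b in pairs)
    return math.exp(log_sum / len(pairs))

def simulate_p_value(observed, site_rates, iterations=10000, seed=1):
    """Fraction of rate-weighted simulations at least as clustered as observed.

    site_rates maps a coding position to its (hypothetical) mutation rate.
    """
    rng = random.Random(seed)
    sites = list(site_rates)
    weights = [site_rates[s] for s in sites]
    obs = geomean_distance(observed)
    hits = 0
    for _ in range(iterations):
        # sample as many sites as were observed, weighted by mutability
        sampled = rng.choices(sites, weights=weights, k=len(observed))
        if geomean_distance(sampled) <= obs:
            hits += 1
    # add one to numerator and denominator so the estimate is never zero
    return (hits + 1) / (iterations + 1)

# toy gene: 300 coding positions, with a more mutable stretch at 100-110
rates = {pos: 1e-8 for pos in range(1, 301)}
rates.update({pos: 5e-8 for pos in range(100, 111)})
p_value = simulate_p_value([100, 104, 109], rates, iterations=2000)
print(p_value)
```

Weighting the sampling by mutation rate matters here: three variants inside a highly mutable stretch are less surprising than the same three variants in a region with uniform rates.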

Install

pip install denovonear

Usage

Analyse de novo mutation clustering in the coding sequence with the CLI tool:

denovonear cluster \
   --in data/example.grch38.dnms.txt \
   --gencode data/example.grch38.gtf \
   --fasta data/example.grch38.fa \
   --out output.txt

Or test clustering within protein structures:

denovonear cluster-structure \
   --in data/example.grch38.dnms.txt \
   --structures PATH_TO_STRUCTURES.tar \
   --gencode data/example.grch38.gtf \
   --fasta data/example.grch38.fa \
   --out output.structure.txt

Explanation of options:

--in: path to tab-separated table of de novo mutations. See example table below for columns, or example.grch38.dnms.txt in data folder.
--gencode: path to GENCODE annotations in GTF format for transcripts and exons e.g. example release. Can be gzipped, or uncompressed.
--fasta: path to genome fasta, matching the genome build of the GENCODE file.
--structures: path to tar file containing PDB structures for all protein-coding genes. This has only been tested with AlphaFold human proteome tar files (e.g. https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000005640_9606_HUMAN_v4.tar). The code identifies the appropriate structure PDB by first searching for UniProt IDs via Ensembl (starting with the transcript ID). This only permits variants in the canonical transcript, and returns nan if a) no structure PDB is found, b) multiple structures exist for the UniProt IDs, c) multiple chains are in the structure file, d) the structure has missing residues or e) the number of residues in the structure file does not match what is expected from the CDS length. The carbon atom coordinates for each residue are used to place the amino acid position, and Euclidean distances are computed between these coordinates.
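
The distance step described for --structures can be sketched as follows. This is only an illustration under stated assumptions: the coordinates are made up, the residue-to-coordinate dictionary is a hypothetical stand-in for whatever the package extracts from the PDB file, and `pairwise_distances` is not a function from denovonear.

```python
import itertools
import math

def pairwise_distances(coords, residues):
    """Euclidean distance between the coordinates of each pair of residues.

    coords maps residue number -> (x, y, z) in angstroms.
    """
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in itertools.combinations(residues, 2)
    }

# hypothetical coordinates for three mutated residues
coords = {
    10: (0.0, 0.0, 0.0),
    42: (3.0, 4.0, 0.0),
    97: (3.0, 4.0, 12.0),
}
distances = pairwise_distances(coords, [10, 42, 97])
for pair, dist in distances.items():
    print(pair, round(dist, 2))
```

Residues that are far apart in the coding sequence can sit close together in the folded protein, which is why the structure-based test can find clustering that the sequence-based test misses.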

If the --gencode or --fasta options are skipped (e.g. denovonear cluster --in INFILE --out OUTFILE), gene annotations will be retrieved via an Ensembl web service. In that case, you might need to specify --genome-build grch38 to ensure the gene coordinates match your de novo mutation coordinates.

--rates PATHS_TO_RATES_FILES
--rates-format context OR genome
--cache-folder PATH_TO_CACHE_DIR
--genome-build "grch37" or "grch38" (default=grch37)

The rates option operates in two ways. The first (which requires --rates-format to be "context") is to pass in one path to a tab separated file with three columns: 'from', 'to', and 'mu_snp'. The 'from' column contains DNA sequence (where the length is an odd number) with the base to change at the central nucleotide. The 'to' column contains the sequence with the central base modified. The 'mu_snp' column contains the probability of the change (as per site per generation).
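
As a rough illustration of the "context" format described above, the sketch below parses such a table and checks the stated constraints (odd-length contexts, only the central base changed). The function name and the rate values are made up for this example, and the package's own loader may behave differently.

```python
import csv
import io

def read_context_rates(handle):
    """Parse a 'from'/'to'/'mu_snp' table into {(from, to): rate}."""
    rates = {}
    for row in csv.DictReader(handle, delimiter='\t'):
        ref, alt, rate = row['from'], row['to'], float(row['mu_snp'])
        if len(ref) % 2 == 0 or len(ref) != len(alt):
            raise ValueError(f'contexts must share an odd length: {ref} {alt}')
        mid = len(ref) // 2
        # the flanking bases must match, and the central base must differ
        flanks_match = ref[:mid] == alt[:mid] and ref[mid + 1:] == alt[mid + 1:]
        if not flanks_match or ref[mid] == alt[mid]:
            raise ValueError(f'only the central base should change: {ref} {alt}')
        rates[(ref, alt)] = rate
    return rates

# made-up rates, for illustration only
example = io.StringIO(
    'from\tto\tmu_snp\n'
    'AAA\tACA\t1.0e-09\n'
    'ACG\tATG\t1.2e-08\n'
)
rates = read_context_rates(example)
print(rates[('ACG', 'ATG')])
```

The same checks apply unchanged to 5-mer or 7-mer contexts, since only the odd length and the central position are assumed.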

The second way to use the rates option is to pass in multiple paths to VCFs containing mutation rates for every genome position. This requires the --rates-format to be "genome". Currently the only supported rates files are ones from Roulette (https://www.biorxiv.org/content/10.1101/2022.08.20.504670v1), which can be found here: http://genetics.bwh.harvard.edu/downloads/Vova/Roulette/. This needs both the VCFs and their index files.

The cache folder defaults to making a folder named "cache" within the working directory. The genome build indicates which genome build the coordinates of the de novo variants are based on, and defaults to GRCh37.

Example de novo table

gene_name	chr	pos	consequence	snp_or_indel
OR4F5	chr1	69500	missense_variant	DENOVO-SNP
OR4F5	chr1	69450	missense_variant	DENOVO-SNP
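
If you want to go from a table like this to the per-gene dictionaries used in the Python usage section, something along these lines may help. It is a sketch, not part of denovonear: `group_de_novos` is a hypothetical helper, and the mapping from consequence terms to the 'missense' and 'nonsense' categories is an assumption.

```python
import csv
import io
from collections import defaultdict

# assumed mapping from consequence terms to the categories used in the Python API
CATEGORIES = {'missense_variant': 'missense', 'stop_gained': 'nonsense'}

def group_de_novos(handle):
    """Collect SNV positions per gene, split by consequence category."""
    genes = defaultdict(lambda: {'missense': [], 'nonsense': []})
    for row in csv.DictReader(handle, delimiter='\t'):
        category = CATEGORIES.get(row['consequence'])
        if row['snp_or_indel'] != 'DENOVO-SNP' or category is None:
            continue  # skip indels and consequences outside the mapping
        genes[row['gene_name']][category].append(int(row['pos']))
    return dict(genes)

table = io.StringIO(
    'gene_name\tchr\tpos\tconsequence\tsnp_or_indel\n'
    'OR4F5\tchr1\t69500\tmissense_variant\tDENOVO-SNP\n'
    'OR4F5\tchr1\t69450\tmissense_variant\tDENOVO-SNP\n'
)
de_novos = group_de_novos(table)
print(de_novos['OR4F5'])  # {'missense': [69500, 69450], 'nonsense': []}
```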

Python usage

from denovonear.gencode import Gencode
from denovonear.cluster_test import cluster_de_novos
from denovonear.load_mutation_rates import load_mutation_rates

gencode = Gencode('./data/example.grch38.gtf', './data/example.grch38.fa')
symbol = 'OR4F5'
de_novos = {'missense': [69500, 69450, 69400], 'nonsense': []}
rates = load_mutation_rates()
p_values = cluster_de_novos(de_novos, gencode[symbol], rates, iterations=1000000)

Pull out site-specific rates by creating Transcript objects, then get the rates by consequence at each site:

import asyncio

from denovonear.rate_limiter import RateLimiter
from denovonear.load_mutation_rates import load_mutation_rates
from denovonear.load_gene import construct_gene_object
from denovonear.site_specific_rates import SiteRates

async def get_transcript(transcript_id):
    # extract transcript coordinates and sequence from Ensembl
    async with RateLimiter(per_second=15) as ensembl:
        return await construct_gene_object(ensembl, transcript_id)

# 'async with' only works inside a coroutine, so run the lookup via asyncio
# (in a notebook or async REPL, use `await get_transcript(...)` instead)
transcript = asyncio.run(get_transcript('ENST00000346085'))

mut_rates = load_mutation_rates()
rates = SiteRates(transcript, mut_rates)

# rates are stored by consequence, but you can iterate through to find all
# possible sites in and around the CDS:
for cq in ['missense', 'nonsense', 'splice_lof', 'synonymous']:
    for site in rates[cq]:
        # convert the CDS-relative position to a chromosome coordinate
        site['pos'] = transcript.get_position_on_chrom(site['pos'], site['offset'])

# or if you just want the summed rate
rates['missense'].get_summed_rate()

Identify transcripts containing de novo events

You can identify transcripts containing de novo events with the identify_transcripts.py script. This either identifies all transcripts for a gene with one or more de novo events, or identifies the minimal set of transcripts that contains all de novos (where transcripts are prioritised on the basis of the number of de novo events and the length of the coding sequence). Transcripts can be identified with:

    denovonear transcripts \
        --de-novos data/example_de_novos.txt \
        --out output.txt \
        --all-transcripts

Other options are:

--minimise-transcripts in place of --all-transcripts, to find the minimal set of transcripts
--genome-build "grch37" or "grch38" (default=grch37)
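
The prioritisation used for --minimise-transcripts (number of de novo events, then coding sequence length) could be approximated with a greedy cover like the one below. This is a sketch under assumptions rather than the package's code: the greedy strategy, the function name, and the transcript IDs and lengths are all hypothetical.

```python
def minimal_transcripts(transcripts):
    """Greedy cover: repeatedly pick the transcript holding the most
    uncovered de novos, breaking ties by longer coding sequence.

    transcripts maps transcript ID -> (set of de novo positions, CDS length).
    """
    remaining = set().union(*(sites for sites, _ in transcripts.values()))
    chosen = []
    while remaining:
        best = max(
            transcripts,
            key=lambda tx: (len(transcripts[tx][0] & remaining), transcripts[tx][1]),
        )
        chosen.append(best)
        remaining -= transcripts[best][0]
    return chosen

# hypothetical transcripts for one gene
transcripts = {
    'TX_A': ({100, 200}, 900),
    'TX_B': ({100, 200}, 1200),
    'TX_C': ({300}, 600),
}
print(minimal_transcripts(transcripts))  # ['TX_B', 'TX_C']
```

TX_A and TX_B contain the same two events, so the longer TX_B wins the tie, and TX_C is then needed to cover the remaining event.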

Gene or transcript based mutation rates

You can generate mutation rates for either the union of alternative transcripts for a gene, or for a specific Ensembl transcript ID with the construct_mutation_rates.py script. Loss-of-function (LoF) and missense mutation rates can be generated with:

denovonear rates \
    --genes data/example_gene_ids.txt \
    --out output.txt

The tab-separated output file will contain one row per gene/transcript, with each line containing a transcript ID or gene symbol, a log10 transformed missense mutation rate, a log10 transformed nonsense mutation rate, and a log10 transformed synonymous mutation rate.
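
Because the rates are written on a log10 scale, they need converting back before they can be summed or compared as probabilities. A minimal sketch follows, assuming the column order described above; the helper name and the values are made up, and any header line in the real output would need to be skipped first.

```python
def parse_rates_line(line):
    """Split one output row and convert log10 rates back to probabilities."""
    name, *log_rates = line.rstrip('\n').split('\t')
    missense, nonsense, synonymous = (10 ** float(x) for x in log_rates[:3])
    return {
        'id': name,
        'missense': missense,
        'nonsense': nonsense,
        'synonymous': synonymous,
    }

# made-up values in the column order described above
row = parse_rates_line('OR4F5\t-5.0\t-6.0\t-5.5\n')
print(row['id'], row['missense'], row['nonsense'])
```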

Release history

This version

0.13.0

Dec 10, 2025

0.12.0

Oct 17, 2025

0.11.3

Oct 23, 2024

0.11.2

Oct 14, 2024

0.11.1

Sep 25, 2024

0.11.0

Sep 19, 2024

0.11.0a0 pre-release

Sep 19, 2024

0.10.4

Mar 22, 2024

0.10.3

Mar 11, 2024

0.10.2

Nov 7, 2023

0.10.1

Apr 7, 2023

0.10.0

Mar 30, 2023

0.9.16

Mar 14, 2023

0.9.15

Mar 9, 2023

0.9.14

Jun 9, 2022

0.9.13 yanked

Jun 9, 2022

Reason this release was yanked:

missed submodule code

0.9.12

Feb 24, 2022

0.9.11

Feb 9, 2022

0.9.10

Dec 12, 2021

0.9.9

Nov 1, 2021

0.9.8

Oct 28, 2021

0.9.7

Oct 28, 2021

0.9.6

Oct 27, 2021

0.9.5

Oct 26, 2021

0.9.4

Sep 3, 2021

0.9.3

Sep 2, 2021

0.9.2

Sep 2, 2021

0.9.1

Sep 1, 2021

0.9.0

Aug 31, 2021

0.8.6

Mar 31, 2021

0.8.5

May 18, 2020

0.8.4

Mar 26, 2020

0.8.2

Sep 11, 2019

0.8.1

Sep 9, 2019

0.8.0

Sep 9, 2019

0.7.0

Apr 19, 2019

0.6.4

Sep 24, 2018

0.6.3

May 25, 2018

0.6.2

May 23, 2018

0.6.0

May 23, 2018

0.5.4

Feb 1, 2018

0.5.3

Jul 27, 2017

0.5.2

Jul 27, 2017

0.5.1

Jul 26, 2017

0.5.0

Jul 26, 2017

Download files

Source Distribution

denovonear-0.13.0.tar.gz (242.2 kB)

Uploaded Dec 10, 2025 Source

Built Distributions

All built distributions were uploaded Dec 10, 2025.

File	Size	Interpreter	Platform
denovonear-0.13.0-cp314-cp314-win_amd64.whl	347.4 kB	CPython 3.14	Windows x86-64
denovonear-0.13.0-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.14	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.14	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp314-cp314-macosx_11_0_arm64.whl	351.7 kB	CPython 3.14	macOS 11.0+ ARM64
denovonear-0.13.0-cp313-cp313-win_amd64.whl	341.4 kB	CPython 3.13	Windows x86-64
denovonear-0.13.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.13	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.13	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp313-cp313-macosx_11_0_arm64.whl	347.1 kB	CPython 3.13	macOS 11.0+ ARM64
denovonear-0.13.0-cp312-cp312-win_amd64.whl	341.1 kB	CPython 3.12	Windows x86-64
denovonear-0.13.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.12	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.12	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp312-cp312-macosx_11_0_arm64.whl	347.8 kB	CPython 3.12	macOS 11.0+ ARM64
denovonear-0.13.0-cp311-cp311-win_amd64.whl	340.4 kB	CPython 3.11	Windows x86-64
denovonear-0.13.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.11	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.11	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp311-cp311-macosx_11_0_arm64.whl	348.4 kB	CPython 3.11	macOS 11.0+ ARM64
denovonear-0.13.0-cp310-cp310-win_amd64.whl	339.9 kB	CPython 3.10	Windows x86-64
denovonear-0.13.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.10	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.10	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp310-cp310-macosx_11_0_arm64.whl	348.5 kB	CPython 3.10	macOS 11.0+ ARM64
denovonear-0.13.0-cp39-cp39-win_amd64.whl	340.7 kB	CPython 3.9	Windows x86-64
denovonear-0.13.0-cp39-cp39-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl	1.5 MB	CPython 3.9	manylinux: glibc 2.24+ ARM64, glibc 2.28+ ARM64
denovonear-0.13.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl	1.5 MB	CPython 3.9	manylinux: glibc 2.17+ x86-64
denovonear-0.13.0-cp39-cp39-macosx_11_0_arm64.whl	349.8 kB	CPython 3.9	macOS 11.0+ ARM64

File details

Details for the file denovonear-0.13.0.tar.gz.

File metadata

Download URL: denovonear-0.13.0.tar.gz
Upload date: Dec 10, 2025
Size: 242.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for denovonear-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`5442fd0026c6ebc569dbe67e5b5550a7336f9b3ccbb597a6fc51a44e15da4e83`
MD5	`f1c03a0bb5a4b04825f2313365ccf5c1`
BLAKE2b-256	`78ba4f3c89abaa42aa62d2860a1820fc41d9c74f0eb17eb4bae368e9a270c554`


