metaDMG: Estimating ancient damage in (meta)genomic DNA rapidly


Work in progress. Please contact christianmichelsen@gmail.com for further information.


Installation:

conda env create --file environment.yaml

or, if you have mamba installed (faster):

mamba env create --file environment.yaml

or, by using pip:

pip install "metaDMG[all]"

or, with Poetry:

poetry add "metaDMG[all]"

Workflow:

Create config.yaml file:

$ metaDMG config ./raw_data/example.bam \
    --names raw_data/names.dmp.gz \
    --nodes raw_data/nodes.dmp.gz \
    --acc2tax raw_data/combined_taxid_accssionNO_20200425.gz

Run actual program:

$ metaDMG compute

See the results in the dashboard:

$ metaDMG dashboard

Usage:

metaDMG works by first creating a config file using the config command. This file contains all of the information metaDMG needs, so you only have to type it once. The config file is saved in the current directory as config.yaml and can subsequently be edited in any text editor of your choice.

After the config has been created, we run the actual program using the compute command. This can take a while depending on the number (and size) of the files.

Finally, the results are saved in the {storage-dir}/results directory (data/results by default). These can be viewed with the interactive dashboard using the dashboard command.
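
The three steps can also be driven from a script. The sketch below only builds the argument lists for the commands shown in the workflow; the helper names (config_command, workflow_commands) are hypothetical and not part of metaDMG, and nothing is executed unless you pass the lists to subprocess.run yourself.

```python
import shlex


def config_command(sample, names, nodes, acc2tax):
    """Build the argument list for `metaDMG config` (hypothetical helper)."""
    return [
        "metaDMG", "config", sample,
        "--names", names,
        "--nodes", nodes,
        "--acc2tax", acc2tax,
    ]


def workflow_commands(sample, names, nodes, acc2tax):
    """Return the three workflow steps in order: config, compute, dashboard."""
    return [
        config_command(sample, names, nodes, acc2tax),
        ["metaDMG", "compute"],
        ["metaDMG", "dashboard"],
    ]


if __name__ == "__main__":
    steps = workflow_commands(
        "./raw_data/example.bam",
        "raw_data/names.dmp.gz",
        "raw_data/nodes.dmp.gz",
        "raw_data/combined_taxid_accssionNO_20200425.gz",
    )
    for step in steps:
        # Print a shell-ready line; use subprocess.run(step, check=True) to execute it.
        print(shlex.join(step))
```

Keeping the commands as lists avoids shell quoting problems when paths contain spaces.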


config

CLI options:

metaDMG config takes a single argument, samples, and a number of additional options. The samples argument refers to one or more alignment files (or a directory containing them), each with one of the file extensions .bam, .sam, or .sam.gz.

The options are listed below:

  • Input files:

    • --names: Path to the (NCBI) names.dmp.gz. Mandatory.
    • --nodes: Path to the (NCBI) nodes.dmp.gz. Mandatory.
    • --acc2tax: Path to the (NCBI) acc2tax.gz. Mandatory.
  • LCA parameters:

    • --simscorelow: Normalised edit distance (read to reference similarity) minimum. Number between 0-1. Default: 0.95.
    • --simscorehigh: Normalised edit distance (read to reference similarity) maximum. Number between 0-1. Default: 1.0.
    • --editdistmin: Minimum edit distance (read to reference similarity). Number between 0-10. Default: 0.
    • --editdistmax: Maximum edit distance (read to reference similarity). Number between 0-10. Default: 10.
    • --minmapq: Minimum mapping quality. Default: 0.
    • --max-position: Maximum position in the sequence to include. Default is (+/-) 15 (forward/reverse).
    • --weighttype: Method for calculating weights. Default is 1.
    • --fix-ncbi: Fix the (ncbi) database. Disable (0) if using a custom database. Default is 1.
    • --lca-rank: The LCA rank used in ngsLCA. Can be either family, genus, species or "" (everything). Default is "".
  • General parameters:

    • --storage-dir: Path where the generated output files and folders are stored. Default: ./data/.
    • --cores: The maximum number of cores to use. Default is 1.
    • --cores-pr-fit: Number of cores per fit. Do not change unless you know what you are doing.
    • --sample-prefix: Prefix for the sample names.
    • --sample-suffix: Suffix for the sample names.
    • --config-path: The name of the generated config file. Default: config.yaml.
  • Boolean flags (these do not take a value; only the flag itself is given). Default is false.

    • --bayesian: Include a fully Bayesian model (probably better, but also a lot slower, by about a factor of 100).

Example:

$ metaDMG config ./raw_data/example.bam \
    --names raw_data/names.dmp.gz \
    --nodes raw_data/nodes.dmp.gz \
    --acc2tax raw_data/combined_taxid_accssionNO_20200425.gz \
    --cores 4

metaDMG is flexible regarding its samples argument and also accepts multiple alignment files:

$ metaDMG config ./raw_data/*.bam [...]

or even an entire directory containing alignment files (.bam, .sam, and .sam.gz):

$ metaDMG config ./raw_data/ [...]
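
To make the accepted inputs concrete, here is a minimal sketch of how a mix of files and directories could be expanded into a list of alignment files with the extensions listed above. It is illustrative only: the names ALIGNMENT_SUFFIXES, is_alignment_file, and collect_samples are hypothetical, and metaDMG's own discovery logic may differ (for example in how it treats nested directories).

```python
from pathlib import Path

# Extensions listed in this README for alignment files.
ALIGNMENT_SUFFIXES = (".bam", ".sam", ".sam.gz")


def is_alignment_file(path):
    """True if the file name ends with one of the accepted extensions."""
    return Path(path).name.endswith(ALIGNMENT_SUFFIXES)


def collect_samples(paths):
    """Expand files and directories into a sorted list of alignment files."""
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Only the top level of the directory is scanned in this sketch.
            found.extend(
                p for p in path.iterdir() if p.is_file() and is_alignment_file(p)
            )
        elif is_alignment_file(path):
            found.append(path)
    return sorted(found)
```

Index files such as example.bam.bai are skipped because the check looks at the end of the file name rather than at any occurrence of ".bam".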

compute

The metaDMG compute command takes an optional config-file as argument (defaults to config.yaml if not specified).

Example:

$ metaDMG compute
$ metaDMG compute non-default-config.yaml

dashboard

The metaDMG dashboard command takes an optional config-file as its first argument (defaults to config.yaml if not specified), followed by these CLI options:

CLI options:

  • --port: The port to be used for the dashboard. Default is 8050.
  • --host: The dashboard host address. Default is 0.0.0.0.
  • --debug: Boolean flag that allows for debugging the dashboard. Only for internal usage.

Example:

$ metaDMG dashboard
$ metaDMG dashboard non-default-config.yaml --port 8050 --host 0.0.0.0

Results

The column names in the results and their explanation:

  • General parameters:

    • tax_id: The tax ID. int64.
    • tax_name: The tax name. string.
    • tax_rank: The tax rank. string.
    • sample: The name of the original sample. string.
    • N_reads: The number of reads. int64.
    • N_alignments: The number of alignments. int64.
  • Fit related parameters:

    • lambda_LR: The likelihood ratio between the null model and the ancient damage model. This can be interpreted as the fit certainty, where higher values mean higher certainty. float32.
    • lambda_LR_P: The likelihood ratio expressed as a probability. float32.
    • lambda_LR_z: The likelihood ratio expressed as a number of $\sigma$. float32.
    • D_max: The estimated damage. This can be interpreted as the amount of damage in the specific taxon. float32.
    • q: The damage decay rate. float32.
    • A: The background independent damage. float32.
    • c: The background. float32.
    • phi: The concentration for a beta binomial distribution (parametrised by $\mu$ and $\phi$). float32.
    • rho_Ac: The correlation between A and c. High values of this are often a sign of a bad fit. float32.
    • valid: Whether or not the fit is valid (defined by iminuit). bool.
    • asymmetry: An estimate of the asymmetry of the forward and reverse fits. See below for more information. float32.
    • XXX_std: The uncertainty (standard deviation) of the variable XXX for D_max, A, q, c, and phi.
    • forward__XXX: The same description as above for variable XXX, but only for the forward read.
    • reverse__XXX: The same description as above for variable XXX, but only for the reverse read.
  • Read related parameters:

    • mean_L: The mean read length of all the individual, unique reads that map to the specific taxon. float64.
    • std_L: The standard deviation of the above. float64.
    • mean_GC: The mean GC content of all the individual, unique reads that map to the specific taxon. float64.
    • std_GC: The standard deviation of the above. float64.
    • tax_path: The taxonomic path from the LCA to the root through the phylogenetic tree. string.
  • Count related parameters:

    • N_x=1_forward: The total number of "trials", $N$, at position $x=1$ in the forward direction. int64.
    • N_x=1_reverse: Same as above, but for the reverse direction. int64.
    • N_sum_forward: The sum of $N$ over all positions in the forward direction. int64.
    • N_sum_reverse: Same as above, but for the reverse direction. int64.
    • N_sum_total: The sum of N_sum_forward and N_sum_reverse. int64.
    • N_min: The minimum of $N$ over all positions (forward and reverse alike). int64.
    • k_sum_forward: The total number of "successes", $k$, summed over all positions in the forward direction. int64.
    • k_sum_reverse: Same as above, but for the reverse direction. int64.
    • k_sum_total: The sum of k_sum_forward and k_sum_reverse. int64.
    • k+i: The number of "successes", $k$, at position $i$ in the forward direction. int64.
    • k-i: Same as above, but for the reverse direction. int64.
    • N+i: The number of "trials", $N$, at position $i$ in the forward direction. int64.
    • N-i: Same as above, but for the reverse direction. int64.
    • f+i: The fraction $k/N$ at position $i$ in the forward direction. int64.
    • f-i: Same as above, but for the reverse direction. int64.
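
The count related columns are simple aggregates of the per-position "trials" and "successes". The sketch below shows how they relate to each other, assuming the per-position counts are available as plain lists ordered by position (first element is position 1). The function names and the toy numbers in the usage note are made up for illustration and are not part of metaDMG.

```python
def damage_fractions(k, N):
    """Per-position fraction f = k / N; positions with N == 0 give NaN."""
    return [ki / Ni if Ni > 0 else float("nan") for ki, Ni in zip(k, N)]


def count_summary(k_forward, N_forward, k_reverse, N_reverse):
    """Aggregate per-position counts into the summary columns described above."""
    N_sum_forward = sum(N_forward)
    N_sum_reverse = sum(N_reverse)
    k_sum_forward = sum(k_forward)
    k_sum_reverse = sum(k_reverse)
    return {
        "N_x=1_forward": N_forward[0],  # trials at the first forward position
        "N_x=1_reverse": N_reverse[0],  # trials at the first reverse position
        "N_sum_forward": N_sum_forward,
        "N_sum_reverse": N_sum_reverse,
        "N_sum_total": N_sum_forward + N_sum_reverse,
        "N_min": min(N_forward + N_reverse),  # minimum over both directions
        "k_sum_forward": k_sum_forward,
        "k_sum_reverse": k_sum_reverse,
        "k_sum_total": k_sum_forward + k_sum_reverse,
    }
```

For example, with toy counts k_forward = [30, 10, 5] and N_forward = [100, 100, 50], damage_fractions gives roughly 0.3, 0.1, and 0.1 for positions 1 to 3.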

convert

The metaDMG convert command takes an optional config-file as its first argument (defaults to config.yaml if not specified), which is used to infer the results directory, followed by these CLI options:

CLI options:

  • --output: Mandatory output path.
  • --results: Direct path to the results directory.

Note that neither the config-file nor --results has to be specified (in which case the default config.yaml is used); however, they cannot both be set at the same time.

Example:

$ metaDMG convert --output ./directory/to/contain/results.csv
$ metaDMG convert non-default-config.yaml --output ./directory/to/contain/results.csv
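
The rule for locating the results directory can be sketched as follows. This is a reading of the note above, not metaDMG's actual code: resolve_results_dir is a hypothetical helper, and storage_dir stands in for the storage-dir value that would be read from the config file.

```python
from pathlib import Path


def resolve_results_dir(config_path=None, results=None, storage_dir="./data/"):
    """Pick the results directory from either --results or the config.

    `storage_dir` represents the value stored in the config file
    (assumed here to be the documented default, ./data/).
    """
    if config_path is not None and results is not None:
        # The two sources are mutually exclusive.
        raise ValueError("Specify either a config file or --results, not both.")
    if results is not None:
        return Path(results)
    # With no explicit --results, fall back to {storage-dir}/results.
    return Path(storage_dir) / "results"
```

Calling resolve_results_dir() with no arguments corresponds to the first example above, where the default config.yaml and the default storage directory are used.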

filter

The metaDMG filter command takes an optional config-file as its first argument (defaults to config.yaml if not specified), which is used to infer the results directory, followed by these CLI options:

CLI options:

  • --output: Mandatory output path.
  • --query: The query string to use for filtering. Follows the pandas DataFrame.query() syntax. Default is "", which applies no filtering and is thus similar to the metaDMG convert command.
  • --results: Direct path to the results directory.

Note that neither the config-file nor --results has to be specified (in which case the default config.yaml is used); however, they cannot both be set at the same time.

Example:

$ metaDMG filter --output convert-no-query.csv # similar to metaDMG convert
$ metaDMG filter --output convert-test.csv --query "N_reads > 5_000 & sample in ['subs', 'SPL_195_9299'] & tax_name == 'root'"
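
To show what the query in the second example selects, here is the same condition written as a plain Python predicate over a few toy rows. The rows are invented for illustration (only the column names and the sample names come from the example above), and the sketch deliberately avoids pandas so that it runs with the standard library alone.

```python
# Toy rows using column names from the results table; the values are made up.
rows = [
    {"tax_name": "root", "sample": "subs", "N_reads": 12_000},
    {"tax_name": "root", "sample": "subs", "N_reads": 3_000},
    {"tax_name": "root", "sample": "other_sample", "N_reads": 50_000},
    {"tax_name": "Bacteria", "sample": "SPL_195_9299", "N_reads": 9_000},
    {"tax_name": "root", "sample": "SPL_195_9299", "N_reads": 7_500},
]


def matches(row):
    """Plain-Python equivalent of the query string in the example above."""
    return (
        row["N_reads"] > 5_000  # underscores in numbers are only digit separators
        and row["sample"] in ["subs", "SPL_195_9299"]
        and row["tax_name"] == "root"
    )


selected = [row for row in rows if matches(row)]

if __name__ == "__main__":
    for row in selected:
        print(row)
```

All three conditions must hold at once, so a row with enough reads but the wrong sample or tax name is dropped.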


If you only want to install some of the tools, you can run:

pip install "metaDMG[fit]"

to only install the fitting part of the tool, or:

pip install "metaDMG[viz]"

to only install the interactive plotting tool (this requires that you already have the results from somewhere else).



Updating metaDMG

With pip or Conda:

pip install "metaDMG[all]" --upgrade

With Poetry:

poetry add "metaDMG[all]"
