A tool for de novo detection of potential DNA modification sites.

Hammerhead

PyPI License: GPL v2

Workflow

Hammerhead was developed specifically to identify potential modification sites from Nanopore R10.4.1 simplex reads. It exploits the strand-specific error pattern observed in these reads to detect modifications.

The pipeline uses a custom metric, the difference index, to quantify the discrepancy in observed accuracy between the forward and reverse strands at each site. The difference index serves as a proxy for modification probability: the higher the value, the more likely the site is modified.
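As a rough illustration of the idea, the sketch below computes a per-site score from per-strand match counts. The exact formula Hammerhead uses is not given here, so the absolute difference between the two strand accuracies is an assumption, and the function names are hypothetical.

```python
def strand_accuracy(matches: int, depth: int) -> float:
    """Fraction of reads on one strand that agree with the reference base."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return matches / depth


def difference_index(fwd_matches: int, fwd_depth: int,
                     rev_matches: int, rev_depth: int) -> float:
    """Discrepancy in observed accuracy between the two strands at one site.

    ASSUMPTION: modelled as the absolute difference of the per-strand
    accuracies; the tool's actual definition may differ.
    """
    forward = strand_accuracy(fwd_matches, fwd_depth)
    reverse = strand_accuracy(rev_matches, rev_depth)
    return abs(forward - reverse)
```

Under this assumption, a site where every forward read matches the reference but only half of the reverse reads do would score 0.5, while a site with equal accuracy on both strands would score 0.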

Installation

To use this tool, you first need to install the external tools it relies on for read processing: minimap2, samtools, and bedtools. The following commands install these dependencies.

# tested versions of the dependencies
# minimap2	2.17
# samtools	1.17
# bedtools	2.30.0

# install the latest available versions
conda install -c bioconda -c conda-forge minimap2 samtools bedtools -y
# or pin the tested versions instead
conda install -c bioconda -c conda-forge minimap2==2.17 samtools==1.17 bedtools==2.30.0 -y

To install this tool, please use the following command.

pip install Hammerhead-View

Quick usage

Hammerhead offers two strategies for selecting potential modification sites.

The first strategy selects the sites whose difference index exceeds a cutoff (default: 0.35).

hammerhead --ref genome.fa --read input.fastq --cpu 4
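The cutoff strategy amounts to a simple threshold filter. The sketch below assumes the per-site difference indices are already available as a mapping from a hypothetical `(contig, position)` key to a score; this is not the tool's internal data structure.

```python
DEFAULT_CUTOFF = 0.35  # default cutoff stated in the text


def select_by_cutoff(sites, cutoff=DEFAULT_CUTOFF):
    """Keep sites whose difference index is strictly above the cutoff.

    `sites` maps a (contig, position) tuple to its difference index.
    ASSUMPTION: "over the cutoff" is read as a strict comparison.
    """
    return {site: index for site, index in sites.items() if index > cutoff}
```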

The second strategy ranks the sites by difference index, from largest to smallest, and selects the top N (default: 2000).

hammerhead --ref genome.fa --read input.fastq --cpu 4 --method top
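The top-N strategy can be pictured as a ranking step. This is a minimal sketch using the standard library, again assuming a hypothetical mapping from `(contig, position)` to difference index rather than the tool's real internals.

```python
import heapq

DEFAULT_TOP_N = 2000  # default number of sites stated in the text


def select_top_n(sites, n=DEFAULT_TOP_N):
    """Return the n sites with the largest difference index, largest first.

    `sites` maps a (contig, position) tuple to its difference index.
    The result is a list of ((contig, position), index) pairs.
    """
    return heapq.nlargest(n, sites.items(), key=lambda item: item[1])
```

Unlike the cutoff strategy, this always returns up to N sites regardless of how strong the signal is, which may be preferable when the background noise level of a dataset is unknown.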

Example

We provide demo datasets for testing Hammerhead. The following commands download them.

wget https://figshare.com/ndownloader/files/46437190 -O ecoli.fa
wget https://figshare.com/ndownloader/files/46437193 -O test.fastq.gz

Then run the following command to start the analysis.

hammerhead --ref ecoli.fa --read test.fastq.gz --min_depth 5 --min_depth_strand 3

Note: The arguments in this command are for demonstration only (the read coverage of the demo data is too shallow for the defaults) and may not be optimal for your dataset. When you have sufficient read coverage, typically more than 50-fold, the default arguments are generally recommended.
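To judge whether a dataset reaches roughly 50-fold coverage before choosing between the defaults and relaxed depth settings, a quick estimate is total read bases divided by reference length. The sketch below is an illustrative helper, not part of Hammerhead; it assumes standard four-line FASTQ records and counts all reads, so it gives an upper bound on the mapped coverage.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_length(path) -> int:
    """Total number of bases across all records of a FASTA file."""
    with _open_text(path) as handle:
        return sum(len(line.strip()) for line in handle
                   if not line.startswith(">"))


def fastq_bases(path) -> int:
    """Total number of bases in a FASTQ file (four lines per record assumed)."""
    total = 0
    with _open_text(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence line of each record
                total += len(line.strip())
    return total


def estimated_coverage(ref_path, read_path) -> float:
    """Raw fold coverage: read bases divided by reference length."""
    return fastq_bases(read_path) / fasta_length(ref_path)
```

For the demo files this could be called as `estimated_coverage("ecoli.fa", "test.fastq.gz")`.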

Tool showcase

To show the potential of Hammerhead for identifying modifications in bacteria, two E. coli datasets were used to call methylation: whole-genome sequencing (WGS) and whole-genome amplification (WGA) R10.4.1 simplex reads. The dam and dcm genes, which are associated with G6mATC and C5mCWGG methylation, respectively, were found in the genome of the E. coli strain used.

Distribution of the difference index for sites in the E. coli genome. The WGA reads were used as a negative control because amplified DNA lacks the native methylation. Based on the background noise of the WGA reads, sites with a difference index over 0.35 were regarded as potential modification sites.

The CCWGG and GATC motifs were enriched in the sequences flanking these potential modification sites (-10 bp to +10 bp).
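The flanking-window step can be sketched as follows. This only checks whether a known motif occurs within the -10 bp to +10 bp window of a site; it is not an enrichment analysis, and the function names and 0-based coordinate convention are assumptions made for illustration.

```python
import re

FLANK = 10  # -10 bp to +10 bp around the site, as described in the text

MOTIFS = {
    "GATC": re.compile(r"GATC"),
    "CCWGG": re.compile(r"CC[AT]GG"),  # IUPAC code W means A or T
}


def window(genome: str, pos: int, flank: int = FLANK) -> str:
    """Sequence from pos - flank to pos + flank (0-based, clipped at the ends)."""
    start = max(0, pos - flank)
    return genome[start:pos + flank + 1]


def motifs_near(genome: str, pos: int, flank: int = FLANK):
    """Names of the known motifs that occur in the window around a site."""
    sequence = window(genome, pos, flank).upper()
    return sorted(name for name, pattern in MOTIFS.items()
                  if pattern.search(sequence))
```

Both motifs are palindromic, so scanning a single strand is enough for this check.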

Note: The two datasets are available here. Both were basecalled with the modification-aware model, which is available in the modification_aware_basecalling_model directory.

To demonstrate how effectively the Hammerhead-based polishing strategy corrects the substitution errors (G2A and C2T) that DNA modifications cause in assemblies, we present the substitution rates of 15 assemblies. These assemblies were generated from 40-, 50-, and 60-fold random subsamples of Acinetobacter pittii R10.4.1 reads. We compared the results of the following polishing approaches against the reference chromosome.

  • No polishing
  • Polishing potential modification sites with approximately 10-fold duplex reads
  • Polishing total assemblies with 50-fold next-generation sequencing (NGS) reads
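For readers who want to reproduce this kind of comparison, the sketch below counts substitution types between a reference and an assembly that have already been aligned to equal length, with "-" marking gaps. It is an illustrative helper, not the evaluation code behind the reported rates, and it assumes that "G2A" means a reference G appearing as A in the assembly.

```python
from collections import Counter


def substitution_counts(ref_aln: str, asm_aln: str) -> Counter:
    """Count substitutions by type (e.g. "G2A") between two aligned sequences.

    Gap columns ("-") are skipped, so insertions and deletions are ignored.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for ref_base, asm_base in zip(ref_aln.upper(), asm_aln.upper()):
        if ref_base != asm_base and "-" not in (ref_base, asm_base):
            counts[f"{ref_base}2{asm_base}"] += 1
    return counts


def substitution_rate(ref_aln: str, asm_aln: str,
                      kinds=("G2A", "C2T")) -> float:
    """Substitutions of the given kinds per aligned (non-gap) column."""
    counts = substitution_counts(ref_aln, asm_aln)
    aligned = sum(1 for ref_base, asm_base in zip(ref_aln, asm_aln)
                  if ref_base != "-" and asm_base != "-")
    if aligned == 0:
        return 0.0
    return sum(counts[kind] for kind in kinds) / aligned
```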

Documentation

For more details about the usage of Hammerhead and results profiling, please refer to the documentation.

All rights reserved.
