A tool designed to find potential modification sites de novo.
Hammerhead
Workflow
Hammerhead was developed specifically to identify potential modification sites using Nanopore R10.4.1 simplex reads. It exploits the strand-specific error pattern observed in these reads to detect modifications.
The pipeline uses a custom metric, the difference index, to quantify the discrepancy in observed accuracy between the forward and reverse strands at each site. The difference index serves as a proxy for the likelihood that a site is modified: the higher the value, the more likely the site carries a modification.
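The exact formula for the difference index is not given here. As a minimal sketch, assuming that a strand's observed accuracy at a site is the fraction of its reads whose base matches the reference, and that the difference index is the absolute gap between the forward- and reverse-strand accuracies, the calculation could look like this. `StrandCounts` and `difference_index` are hypothetical names, not part of Hammerhead's API.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Per-site base counts on one strand (hypothetical input structure)."""
    matches: int     # reads whose base agrees with the reference at this site
    mismatches: int  # reads whose base disagrees with the reference at this site

    @property
    def depth(self) -> int:
        return self.matches + self.mismatches

    @property
    def accuracy(self) -> float:
        # Observed accuracy = fraction of reads that match the reference.
        return self.matches / self.depth if self.depth else 0.0


def difference_index(forward: StrandCounts, reverse: StrandCounts) -> float:
    """ASSUMED definition: absolute gap between forward and reverse strand accuracy."""
    return abs(forward.accuracy - reverse.accuracy)


# Toy site: the forward strand is mostly correct, the reverse strand is not.
fwd = StrandCounts(matches=45, mismatches=5)   # accuracy 0.90
rev = StrandCounts(matches=20, mismatches=30)  # accuracy 0.40
print(round(difference_index(fwd, rev), 2))    # prints 0.5
```

Under this assumption, an unmodified site with symmetric errors scores near 0, while a modification that disturbs basecalling on only one strand pushes the score toward 1.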
Installation
Hammerhead relies on external tools for read processing: minimap2, samtools, and bedtools. The following commands install these dependencies.
# tested versions of the dependencies:
# minimap2 2.17
# samtools 1.17
# bedtools 2.30.0
# install the latest versions:
conda install -c bioconda -c conda-forge minimap2 samtools bedtools -y
# or pin the tested versions:
conda install -c bioconda -c conda-forge minimap2==2.17 samtools==1.17 bedtools==2.30.0 -y
To install Hammerhead itself, use the following command.
pip install Hammerhead-View
Quick usage
Hammerhead can be run with two different strategies to detect methylation:
The first strategy selects sites whose difference index exceeds a cutoff (default: 0.35).
hammerhead --ref genome.fa --read input.fastq --cpu 4
The second strategy ranks sites by difference index from largest to smallest and selects the top N (default: 2000).
hammerhead --ref genome.fa --read input.fastq --cpu 4 --method top
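The two selection strategies can be sketched as follows. This is an illustration only: the function names and the `contig:position` site keys are hypothetical, and whether Hammerhead's cutoff comparison is strict (`>`) or inclusive (`>=`) is an assumption.

```python
def select_by_cutoff(site_scores: dict[str, float], cutoff: float = 0.35) -> list[str]:
    """Strategy 1: keep sites whose difference index exceeds the cutoff (strict >, assumed)."""
    return [site for site, score in site_scores.items() if score > cutoff]


def select_top_n(site_scores: dict[str, float], n: int = 2000) -> list[str]:
    """Strategy 2: rank sites by difference index, largest first, and keep the top n."""
    ranked = sorted(site_scores, key=site_scores.get, reverse=True)
    return ranked[:n]


# Toy scores keyed by hypothetical "contig:position" identifiers.
scores = {"chr:100": 0.62, "chr:200": 0.10, "chr:300": 0.41, "chr:400": 0.35}
print(select_by_cutoff(scores))   # ['chr:100', 'chr:300']
print(select_top_n(scores, n=2))  # ['chr:100', 'chr:300']
```

The cutoff strategy returns a variable number of sites that depends on the data, whereas the top-N strategy always returns a fixed number, which can be convenient when the score distribution of a new dataset is not yet known.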
Example
Demo datasets for testing Hammerhead are provided and can be downloaded with the following commands.
wget https://figshare.com/ndownloader/files/46437190 -O ecoli.fa
wget https://figshare.com/ndownloader/files/46437193 -O test.fastq.gz
Then run the following command to start the analysis.
hammerhead --ref ecoli.fa --read test.fastq.gz --min_depth 5 --min_depth_strand 3
Note: The arguments in this command are for demonstration only, because the read coverage of the demo data is shallow, and they may not be optimal for your dataset. When read coverage is sufficient (typically more than 50-fold), the default arguments are recommended.
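The meaning of `--min_depth` and `--min_depth_strand` is inferred here from their names only. As a minimal sketch, assuming `--min_depth` sets the minimum total coverage of a site and `--min_depth_strand` sets the minimum coverage on each strand separately, a site filter could look like this (`passes_depth_filter` is a hypothetical helper). A per-strand floor matters because at very low depth a strand's accuracy can take only a few values (with two reads: 0, 0.5, or 1), which makes the difference index noisy.

```python
def passes_depth_filter(fwd_depth: int, rev_depth: int,
                        min_depth: int, min_depth_strand: int) -> bool:
    """ASSUMED semantics: require enough total depth and enough depth on each strand."""
    total_ok = fwd_depth + rev_depth >= min_depth
    strand_ok = min(fwd_depth, rev_depth) >= min_depth_strand
    return total_ok and strand_ok


# Demo thresholds from the command above.
print(passes_depth_filter(3, 2, min_depth=5, min_depth_strand=3))  # False: reverse strand too shallow
print(passes_depth_filter(4, 3, min_depth=5, min_depth_strand=3))  # True
```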
Tool showcase
To demonstrate Hammerhead's ability to identify modifications in bacteria, we called methylation on two E. coli R10.4.1 simplex datasets: whole-genome sequencing (WGS) reads and whole-genome amplification (WGA) reads. The genome of the E. coli strain used carries the dam and dcm genes, which are associated with G6mATC and C5mCWGG methylation, respectively.
Distribution of the difference index for sites in the E. coli genome. The WGA reads served as a negative control because whole-genome amplification does not preserve native DNA methylation. Based on the background noise in the WGA reads, sites with a difference index above 0.35 were regarded as potential modification sites.
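One common way to turn a negative control into a cutoff is to take a high quantile of the control's score distribution, so that only a small fraction of unmethylated sites would pass. The sketch below does this with a nearest-rank quantile. It is an assumption for illustration and does not describe how the 0.35 cutoff was actually chosen; `empirical_cutoff` is a hypothetical helper.

```python
import math


def empirical_cutoff(control_scores: list[float], quantile: float = 0.999) -> float:
    """Nearest-rank quantile of negative-control (e.g. WGA) difference indices."""
    if not control_scores:
        raise ValueError("need at least one control score")
    ordered = sorted(control_scores)
    # 1-based nearest rank, clamped so quantile=0 still returns the minimum.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


# Toy control scores; real input would be the per-site indices from WGA reads.
wga_scores = [0.02, 0.05, 0.08, 0.11, 0.30]
cutoff = empirical_cutoff(wga_scores, quantile=0.99)
print(cutoff)
```

Sites in the native (WGS) sample whose difference index exceeds such a cutoff would then be treated as candidates, since few unmodified control sites reach that level.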
The CCWGG and GATC motifs were enriched in the sequences flanking these potential modification sites (-10 bp to +10 bp).
Note: The two datasets are available here. Both were basecalled using the modification-aware model, which is available in the modification_aware_basecalling_model directory.
To demonstrate the effectiveness of the Hammerhead-based polishing strategy in correcting the substitution errors caused by DNA modifications in assemblies (G2A and C2T), we present the substitution rates of 15 assemblies. These assemblies were generated from Acinetobacter pittii R10.4.1 reads randomly subsampled to 40-, 50-, and 60-fold coverage. The results obtained with the following polishing approaches were compared against the reference chromosome:
- No polishing
- Polishing potential modification sites with approximately 10-fold duplex reads
- Polishing total assemblies with 50-fold next-generation sequencing (NGS) reads
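To make the reported metric concrete, here is a minimal sketch of per-type substitution rates (such as G2A and C2T) computed from a pairwise alignment of the reference and an assembly. It assumes the two aligned rows are already available as equal-length strings with `-` for gaps, that `G2A` means a reference G called as A in the assembly, and that rates are normalized by the number of gap-free aligned columns. All of these are assumptions, and `substitution_rates` is a hypothetical helper; comparing whole assemblies would first require a genome aligner to produce the alignment.

```python
from collections import Counter


def substitution_rates(ref_aln: str, asm_aln: str) -> dict[str, float]:
    """Per-type substitution rates (per gap-free aligned column) from two aligned rows."""
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned rows must have equal length")
    counts = Counter()
    aligned = 0
    for r, a in zip(ref_aln.upper(), asm_aln.upper()):
        if r == "-" or a == "-":
            continue  # indel columns are not substitutions
        aligned += 1
        if r != a:
            counts[f"{r}2{a}"] += 1  # e.g. "G2A": reference G, assembly A
    return {kind: n / aligned for kind, n in counts.items()} if aligned else {}


# Toy alignment with one G2A and one C2T substitution over 8 columns.
print(substitution_rates("ACGTGCAT", "ACATGTAT"))  # {'G2A': 0.125, 'C2T': 0.125}
```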
Documentation
For more details on using Hammerhead and interpreting its results, please refer to the documentation.
All rights reserved.
Download files
Built Distribution
Hashes for Hammerhead_View-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f1e4329fc5d4e07ed2e321158237c7c1982ae765cee3d014e4e5d9b8898deb46
MD5 | 5e48c78fb4d1bea5606ce24669e80cc7
BLAKE2b-256 | 174a6a47e8c89dd196f0324ff8d63d52da67d2a5c60f7dcdfc0b9986126cbaf0