Identify and extract telomeric sequences from Oxford Nanopore or Illumina sequencing reads to extend Streptomycetes assemblies.
TELOMORE
Telomore is a tool for identifying and extracting telomeric sequences that have been excluded from a de novo assembly, using Oxford Nanopore or Illumina sequencing reads of Streptomycetes spp. It processes sequencing data to extend assemblies, generate quality control (QC) maps, and produce finalized assemblies with the telomere/recessed bases included.
Before running Telomore
Telomore does not identify linear contigs itself; it relies on the user to provide that information in the header of the FASTA reference file.
Usage
telomore --mode <mode> --reference <reference.fasta> [options]
Required Arguments
- --mode: Specify the sequencing platform. Options: nanopore or illumina.
- --reference: Path to the reference genome file in FASTA format.
Nanopore-Specific Arguments
- --single: Path to a single gzipped FASTQ file containing Nanopore reads.
Illumina-Specific Arguments
- --read1: Path to gzipped FASTQ file for Illumina read 1.
- --read2: Path to gzipped FASTQ file for Illumina read 2.
Optional Arguments
- --coverage_threshold: Coverage threshold at which consensus trimming stops (default: coverage=5 for ONT reads and coverage=1 for Illumina reads).
- --quality_threshold: Q-score required to count a read position in the coverage calculation during consensus trimming (default: Q-score=10 for ONT reads and Q-score=30 for Illumina reads).
- --threads: Number of threads to use (default: 1).
- --keep: Retain intermediate files (default: False).
- --quiet: Suppress console logging.
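The arguments above can be combined into a command line programmatically. The following is a minimal sketch, not part of Telomore itself: the helper name `build_telomore_command` and the file names are hypothetical, and it assumes that `--keep` is a boolean switch that takes no value.

```python
from typing import List, Optional


def build_telomore_command(
    mode: str,
    reference: str,
    single: Optional[str] = None,
    read1: Optional[str] = None,
    read2: Optional[str] = None,
    threads: int = 1,
    keep: bool = False,
) -> List[str]:
    """Assemble a telomore argument list from the documented flags."""
    cmd = ["telomore", "--mode", mode, "--reference", reference]
    if mode == "nanopore":
        if single is None:
            raise ValueError("nanopore mode needs --single")
        cmd += ["--single", single]
    elif mode == "illumina":
        if read1 is None or read2 is None:
            raise ValueError("illumina mode needs --read1 and --read2")
        cmd += ["--read1", read1, "--read2", read2]
    else:
        raise ValueError("mode must be 'nanopore' or 'illumina'")
    cmd += ["--threads", str(threads)]
    if keep:
        # Assumption: --keep is a switch without a value.
        cmd.append("--keep")
    return cmd


# Hypothetical file names, for illustration only.
nanopore_cmd = build_telomore_command(
    "nanopore", "assembly.fasta", single="reads.fastq.gz", threads=4
)
illumina_cmd = build_telomore_command(
    "illumina", "assembly.fasta", read1="R1.fastq.gz", read2="R2.fastq.gz"
)
print(" ".join(nanopore_cmd))
print(" ".join(illumina_cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Telomore and its dependencies are installed.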
Process overview
The process is as follows:
- Map Reads: Reads are mapped against all contigs in a reference using either minimap2 or Bowtie2.
- Extract Extending Reads: Extending reads that are mapped to the ends of linear contigs are extracted.
- Build Consensus: The terminal extending reads from each end are used to construct a consensus using either lamassemble or mafft + EMBOSS cons.
- Align and Attach Consensus: The consensus for each end is aligned to the reference and used to extend it.
- Trim Extended Replicon: In a final step, all terminally mapped reads are mapped to the new extended reference and used to trim away spurious sequence, based on read support.
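To illustrate how the final trimming step might interact with the --coverage_threshold and --quality_threshold options, here is a toy sketch. It is not Telomore's actual implementation: the function names and the input layout (one list of per-read Q-scores for each added base, ordered from the outermost base inward) are assumptions made for this example, and the numbers are invented.

```python
from typing import List


def coverage_from_qualities(column_quals: List[List[int]], min_q: int) -> List[int]:
    """Per position, count the reads whose Q-score is at least min_q."""
    return [sum(q >= min_q for q in column) for column in column_quals]


def bases_to_trim(coverage: List[int], threshold: int) -> int:
    """Count bases to drop, walking from the outermost added base inward.

    Trimming stops at the first base whose coverage reaches the threshold.
    """
    trimmed = 0
    for depth in coverage:
        if depth >= threshold:
            break
        trimmed += 1
    return trimmed


# Toy Q-scores for four added bases (outermost first); not real data.
toy_columns = [
    [12, 8],
    [15, 11, 9],
    [20, 14, 12, 11, 10],
    [30, 25, 18, 15, 12, 11],
]

# Documented ONT defaults: Q-score 10 and coverage 5.
toy_coverage = coverage_from_qualities(toy_columns, min_q=10)
toy_trim = bases_to_trim(toy_coverage, threshold=5)
print(toy_coverage, toy_trim)
```

With these toy values, the two outermost bases fall below the coverage threshold and would be trimmed, while the remaining bases are kept.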
Outputs
At the end of a run Telomore produces the following outputs:
├── {fasta_basename}_{seqtype}_telomore
│ ├── {contig_name}_telomore_extended.fasta
│ ├── {contig_name}_telomore_ext_{seqtype}.log
│ ├── {contig_name}_telomore_QC.bam
│ ├── {contig_name}_telomore_QC.bam.bai
│ ├── {contig_name}_telomore_untrimmed.fasta
│ └── {fasta_basename}_telomore.fasta
└── telomore.log # log containing run information.
In the folder, a number of files are generated for each contig considered:
| File Name | Description |
|---|---|
| {contig_name}_telomore_extended.fasta | Original contig sequence + added terminal bases - trimmed bases |
| {contig_name}_telomore_ext_{seqtype}.log | Log containing information about bases added, bases trimmed off, and the final result. |
| {contig_name}_telomore_QC.bam | BAM file containing terminal reads mapped to {contig_name}_telomore_extended.fasta. Useful for manual inspection of the extension. |
| {contig_name}_telomore_QC.bam.bai | Index file for the corresponding BAM file. |
| {contig_name}_telomore_untrimmed.fasta | Original contig sequence + added terminal bases |
Additionally, a FASTA file ({fasta_basename}_telomore.fasta) collects all tagged linear contigs as they appear in {contig_name}_telomore_extended.fasta, together with all non-linear contigs, in the order they appear in the original file.
Inspecting the {contig_name}_telomore_QC.bam file in IGV (Integrative Genomics Viewer) can be informative when evaluating the extended contig.
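For downstream scripting, the output naming scheme can be turned into expected paths. The sketch below only mirrors the file name patterns listed in this section; the helper `expected_outputs` is hypothetical, and the example values (including "nanopore" as the {seqtype} string) are assumptions. The top-level telomore.log is left out.

```python
from pathlib import Path
from typing import List

# Per-contig suffixes taken from the output listing in this section.
PER_CONTIG_SUFFIXES = [
    "_telomore_extended.fasta",
    "_telomore_QC.bam",
    "_telomore_QC.bam.bai",
    "_telomore_untrimmed.fasta",
]


def expected_outputs(fasta_basename: str, seqtype: str, contig_names: List[str]) -> List[Path]:
    """List the files expected inside the Telomore output folder."""
    out_dir = Path(f"{fasta_basename}_{seqtype}_telomore")
    paths = []
    for contig in contig_names:
        paths += [out_dir / f"{contig}{suffix}" for suffix in PER_CONTIG_SUFFIXES]
        paths.append(out_dir / f"{contig}_telomore_ext_{seqtype}.log")
    # Combined FASTA with all contigs.
    paths.append(out_dir / f"{fasta_basename}_telomore.fasta")
    return paths


# Hypothetical values, for illustration only.
example_paths = expected_outputs("assembly", "nanopore", ["contig_1"])
for path in example_paths:
    print(path.as_posix())
```

Checking `path.exists()` for each entry after a run is one way to confirm that every linear contig was processed.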
Dependencies (CLI-tools)
- Bowtie2
- EMBOSS tools (cons specifically)
- Lamassemble
- LAST-DB
- Mafft
- Minimap2, version 2.25 or higher
- Samtools
These can be installed using the conda recipe in this repo:
conda env create -f environment.yml -y
This repo can then be downloaded using git clone, the conda environment activated, and the tool installed:
# Activate telomore conda env
conda activate telomore
# Clone telomore repo
git clone https://github.com/dalofa/telomore && cd telomore
# Install package
pip install -e '.[dev]'
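After installation, it can be useful to confirm that the CLI dependencies are reachable from the active environment. The following is a minimal sketch, not part of Telomore: the executable names in `REQUIRED_TOOLS` are the commonly used command names for the listed tools and are an assumption here, and the check does not verify the minimap2 version requirement.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the dependencies listed above.
REQUIRED_TOOLS = [
    "bowtie2",
    "cons",
    "lamassemble",
    "lastdb",
    "mafft",
    "minimap2",
    "samtools",
]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All listed dependencies were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.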