Identify and extract telomeric sequences from Oxford Nanopore or Illumina sequencing reads to extend Streptomycetes assemblies.

TELOMORE

Telomore is a tool for identifying and extracting telomeric sequences that were excluded from a de novo assembly, using Oxford Nanopore or Illumina sequencing reads of Streptomycetes spp. It processes sequencing data to extend assemblies, generate quality control (QC) maps, and produce finalized assemblies with the telomere/recessed bases included.

Before running Telomore

Telomore does not identify linear contigs itself; it relies on the user to provide that information in the header of the FASTA reference file.
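Before a run it can help to check which contigs in the reference carry a linear marker in their header. The exact tag Telomore expects is not stated in this section, so the sketch below takes it as a parameter; `EXAMPLE_LINEAR_TAG`, the contig names, and the function name are hypothetical placeholders, not part of Telomore.

```python
def split_contigs_by_tag(fasta_text: str, tag: str) -> tuple[list[str], list[str]]:
    """Return (tagged, untagged) contig names, judged by a tag in the header line.

    `tag` is whatever marker the reference uses for linear contigs; it is an
    assumption supplied by the caller, not a value taken from Telomore.
    """
    tagged, untagged = [], []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        name = header.split()[0] if header else ""
        (tagged if tag in header else untagged).append(name)
    return tagged, untagged


# Hypothetical two-contig reference used only for illustration.
example = ">contig_1 EXAMPLE_LINEAR_TAG\nACGT\n>plasmid_1\nTTGA\n"
linear, other = split_contigs_by_tag(example, tag="EXAMPLE_LINEAR_TAG")
print("tagged as linear:", linear)
print("not tagged:", other)
```

Contigs that end up in the second list would presumably be left unextended, so an empty first list is a sign that the headers still need editing.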

Usage

telomore --mode <mode> --reference <reference.fasta> [options]

Required Arguments

  • --mode Specify the sequencing platform. Options: nanopore or illumina.
  • --reference Path to the reference genome file in FASTA format.

Nanopore-Specific Arguments

  • --single Path to a single gzipped FASTQ file containing Nanopore reads.

Illumina-Specific Arguments

  • --read1 Path to gzipped FASTQ file for Illumina read 1.
  • --read2 Path to gzipped FASTQ file for Illumina read 2.

Optional Arguments

  • --coverage_threshold Coverage at which consensus trimming stops (default: 5 for ONT reads, 1 for Illumina reads).
  • --quality_threshold Minimum Q-score for a read position to count towards coverage during consensus trimming (default: 10 for ONT reads, 30 for Illumina reads).
  • --threads Number of threads to use (default: 1).
  • --keep Retain intermediate files (default: False).
  • --quiet Suppress console logging.
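The arguments above can be combined into a full command line. The following is a minimal sketch of a wrapper that assembles one and checks that the platform-specific read arguments match the chosen mode. The function name and file names are hypothetical, and it assumes that --keep and --quiet are plain switches and that the threshold options take a single numeric value, which this section does not confirm.

```python
import shlex


def build_telomore_command(mode, reference, single=None, read1=None, read2=None,
                           coverage_threshold=None, quality_threshold=None,
                           threads=None, keep=False, quiet=False):
    """Assemble a telomore argument list from the options documented above."""
    if mode not in ("nanopore", "illumina"):
        raise ValueError("mode must be 'nanopore' or 'illumina'")
    cmd = ["telomore", "--mode", mode, "--reference", reference]
    if mode == "nanopore":
        if single is None:
            raise ValueError("nanopore mode requires --single")
        cmd += ["--single", single]
    else:
        if read1 is None or read2 is None:
            raise ValueError("illumina mode requires --read1 and --read2")
        cmd += ["--read1", read1, "--read2", read2]
    # Optional arguments are only added when set, so Telomore's own defaults apply.
    if coverage_threshold is not None:
        cmd += ["--coverage_threshold", str(coverage_threshold)]
    if quality_threshold is not None:
        cmd += ["--quality_threshold", str(quality_threshold)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if keep:
        cmd.append("--keep")   # assumed to be a switch without a value
    if quiet:
        cmd.append("--quiet")  # assumed to be a switch without a value
    return cmd


# Hypothetical input file names.
nanopore_cmd = build_telomore_command(
    "nanopore", "assembly.fasta", single="reads.fastq.gz", threads=8, keep=True)
illumina_cmd = build_telomore_command(
    "illumina", "assembly.fasta", read1="reads_R1.fastq.gz", read2="reads_R2.fastq.gz")
print(shlex.join(nanopore_cmd))
print(shlex.join(illumina_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Telomore and its dependencies are installed.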

Process overview

The process is as follows:

  1. Map Reads: Reads are mapped against all contigs in the reference using either minimap2 or Bowtie2.
  2. Extract Extending Reads: Reads that map to the ends of linear contigs and extend beyond them are extracted.
  3. Build Consensus: The terminal extending reads from each end are used to construct a consensus using either lamassemble or mafft + EMBOSS cons.
  4. Align and Attach Consensus: The consensus for each end is aligned to the reference and used to extend it.
  5. Trim Extended Replicon: In a final step, all terminally mapped reads are mapped to the new extended reference and used to trim away spurious sequence based on read support.
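The final trimming step can be pictured with a small sketch. It is an illustration of the idea as described by the --coverage_threshold and --quality_threshold options, not Telomore's actual implementation: the function names, the input layout (one list of base qualities per position, ordered from the outermost added base inward), and the example values are all assumptions.

```python
def supported_coverage(column_quals, quality_threshold):
    """Per position, count the read bases whose Q-score meets the threshold."""
    return [sum(q >= quality_threshold for q in column) for column in column_quals]


def bases_to_trim(coverage, coverage_threshold):
    """Number of outermost bases to remove before coverage reaches the threshold.

    `coverage` is ordered from the outermost added base inward, so trimming
    stops at the first position with enough read support.
    """
    for index, depth in enumerate(coverage):
        if depth >= coverage_threshold:
            return index
    return len(coverage)  # no position is supported; the whole extension goes


# Hypothetical Q-scores of the reads covering four added bases at one contig end.
columns = [
    [12],
    [8, 15, 30],
    [20, 25, 11, 14, 9],
    [30, 22, 18, 16, 12, 11],
]
# ONT defaults from the options above: Q-score 10, coverage 5.
coverage = supported_coverage(columns, quality_threshold=10)
trim = bases_to_trim(coverage, coverage_threshold=5)
print(coverage, trim)
```

With these values the three outermost bases fall below a coverage of 5 and would be trimmed, while the fourth is kept.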

Outputs

At the end of a run Telomore produces the following outputs:

├── {fasta_basename}_{seqtype}_telomore
│   ├── {contig_name}_telomore_extended.fasta
│   ├── {contig_name}_telomore_ext_{seqtype}.log
│   ├── {contig_name}_telomore_QC.bam
│   ├── {contig_name}_telomore_QC.bam.bai
│   ├── {contig_name}_telomore_untrimmed.fasta
│   └── {fasta_basename}_telomore.fasta
└── telomore.log # log containing run information.

The folder contains a number of files generated for each contig considered:

File Name | Description
{contig_name}_telomore_extended.fasta | Original contig sequence + added terminal bases - trimmed bases.
{contig_name}_telomore_ext_{seqtype}.log | Log containing information about bases added, bases trimmed off, and the final result.
{contig_name}_telomore_QC.bam | BAM file containing terminal reads mapped to {contig_name}_telomore_extended.fasta. Useful for manual inspection of the extension.
{contig_name}_telomore_QC.bam.bai | Index file for the corresponding BAM file.
{contig_name}_telomore_untrimmed.fasta | Original contig sequence + added terminal bases.

Additionally, there is a FASTA file ({fasta_basename}_telomore.fasta) collecting all tagged linear contigs as they appear in {contig_name}_telomore_extended.fasta, together with all non-linear contigs, in the order they appear in the original file.
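For scripting around a run, the naming scheme above can be expanded into a list of expected paths. This is a minimal sketch based only on the patterns shown in the output tree; the function name and example names are hypothetical, and the actual values Telomore substitutes for {seqtype} are not stated here, so a placeholder is used.

```python
from pathlib import PurePosixPath

# Per-contig file name patterns copied from the output tree above.
PER_CONTIG_PATTERNS = [
    "{contig}_telomore_extended.fasta",
    "{contig}_telomore_ext_{seqtype}.log",
    "{contig}_telomore_QC.bam",
    "{contig}_telomore_QC.bam.bai",
    "{contig}_telomore_untrimmed.fasta",
]


def expected_outputs(fasta_basename, seqtype, contig_names):
    """List the files expected inside the output folder for the given contigs."""
    folder = PurePosixPath(f"{fasta_basename}_{seqtype}_telomore")
    paths = [
        folder / pattern.format(contig=contig, seqtype=seqtype)
        for contig in contig_names
        for pattern in PER_CONTIG_PATTERNS
    ]
    # Combined FASTA with extended linear contigs and untouched non-linear ones.
    paths.append(folder / f"{fasta_basename}_telomore.fasta")
    return [str(path) for path in paths]


# "SEQTYPE" is a placeholder, not a value Telomore is known to use.
outputs = expected_outputs("assembly", "SEQTYPE", ["contig_1"])
for path in outputs:
    print(path)
```

The run-level telomore.log sits outside this folder in the tree above, so it is not included in the list.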

Inspecting the {contig_name}_telomore_QC.bam file in IGV (Integrative Genomics Viewer) can be informative when evaluating the extended contig.

Dependencies (CLI-tools)

  • Bowtie2
  • EMBOSS tools (specifically cons)
  • Lamassemble
  • LAST-DB
  • MAFFT
  • Minimap2, version 2.25 or higher
  • Samtools

These can be installed using the conda recipe in this repo:

conda env create -f environment.yml -y

This repo can then be downloaded using git clone, the conda environment activated, and the tool installed:

# Activate telomore conda env
conda activate telomore

# Clone telomore repo
git clone https://github.com/dalofa/telomore && cd telomore

# Install package
pip install -e '.[dev]'

Download files

Source Distribution

telomore-0.4.1.tar.gz (38.4 kB)

Uploaded Source

Built Distribution

telomore-0.4.1-py3-none-any.whl (40.8 kB)

Uploaded Python 3

File details

Details for the file telomore-0.4.1.tar.gz.

File metadata

  • Download URL: telomore-0.4.1.tar.gz
  • Upload date:
  • Size: 38.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for telomore-0.4.1.tar.gz
Algorithm Hash digest
SHA256 63854489e810a1354af5ec0ecd533f22e13e6a46100e404693de21ec303686c2
MD5 40e5a427e4b9d22e0c108d7c5c910838
BLAKE2b-256 d112b0760621231fb22bf1b06dba407e3f5ba2118da26ede994f4100de99841f

File details

Details for the file telomore-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: telomore-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 40.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for telomore-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f6b6f93ebd49e1747ac4957b6b6b37067575994a6cb97ee49bfe8dd3d151fecd
MD5 d7006be06bad7acc0d8b9c15c31751b9
BLAKE2b-256 f745c2f787a5185f6da59fcdfbcbef83fb7c98a204dc31bfc4ef410111079dcd
