A bioinformatics pipeline for basecalling ONT NGS data.

NanoGO Basecaller

Oxford Nanopore Data Processing Made Simple

Overview

NanoGO Basecaller is a specialized command-line tool designed for efficient processing of Oxford Nanopore Technologies (ONT) sequencing data. It integrates seamlessly with Dorado, ONT’s latest high-performance basecalling software, and supports both standard and duplex basecalling. Whether you are a seasoned bioinformatician or a newcomer to ONT data, NanoGO Basecaller offers:

  • Simple setup: Automatic installation of Dorado and other dependencies
  • High-performance basecalling: GPU acceleration and multi-threading support
  • Flexible workflows: Interactive and scripted command-line modes
  • Demultiplexing: On-the-fly barcode separation with intuitive output organization

Quick Start

  1. Set up a Conda environment (Recommended):

    conda create -n nanogo-basecaller "python=3.10" -y
    conda activate nanogo-basecaller
    
  2. Install NanoGO Basecaller:

    pip install nanogo-basecaller
    
  3. Install Dorado (if not already installed):

    nanogo install-dorado  # Downloads and installs the latest Dorado version
    

    Use --user to install locally without sudo, or --force to overwrite an existing installation.

  4. Run the basecaller:

    nanogo basecaller -i /path/to/FAST5_or_POD5 -o /path/to/output
    

    You will be guided through model selection, demultiplexing, and hardware preferences.

That’s all it takes to start basecalling ONT data with NanoGO!


Key Features

  • Dorado Integration
    Automatically installs and manages Dorado (v0.6.0+) for standard or duplex basecalling. No manual setup required.

  • Fast and Scalable
    Utilizes GPU acceleration and automatic resource detection to speed up large datasets. Supports multi-GPU configurations.

  • Interactive or Scripted
    A single command-line interface supports either guided (interactive) runs or scripted executions for automation.

  • On-the-Fly Demultiplexing
    Built-in barcode detection to produce organized output directories by barcode. Unclassified reads are also tracked.

  • Robust Conversion and Management
    Automatically converts FAST5 to POD5 format. Handles model selection, model downloads, and version checks.

  • Error Handling
    Offers comprehensive logging, graceful recovery, and easy debugging through user-friendly error messages.


Detailed Installation

System Requirements

  • Python: 3.8 to 3.10 (3.11+ is currently unsupported)
  • Operating System: Linux or Windows Subsystem for Linux (WSL 2)
  • CPU: Minimum 4 cores recommended
  • RAM: 16GB+ recommended (32GB+ for larger datasets)
  • Storage: SSD recommended for high I/O workloads
  • GPU (Optional but Recommended):
    • NVIDIA GPU with CUDA support for best performance
    • Supports multiple GPUs for parallel jobs

Step-by-Step Installation

  1. Create a Conda environment:

    conda create -n nanogo-basecaller "python=3.10" -y
    conda activate nanogo-basecaller
    
  2. Install NanoGO Basecaller using pip:

    pip install nanogo-basecaller
    
  3. Verify the NanoGO CLI:

    nanogo --help
    

    This should display help text and list available subcommands.

Installing from Source

If you prefer to work with the latest source code:

  1. Clone the repository:
    git clone https://github.com/phac-nml/nanogo-basecaller.git
    cd nanogo-basecaller
    
  2. Install in development mode:
    pip install -e .
    

Installing Dorado

1. Using the Built-In Installer (Recommended)

nanogo install-dorado

The installer:
  • Detects existing installations
  • Downloads and verifies the latest Dorado version
  • Installs to your virtual environment or system path

Use --user to install locally (no sudo) or --force to overwrite any existing installation.

2. Manual Installation

If automatic installation fails:

  1. Download Dorado from Oxford Nanopore’s CDN.
  2. Extract the tarball:
    tar -xzf dorado-x.y.z-linux-x64.tar.gz
    
  3. Copy binaries and libraries to a directory in your PATH (e.g., ~/.local/bin and ~/.local/lib).
  4. Add these directories to your system PATH or LD_LIBRARY_PATH as needed.
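
After a manual installation, it can help to confirm that the dorado executable is actually reachable. The sketch below uses only the Python standard library; the helper names find_executable and dir_on_search_path are hypothetical, and the directories checked are the examples from step 3, not paths NanoGO requires.

```python
import os
import shutil


def find_executable(name="dorado"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


def dir_on_search_path(directory, env_var="PATH", environ=None):
    """Return True if `directory` is listed in the given environment variable."""
    environ = os.environ if environ is None else environ
    target = os.path.realpath(os.path.expanduser(directory))
    entries = environ.get(env_var, "").split(os.pathsep)
    return any(
        os.path.realpath(os.path.expanduser(entry)) == target
        for entry in entries
        if entry
    )


if __name__ == "__main__":
    print("dorado found at:", find_executable("dorado"))
    print("~/.local/bin on PATH:", dir_on_search_path("~/.local/bin"))
    print("~/.local/lib on LD_LIBRARY_PATH:",
          dir_on_search_path("~/.local/lib", env_var="LD_LIBRARY_PATH"))
```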

Usage

Interactive Mode

Run the basecaller without any arguments to enter interactive mode:

nanogo basecaller

You will be prompted to:

  1. Select or confirm the input directory
  2. Choose a Dorado basecalling model
  3. Specify demultiplexing options (if relevant)
  4. Set GPU or CPU usage (auto-detected by default)

Command-Line Mode

For direct or automated executions:

nanogo basecaller -i /path/to/reads -o /path/to/output [options]

Examples:

# Use GPU (auto-detected), standard basecalling
nanogo basecaller -i data/ -o results/

# Enable duplex basecalling
nanogo basecaller -i data/ -o results/ --duplex

Workflow

  1. Version Check – Verifies Dorado and POD5 versions.
  2. Input Scanning – Locates FAST5/POD5 files.
  3. Configuration – Selects basecalling model, sets up demultiplexing.
  4. Preparation – Converts FAST5 to POD5 if necessary; downloads models.
  5. Basecalling – Runs Dorado (standard or duplex) to generate BAM or FASTQ files.
  6. Demultiplexing – Splits reads by barcode into subfolders.
  7. Output Structuring – Moves final outputs into a well-defined directory tree.

Command-Line Options

Basecalling Options
  • -b, --basecaller – Enables selection of the basecaller software (Dorado is used by default).
  • -d, --duplex – Activates duplex basecalling mode.
  • -m, --model <model_name> – Manually specify a Dorado model.
  • --ignore <pattern> – Skip files matching the pattern (e.g., _failed.pod5).
Device Options
  • --device {auto,cpu,gpu} – Select processing device (default: auto-detect).
  • --gpu-device <ID> – Specify which GPU device to use (default: 0).
Advanced Options
  • --check-version – Check for the latest Dorado version (default: enabled).
  • --threads <N> – Specify number of CPU threads (default: auto-detect).
  • --chunk-size <SIZE> – Control chunking for basecalling.
  • --modified-bases – Enable modified base detection (requires a compatible model).
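
For scripted runs, the options listed above can be assembled programmatically. This is a minimal sketch that only builds the argument list; it uses flags documented in this README (-i, -o, --duplex, --device, --model), and the helper name build_basecaller_command is hypothetical.

```python
import shlex

# Device values as documented under Device Options.
VALID_DEVICES = {"auto", "cpu", "gpu"}


def build_basecaller_command(input_dir, output_dir, duplex=False,
                             device=None, model=None):
    """Build the argument list for a `nanogo basecaller` invocation."""
    cmd = ["nanogo", "basecaller", "-i", str(input_dir), "-o", str(output_dir)]
    if duplex:
        cmd.append("--duplex")
    if device is not None:
        if device not in VALID_DEVICES:
            raise ValueError(f"device must be one of {sorted(VALID_DEVICES)}")
        cmd += ["--device", device]
    if model is not None:
        cmd += ["--model", model]
    return cmd


if __name__ == "__main__":
    cmd = build_basecaller_command("data/", "results/", duplex=True)
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids quoting problems when paths contain spaces.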

Input and Output Structure

Input Directory

NanoGO expects an organized directory with raw ONT data:

/path/to/reads
├─ Sample_A
│  ├─ A_01.fast5
│  └─ A_02.fast5
├─ Sample_B
│  ├─ B_01.pod5
│  └─ B_02.fast5
└─ ...
  • Each subfolder is treated as a separate run or sample.
  • FAST5 or POD5 files are automatically detected.
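
To preview how an input directory laid out as above would be grouped, the layout can be scanned with the standard library. This is an illustrative sketch, not NanoGO's own scanner; scan_input and needs_conversion are hypothetical names, and matching on the .fast5 and .pod5 extensions is an assumption.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: raw files are recognised by these extensions.
RAW_SUFFIXES = {".fast5", ".pod5"}


def scan_input(root):
    """Map each top-level sample folder name to its raw FAST5/POD5 files."""
    root = Path(root)
    samples = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in RAW_SUFFIXES:
            rel = path.relative_to(root)
            # Files directly under the root are grouped under ".".
            sample = rel.parts[0] if len(rel.parts) > 1 else "."
            samples[sample].append(path)
    return dict(samples)


def needs_conversion(files):
    """Return the FAST5 files, which would be converted to POD5 first."""
    return [f for f in files if f.suffix.lower() == ".fast5"]


if __name__ == "__main__":
    for sample, files in scan_input("/path/to/reads").items():
        print(sample, len(files), "raw files,",
              len(needs_conversion(files)), "to convert")
```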

Output Directory

NanoGO creates a structured output folder:

/path/to/output
├─ temp_data
│  ├─ basecalling_model/
│  ├─ sample_sheet.csv
│  └─ sublist_# folders/ (processing chunks)
└─ final_output
   ├─ barcode01/
   ├─ barcode02/
   └─ unclassified/
  • temp_data: Intermediate files, logs, partial BAM/FASTQ outputs
  • final_output: Fully demultiplexed and basecalled reads, separated by barcode
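
A quick way to inspect a finished run is to count reads per barcode folder. The sketch below assumes uncompressed FASTQ files with the .fastq extension and four lines per record; BAM output is not handled. The function names are hypothetical.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count records in an uncompressed FASTQ file (four lines per read)."""
    with open(fastq_path, "rb") as handle:
        lines = sum(1 for _ in handle)
    return lines // 4


def reads_per_barcode(final_output):
    """Return {folder name: total reads} for each subfolder of final_output."""
    counts = {}
    for folder in sorted(Path(final_output).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                count_fastq_reads(f) for f in sorted(folder.glob("*.fastq"))
            )
    return counts


if __name__ == "__main__":
    for barcode, n_reads in reads_per_barcode("/path/to/output/final_output").items():
        print(f"{barcode}\t{n_reads}")
```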

File Naming Convention

{flow_cell_id}_{run_id}_{model_hash}_{kit_hash}_{file_count}.fastq
  1. flow_cell_id – From ONT metadata
  2. run_id – First 8 characters of run identifier
  3. model_hash – Short hash of the Dorado model used
  4. kit_hash – Short hash identifying the barcoding kit
  5. file_count – Incremental count to avoid conflicts

This scheme ensures clarity, uniqueness, and traceability of all output files.
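
Because the fields are joined by underscores, a file name can be split back into its components. This sketch assumes that no field contains an underscore and that file_count is an integer; the example file name and its hash values are made up for illustration.

```python
from typing import NamedTuple


class OutputName(NamedTuple):
    flow_cell_id: str
    run_id: str
    model_hash: str
    kit_hash: str
    file_count: int


def parse_output_name(filename):
    """Split a NanoGO-style FASTQ file name into its five fields."""
    suffix = ".fastq"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .fastq file name: {filename!r}")
    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {filename!r}")
    flow_cell_id, run_id, model_hash, kit_hash, file_count = parts
    return OutputName(flow_cell_id, run_id, model_hash, kit_hash, int(file_count))


if __name__ == "__main__":
    # Hypothetical file name; real identifiers and hashes will differ.
    print(parse_output_name("FAX12345_1a2b3c4d_9f8e7d_ab12cd_0.fastq"))
```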


Troubleshooting

  1. Installation Issues

    • Compiler errors: Install the conda-forge compilers with conda install -c conda-forge gcc_linux-64 gxx_linux-64.
    • Missing Dorado: Run nanogo install-dorado or manually install. Check your PATH.
  2. Runtime Errors

    • No CUDA device: Ensure NVIDIA drivers are installed, or use --device cpu.
    • Memory errors: Lower chunk size or increase system RAM.
    • PySAM wheel issues: Try pip install --only-binary=:all: pysam.
  3. Path or Permission Problems

    • Use --user for a local install, or run with appropriate permissions (sudo) when installing system-wide.
    • Update your PATH and LD_LIBRARY_PATH if installing Dorado manually.
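
For scripted runs on machines that may lack a GPU, the "No CUDA device" case can be sidestepped by choosing the --device value up front. The sketch below treats the presence of nvidia-smi on PATH as a rough heuristic for a working NVIDIA driver; this is an assumption and not how NanoGO's own auto-detection is documented to work. The name pick_device is hypothetical.

```python
import shutil


def pick_device(which=shutil.which):
    """Return "gpu" if nvidia-smi is on PATH, otherwise "cpu".

    Heuristic only: nvidia-smi being installed does not guarantee
    that a usable CUDA device is present.
    """
    return "gpu" if which("nvidia-smi") else "cpu"


if __name__ == "__main__":
    print("Suggested --device value:", pick_device())
```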

License

NanoGO Basecaller is distributed under the GNU General Public License v3.0. Refer to the GNU GPL v3.0 for the full terms and conditions.


Support and Contact

  • Primary Contact: Gurasis Osahan, National Microbiology Laboratory
  • Issue Tracking: Use the GitHub Issues page for bug reports or feature requests
  • Documentation: Additional references and usage examples are in the docs/ directory

Maintained by the National Microbiology Laboratory, Public Health Agency of Canada.
Ensuring public health through advanced genomics.


Thank you for using NanoGO Basecaller!
We continuously improve our tools to deliver efficient and robust ONT data processing. Feel free to reach out with any feedback or suggestions.
