A bioinformatics pipeline for basecalling ONT NGS Data.
NanoGO Basecaller
Oxford Nanopore Data Processing Made Simple
Overview
NanoGO Basecaller is a specialized command-line tool designed for efficient processing of Oxford Nanopore Technologies (ONT) sequencing data. It integrates seamlessly with Dorado, ONT’s latest high-performance basecalling software, and supports both standard and duplex basecalling. Whether you are a seasoned bioinformatician or a newcomer to ONT data, NanoGO Basecaller offers:
- Simple setup: Automatic installation of Dorado and other dependencies
- High-performance basecalling: GPU acceleration and multi-threading support
- Flexible workflows: Interactive and scripted command-line modes
- Demultiplexing: On-the-fly barcode separation with intuitive output organization
Quick Start
- Set up a Conda environment (recommended):
    conda create -n nanogo-basecaller "python=3.10" -y
    conda activate nanogo-basecaller
- Install NanoGO Basecaller:
    pip install nanogo-basecaller
- Install Dorado (if not already installed):
    nanogo install-dorado # Automatically handles latest version
  Use --user to install locally without sudo, or --force to overwrite an existing installation.
- Run the basecaller:
    nanogo basecaller -i /path/to/FAST5_or_POD5 -o /path/to/output
  You will be guided through model selection, demultiplexing, and hardware preferences.
That’s all it takes to start basecalling ONT data with NanoGO!
Key Features
- Dorado Integration: Automatically installs and manages Dorado (v0.6.0+) for standard or duplex basecalling. No manual setup required.
- Fast and Scalable: Uses GPU acceleration and automatic resource detection to speed up processing of large datasets. Supports multi-GPU configurations.
- Interactive or Scripted: A single command-line interface supports either guided (interactive) runs or scripted executions for automation.
- On-the-Fly Demultiplexing: Built-in barcode detection produces output directories organized by barcode. Unclassified reads are also tracked.
- Robust Conversion and Management: Automatically converts FAST5 to POD5 format. Handles model selection, model downloads, and version checks.
- Error Handling: Offers comprehensive logging, graceful recovery, and easy debugging through user-friendly error messages.
Detailed Installation
System Requirements
- Python: 3.8 to 3.10 (3.11+ is currently unsupported)
- Operating System: Linux or Windows Subsystem for Linux (WSL 2)
- CPU: Minimum 4 cores recommended
- RAM: 16GB+ recommended (32GB+ for larger datasets)
- Storage: SSD recommended for high I/O workloads
- GPU (Optional but Recommended):
- NVIDIA GPU with CUDA support for best performance
- Supports multiple GPUs for parallel jobs
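The Python and CPU requirements above can be checked before installing. The following is a minimal sketch, not part of NanoGO itself; the function name `check_requirements` and the constants are hypothetical and simply restate the thresholds from the list.

```python
import os
import sys

# Thresholds copied from the System Requirements list (hypothetical constant names).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 10)
MIN_CPU_CORES = 4


def check_requirements(python_version, cpu_cores):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    major_minor = tuple(python_version[:2])
    if not (MIN_PYTHON <= major_minor <= MAX_PYTHON):
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} is outside the supported 3.8-3.10 range"
        )
    # os.cpu_count() may return None when the count cannot be determined.
    if cpu_cores is not None and cpu_cores < MIN_CPU_CORES:
        problems.append(
            f"{cpu_cores} CPU cores detected; at least {MIN_CPU_CORES} are recommended"
        )
    return problems


if __name__ == "__main__":
    for problem in check_requirements(sys.version_info, os.cpu_count()):
        print("WARNING:", problem)
```

Running the script with the interpreter from your Conda environment prints a warning for each unmet requirement and nothing otherwise.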
Step-by-Step Installation
- Create a Conda environment:
    conda create -n nanogo-basecaller "python=3.10" -y
    conda activate nanogo-basecaller
- Install NanoGO Basecaller using pip:
    pip install nanogo-basecaller
- Verify the NanoGO CLI:
    nanogo --help
  This should display help text and list available subcommands.
Installing from Source
If you prefer to work with the latest source code:
- Clone the repository:
    git clone https://github.com/phac-nml/nanogo-basecaller.git
    cd nanogo-basecaller
- Install in development mode:
    pip install -e .
Installing Dorado
1. Using the Built-In Installer (Recommended)
nanogo install-dorado
- Detects existing installations
- Downloads and verifies the latest Dorado version
- Installs to your virtual environment or system path
Use --user to install locally (no sudo) or --force to overwrite any existing installation.
2. Manual Installation
If automatic installation fails:
- Download Dorado from Oxford Nanopore’s CDN.
- Extract the tarball:
tar -xzf dorado-x.y.z-linux-x64.tar.gz
- Copy binaries and libraries to a directory in your PATH (e.g., ~/.local/bin and ~/.local/lib).
- Add these directories to your system PATH or LD_LIBRARY_PATH as needed.
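After a manual installation it can help to confirm that the target directories really are on the relevant search paths. The sketch below assumes the example locations mentioned above (~/.local/bin and ~/.local/lib); the helper name `dir_on_search_path` is hypothetical.

```python
import os


def dir_on_search_path(directory, env_value):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    target = os.path.realpath(os.path.expanduser(directory))
    entries = [entry for entry in env_value.split(os.pathsep) if entry]
    return any(os.path.realpath(os.path.expanduser(entry)) == target for entry in entries)


if __name__ == "__main__":
    # Example locations from the manual installation steps; adjust to your setup.
    checks = {
        "PATH": "~/.local/bin",
        "LD_LIBRARY_PATH": "~/.local/lib",
    }
    for variable, directory in checks.items():
        present = dir_on_search_path(directory, os.environ.get(variable, ""))
        print(f"{directory} in {variable}: {present}")
```

If either line reports False, add the directory to the corresponding variable in your shell profile and open a new shell.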
Usage
Interactive Mode
Run the basecaller without any arguments to enter interactive mode:
nanogo basecaller
You will be prompted to:
- Select or confirm the input directory
- Choose a Dorado basecalling model
- Specify demultiplexing options (if relevant)
- Set GPU or CPU usage (auto-detected by default)
Command-Line Mode
For direct or automated executions:
nanogo basecaller -i /path/to/reads -o /path/to/output [options]
Examples:
# Use GPU (auto-detected), standard basecalling
nanogo basecaller -i data/ -o results/
# Enable duplex basecalling
nanogo basecaller -i data/ -o results/ --duplex
Workflow
- Version Check – Verifies Dorado and POD5 versions.
- Input Scanning – Locates FAST5/POD5 files.
- Configuration – Selects basecalling model, sets up demultiplexing.
- Preparation – Converts FAST5 to POD5 if necessary; downloads models.
- Basecalling – Executes Dorado (standard/duplex) generating BAM or FASTQ files.
- Demultiplexing – Splits reads by barcode into subfolders.
- Output Structuring – Moves final outputs into a well-defined directory tree.
Command-Line Options
Basecalling Options
- -b, --basecaller – Enable specifying basecaller software (Dorado enabled by default).
- -d, --duplex – Activates duplex basecalling mode.
- -m, --model <model_name> – Manually specify a Dorado model.
- --ignore <pattern> – Skip files matching the pattern (e.g., _failed.pod5).
Device Options
- --device {auto,cpu,gpu} – Select processing device (default: auto-detect).
- --gpu-device <ID> – Specify which GPU device to use (default: 0).
Advanced Options
- --check-version – Check for the latest Dorado version (default: enabled).
- --threads <N> – Specify number of CPU threads (default: auto-detect).
- --chunk-size <SIZE> – Control chunking for basecalling.
- --modified-bases – Enable modified base detection (requires a compatible model).
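For automated runs, the options above can be assembled programmatically. The following is a minimal sketch that builds an argument list from a handful of the documented flags (-i, -o, --duplex, --model, --device); the function `build_basecaller_command` is a hypothetical helper, not part of the NanoGO API.

```python
import shlex


def build_basecaller_command(input_dir, output_dir, duplex=False, model=None, device=None):
    """Assemble a `nanogo basecaller` argument list from documented flags only."""
    command = ["nanogo", "basecaller", "-i", str(input_dir), "-o", str(output_dir)]
    if duplex:
        command.append("--duplex")
    if model is not None:
        command += ["--model", model]
    if device is not None:
        # The documented choices for --device are auto, cpu, and gpu.
        if device not in ("auto", "cpu", "gpu"):
            raise ValueError(f"unsupported device: {device!r}")
        command += ["--device", device]
    return command


if __name__ == "__main__":
    cmd = build_basecaller_command("data/", "results/", duplex=True, device="cpu")
    # Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.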
Input and Output Structure
Input Directory
NanoGO expects an organized directory with raw ONT data:
/path/to/reads
├─ Sample_A
│  ├─ A_01.fast5
│  └─ A_02.fast5
├─ Sample_B
│  ├─ B_01.pod5
│  └─ B_02.fast5
└─ ...
- Each subfolder is treated as a separate run or sample.
- FAST5 or POD5 files are automatically detected.
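To preview what such a layout contains before starting a run, the directory can be scanned with a few lines of Python. This is an illustrative sketch only and does not reproduce NanoGO's own scanning logic; `scan_samples` is a hypothetical helper, and it assumes raw files sit directly inside each sample subfolder, as in the tree above.

```python
from pathlib import Path

# File extensions treated as raw ONT data in this sketch.
RAW_SUFFIXES = {".fast5", ".pod5"}


def scan_samples(reads_dir):
    """Map each sample subfolder name to its FAST5 and POD5 files."""
    samples = {}
    for sample_dir in sorted(p for p in Path(reads_dir).iterdir() if p.is_dir()):
        found = {"fast5": [], "pod5": []}
        for path in sorted(sample_dir.iterdir()):
            suffix = path.suffix.lower()
            if path.is_file() and suffix in RAW_SUFFIXES:
                found[suffix.lstrip(".")].append(path)
        # Subfolders without any raw data are left out of the result.
        if found["fast5"] or found["pod5"]:
            samples[sample_dir.name] = found
    return samples
```

Calling `scan_samples("/path/to/reads")` returns a dictionary keyed by sample folder name; a non-empty "fast5" list indicates files that would need conversion to POD5.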
Output Directory
NanoGO creates a structured output folder:
/path/to/output
├─ temp_data
│  ├─ basecalling_model/
│  ├─ sample_sheet.csv
│  └─ sublist_# folders/ (processing chunks)
└─ final_output
   ├─ barcode01/
   ├─ barcode02/
   └─ unclassified/
- temp_data: Intermediate files, logs, partial BAM/FASTQ outputs
- final_output: Fully demultiplexed and basecalled reads, separated by barcode
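A quick way to sanity-check a finished run is to count reads per barcode folder. The sketch below assumes that final_output holds uncompressed .fastq files directly inside each barcode folder and that every record spans exactly four lines; both are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count records in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(fastq_path, "r", encoding="utf-8") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def reads_per_barcode(final_output_dir):
    """Return {folder_name: total_reads} for each subfolder of final_output."""
    totals = {}
    for folder in sorted(p for p in Path(final_output_dir).iterdir() if p.is_dir()):
        totals[folder.name] = sum(
            count_fastq_reads(fastq) for fastq in sorted(folder.glob("*.fastq"))
        )
    return totals
```

For example, `reads_per_barcode("/path/to/output/final_output")` would return one entry per barcode folder, including unclassified.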
File Naming Convention
{flow_cell_id}_{run_id}_{model_hash}_{kit_hash}_{file_count}.fastq
- flow_cell_id – From ONT metadata
- run_id – First 8 characters of run identifier
- model_hash – Short hash of the Dorado model used
- kit_hash – Short hash identifying the barcoding kit
- file_count – Incremental count to avoid conflicts
This scheme ensures clarity, uniqueness, and traceability of all output files.
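Downstream scripts can recover these fields from a file name. The following is a minimal sketch that splits from the right, assuming that only flow_cell_id may contain underscores and that file_count is an integer; `parse_output_name` and `OutputName` are hypothetical names, and the example file name in the comment is invented for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class OutputName(NamedTuple):
    flow_cell_id: str
    run_id: str
    model_hash: str
    kit_hash: str
    file_count: int


def parse_output_name(filename):
    """Split a NanoGO-style FASTQ file name into its five documented fields."""
    name = Path(filename).name
    if not name.endswith(".fastq"):
        raise ValueError(f"not a .fastq file name: {filename!r}")
    stem = name[: -len(".fastq")]
    # Split from the right so underscores in flow_cell_id (an assumption) are preserved.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields in {filename!r}")
    flow_cell_id, run_id, model_hash, kit_hash, file_count = parts
    return OutputName(flow_cell_id, run_id, model_hash, kit_hash, int(file_count))


# Invented example: parse_output_name("FAK00000_1a2b3c4d_9f8e7d_0a1b2c_3.fastq")
```

The returned tuple exposes each field by name, so results can be grouped by run_id or model_hash without further string handling.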
Troubleshooting
- Installation Issues
  - Compiler errors: Run conda install -c conda-forge gcc_linux-64 gxx_linux-64
  - Missing Dorado: Run nanogo install-dorado or install it manually. Check your PATH.
- Runtime Errors
  - No CUDA device: Ensure NVIDIA drivers are installed, or use --device cpu.
  - Memory errors: Lower the chunk size or increase system RAM.
  - PySAM wheel issues: Try pip install --only-binary=:all: pysam.
- Path or Permission Problems
  - Use --user or run with appropriate permissions (sudo) if installing system-wide.
  - Update your PATH and LD_LIBRARY_PATH if installing Dorado manually.
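Several of the issues above come down to an executable not being found. As a rough first diagnostic, the sketch below reports where (or whether) the relevant tools resolve on the current PATH; `find_tools` is a hypothetical helper, and checking for `nvidia-smi` is only a proxy for a working NVIDIA driver installation.

```python
import shutil


def find_tools(names, path=None):
    """Map each executable name to its resolved location, or None if it is absent."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools(["nanogo", "dorado", "nvidia-smi"]).items():
        print(f"{name:<12} {location or 'NOT FOUND on PATH'}")
```

A missing `dorado` entry points to the installation steps, while a missing `nvidia-smi` suggests falling back to --device cpu until drivers are installed.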
License
NanoGO Basecaller is distributed under the GNU General Public License v3.0. Refer to the GNU GPL v3.0 for the full terms and conditions.
Support and Contact
- Primary Contact: Gurasis Osahan, National Microbiology Laboratory
- Issue Tracking: Use the GitHub Issues page for bug reports or feature requests
- Documentation: Additional references and usage examples are in the docs/ directory
Maintained by the National Microbiology Laboratory, Public Health Agency of Canada.
Ensuring public health through advanced genomics.
Thank you for using NanoGO Basecaller!
We continuously improve our tools to deliver efficient and robust ONT data processing. Feel free to reach out with any feedback or suggestions.