A bioinformatics pipeline for basecalling ONT NGS data.

NanoGO Basecaller

Oxford Nanopore Data Processing Made Simple

Overview

NanoGO Basecaller is a specialized command-line tool designed for efficient processing of Oxford Nanopore Technologies (ONT) sequencing data. It integrates seamlessly with Dorado, ONT’s latest high-performance basecalling software, and supports both standard and duplex basecalling. Whether you are a seasoned bioinformatician or a newcomer to ONT data, NanoGO Basecaller offers:

Simple setup: Automatic installation of Dorado and other dependencies
High-performance basecalling: GPU acceleration and multi-threading support
Flexible workflows: Interactive and scripted command-line modes
Demultiplexing: On-the-fly barcode separation with intuitive output organization

Quick Start

Set up a Conda environment (Recommended):

```
conda create -n nanogo-basecaller "python=3.10" -y
conda activate nanogo-basecaller
```

Install NanoGO Basecaller:
```
pip install nanogo-basecaller
```
Install Dorado (if not already installed):
```
nanogo install-dorado  # Automatically handles latest version
```
Use --user to install locally without sudo, or --force to overwrite an existing installation.
Run the basecaller:
```
nanogo basecaller -i /path/to/FAST5_or_POD5 -o /path/to/output
```
You will be guided through model selection, demultiplexing, and hardware preferences.

That’s all it takes to start basecalling ONT data with NanoGO!

Key Features

Dorado Integration
Automatically installs and manages Dorado (v0.6.0+) for standard or duplex basecalling. No manual setup required.
Fast and Scalable
Utilizes GPU acceleration and automatic resource detection to speed up large datasets. Supports multi-GPU configurations.
Interactive or Scripted
A single command-line interface supports either guided (interactive) runs or scripted executions for automation.
On-the-Fly Demultiplexing
Built-in barcode detection sorts reads into per-barcode output directories. Unclassified reads are also tracked.
Robust Conversion and Management
Automatically converts FAST5 to POD5 format. Handles model selection, model downloads, and version checks.
Error Handling
Offers comprehensive logging, graceful recovery, and easy debugging through user-friendly error messages.

Detailed Installation

System Requirements

Python: 3.8 to 3.10 (3.11+ is currently unsupported)
Operating System: Linux or Windows Subsystem for Linux (WSL 2)
CPU: Minimum 4 cores recommended
RAM: 16 GB+ recommended (32 GB+ for larger datasets)
Storage: SSD recommended for high I/O workloads
GPU (Optional but Recommended):
- NVIDIA GPU with CUDA support for best performance
- Supports multiple GPUs for parallel jobs

Step-by-Step Installation

Create a Conda environment:

```
conda create -n nanogo-basecaller "python=3.10" -y
conda activate nanogo-basecaller
```

Install NanoGO Basecaller using pip:
```
pip install nanogo-basecaller
```
Verify the NanoGO CLI:
```
nanogo --help
```
This should display help text and list available subcommands.

Installing from Source

If you prefer to work with the latest source code:

Clone the repository:

```
git clone https://github.com/phac-nml/nanogo-basecaller.git
cd nanogo-basecaller
```

Install in development mode:
```
pip install -e .
```

Installing Dorado

1. Using the Built-In Installer (Recommended)

```
nanogo install-dorado
```

Detects existing installations
Downloads and verifies the latest Dorado version
Installs to your virtual environment or system path

Use --user to install locally (no sudo) or --force to overwrite any existing installation.

2. Manual Installation

If automatic installation fails:

Download Dorado from Oxford Nanopore’s CDN.
Extract the tarball:
```
tar -xzf dorado-x.y.z-linux-x64.tar.gz
```
Copy the binaries and libraries to user-writable directories (e.g., ~/.local/bin and ~/.local/lib).
Add the binary directory to your PATH and, if needed, the library directory to your LD_LIBRARY_PATH.
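After a manual install, one quick way to confirm that Dorado will be found is to resolve it the way the shell does. This Python sketch is not part of NanoGO. It uses the standard library's `shutil.which`. The `~/.local/bin` and `~/.local/lib` defaults only mirror the example directories in the steps above, and the helper names `find_dorado` and `on_path` are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Optional, Union

StrPath = Union[str, os.PathLike]

# Example location from the manual-install steps above; adjust to your layout.
DEFAULT_EXTRA_DIRS = (Path.home() / ".local" / "bin",)


def find_dorado(extra_dirs: Iterable[StrPath] = DEFAULT_EXTRA_DIRS) -> Optional[str]:
    """Return the path of the first executable named `dorado`, or None.

    `extra_dirs` are searched before the current PATH, so a freshly copied
    binary is found even if PATH has not been updated yet.
    """
    dirs = [str(d) for d in extra_dirs]
    current = os.environ.get("PATH")
    if current:
        dirs.append(current)
    return shutil.which("dorado", path=os.pathsep.join(dirs))


def on_path(directory: StrPath, variable: str = "PATH") -> bool:
    """True if `directory` is listed in the given environment variable."""
    entries = os.environ.get(variable, "").split(os.pathsep)
    target = os.path.realpath(directory)
    return any(e and os.path.realpath(e) == target for e in entries)


if __name__ == "__main__":
    bin_dir = Path.home() / ".local" / "bin"
    lib_dir = Path.home() / ".local" / "lib"
    print("dorado found at:", find_dorado() or "not found")
    print("bin dir on PATH:", on_path(bin_dir))
    print("lib dir on LD_LIBRARY_PATH:", on_path(lib_dir, "LD_LIBRARY_PATH"))
```

If `find_dorado()` succeeds only when `~/.local/bin` is passed explicitly, the binary is installed but your PATH still needs updating.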

Usage

Interactive Mode

Run the basecaller without any arguments to enter interactive mode:

```
nanogo basecaller
```

You will be prompted to:

Select or confirm the input directory
Choose a Dorado basecalling model
Specify demultiplexing options (if relevant)
Set GPU or CPU usage (auto-detected by default)

Command-Line Mode

For direct or automated executions:

```
nanogo basecaller -i /path/to/reads -o /path/to/output [options]
```

Examples:

```
# Use GPU (auto-detected), standard basecalling
nanogo basecaller -i data/ -o results/

# Enable duplex basecalling
nanogo basecaller -i data/ -o results/ --duplex
```
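Underneath, a wrapper like NanoGO ultimately runs a Dorado command. The sketch below is a hypothetical illustration, not NanoGO's actual code. It shows how the duplex and device choices could map onto Dorado's own `basecaller` and `duplex` subcommands. Both take a model and a data directory as positional arguments and write BAM to standard output. The helper name `build_dorado_command` is invented for illustration.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_dorado_command(
    model: str,
    pod5_dir: Path,
    duplex: bool = False,
    device: Optional[str] = None,
) -> List[str]:
    """Assemble a Dorado invocation as an argument list (hypothetical helper).

    duplex=True selects `dorado duplex`; otherwise `dorado basecaller` is used.
    `device` is passed through as a Dorado device string such as "cpu" or
    "cuda:0"; None leaves the choice to Dorado's own default.
    """
    subcommand = "duplex" if duplex else "basecaller"
    cmd = ["dorado", subcommand, model, str(pod5_dir)]
    if device is not None:
        cmd += ["--device", device]
    return cmd


if __name__ == "__main__":
    cmd = build_dorado_command(
        "dna_r10.4.1_e8.2_400bps_hac@v4.3.0", Path("data/"), duplex=True, device="cuda:0"
    )
    # Dorado writes BAM to stdout, so a real run would redirect it to a file.
    print(shlex.join(cmd), "> results/calls.bam")
```

Building an argument list rather than a shell string avoids quoting problems when paths contain spaces. The list can be passed straight to `subprocess.run`.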

Workflow

Version Check – Verifies Dorado and POD5 versions.
Input Scanning – Locates FAST5/POD5 files.
Configuration – Selects basecalling model, sets up demultiplexing.
Preparation – Converts FAST5 to POD5 if necessary; downloads models.
Basecalling – Runs Dorado (standard or duplex), producing BAM or FASTQ files.
Demultiplexing – Splits reads by barcode into subfolders.
Output Structuring – Moves final outputs into a well-defined directory tree.
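The stage ordering above can be summarized as a dry-run planner. This is a hypothetical Python sketch, not NanoGO's code. The stage names mirror the list above. The sketch models only two decisions: whether the Preparation stage must convert FAST5 input to POD5, and whether demultiplexing is requested.

```python
from pathlib import Path
from typing import Iterable, List

# Stage names taken from the Workflow list above, in execution order.
STAGES = (
    "Version Check",
    "Input Scanning",
    "Configuration",
    "Preparation",
    "Basecalling",
    "Demultiplexing",
    "Output Structuring",
)


def fast5_to_convert(files: Iterable[Path]) -> List[Path]:
    """Return the FAST5 inputs that would need conversion to POD5."""
    return [f for f in files if f.suffix.lower() == ".fast5"]


def plan(files: Iterable[Path], demultiplex: bool = True) -> List[str]:
    """Describe each stage that would run for the given input files."""
    files = list(files)
    steps = []
    for stage in STAGES:
        if stage == "Preparation":
            n = len(fast5_to_convert(files))
            detail = f"convert {n} FAST5 file(s) to POD5" if n else "no conversion needed"
            steps.append(f"{stage}: {detail}")
        elif stage == "Demultiplexing" and not demultiplex:
            steps.append(f"{stage}: skipped")
        else:
            steps.append(stage)
    return steps


if __name__ == "__main__":
    for step in plan([Path("A_01.fast5"), Path("B_01.pod5")]):
        print(step)
```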

Command-Line Options

Basecalling Options

-b, --basecaller – Specify the basecaller software to use (default: Dorado).
-d, --duplex – Activates duplex basecalling mode.
-m, --model <model_name> – Manually specify a Dorado model.
--ignore <pattern> – Skip files matching the pattern (e.g., _failed.pod5).
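The exact matching rule behind `--ignore` is not documented here. The following Python sketch assumes one plausible rule. A file is skipped if its name contains the pattern as a substring. If the pattern includes wildcards such as `*`, it is instead treated as a shell-style glob. The helper names are hypothetical.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, List


def is_ignored(path: Path, pattern: str) -> bool:
    """Assumed rule: glob match if wildcards are present, else substring match."""
    name = path.name
    if any(ch in pattern for ch in "*?["):
        return fnmatch.fnmatch(name, pattern)
    return pattern in name


def filter_inputs(files: Iterable[Path], pattern: str) -> List[Path]:
    """Keep only the files that the ignore pattern does not match."""
    return [f for f in files if not is_ignored(f, pattern)]
```

With this rule, `_failed.pod5` drops `A_02_failed.pod5` but keeps `A_01.pod5`.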

Device Options

--device {auto,cpu,gpu} – Select processing device (default: auto-detect).
--gpu-device <ID> – Specify which GPU device to use (default: 0).
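`--device auto` implies some form of GPU detection. The sketch below is an assumption about how such a choice could be made, not NanoGO's actual logic. It treats the presence of the `nvidia-smi` utility, which ships with NVIDIA drivers, as a sign that a CUDA GPU is available. It then returns a Dorado-style device string such as `cuda:0`. The function name `resolve_device` is hypothetical.

```python
import shutil
from typing import Optional

VALID_DEVICES = {"auto", "cpu", "gpu"}


def resolve_device(requested: str = "auto", gpu_id: int = 0,
                   has_gpu: Optional[bool] = None) -> str:
    """Turn a --device choice into a Dorado device string (hypothetical helper).

    has_gpu can be passed explicitly (e.g. in tests); when None, the presence of
    `nvidia-smi` on PATH is used as a rough stand-in for "a CUDA GPU exists".
    """
    if requested not in VALID_DEVICES:
        raise ValueError(f"--device must be one of {sorted(VALID_DEVICES)}")
    if has_gpu is None:
        has_gpu = shutil.which("nvidia-smi") is not None
    if requested == "cpu" or (requested == "auto" and not has_gpu):
        return "cpu"
    if not has_gpu:  # explicit --device gpu, but no GPU detected
        raise RuntimeError("No CUDA device detected; rerun with --device cpu")
    return f"cuda:{gpu_id}"
```

Failing loudly on an explicit `gpu` request mirrors the Troubleshooting advice to fall back to `--device cpu`, rather than silently running a much slower CPU job.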

Advanced Options

--check-version – Check for the latest Dorado version (default: enabled).
--threads <N> – Specify number of CPU threads (default: auto-detect).
--chunk-size <SIZE> – Set the chunk size Dorado uses during basecalling (lower values reduce memory use).
--modified-bases – Enable modified base detection (requires a compatible model).

Input and Output Structure

Input Directory

NanoGO expects an organized directory with raw ONT data:

```
/path/to/reads
├─ Sample_A
│  ├─ A_01.fast5
│  └─ A_02.fast5
├─ Sample_B
│  ├─ B_01.pod5
│  └─ B_02.fast5
└─ ...
```

Each subfolder is treated as a separate run or sample.
FAST5 or POD5 files are automatically detected.
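The per-sample layout above maps naturally onto a small directory scan. This Python sketch is an illustration, not NanoGO's scanner. It groups `.fast5` and `.pod5` files by their top-level subfolder. As an assumption, it also picks up files nested deeper inside each sample folder.

```python
from pathlib import Path
from typing import Dict, List

RAW_SUFFIXES = {".fast5", ".pod5"}


def scan_input(root: Path) -> Dict[str, List[Path]]:
    """Map each sample subfolder name to its sorted list of raw ONT files."""
    samples: Dict[str, List[Path]] = {}
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        raw = sorted(
            f for f in sample_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in RAW_SUFFIXES
        )
        if raw:  # skip subfolders with no FAST5/POD5 data
            samples[sample_dir.name] = raw
    return samples
```

For the example tree, `scan_input(Path("/path/to/reads"))` would return the keys `Sample_A` and `Sample_B`. Mixed formats within one sample, as in `Sample_B`, are kept together.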

Output Directory

NanoGO creates a structured output folder:

```
/path/to/output
├─ temp_data
│  ├─ basecalling_model/
│  ├─ sample_sheet.csv
│  └─ sublist_#/ (processing chunks)
└─ final_output
   ├─ barcode01/
   ├─ barcode02/
   └─ unclassified/
```

temp_data: Intermediate files, logs, partial BAM/FASTQ outputs
final_output: Fully demultiplexed and basecalled reads, separated by barcode
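The way reads land under `final_output/` can be sketched as a small routing function. This is a hypothetical Python illustration, not NanoGO's code. It assumes barcode labels arrive as strings containing `barcodeNN`, possibly after a kit-name prefix (for example `SQK-NBD114-24_barcode01`). Any label without that pattern is sent to `unclassified/`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed label format: "barcode" followed by digits, optionally after a kit prefix.
_BARCODE = re.compile(r"barcode\d+")


def final_output_dir(output_root: Path, barcode_label: Optional[str]) -> Path:
    """Map a barcode classification to its folder under final_output/."""
    match = _BARCODE.search(barcode_label or "")
    folder = match.group(0) if match else "unclassified"
    return Path(output_root) / "final_output" / folder
```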

File Naming Convention

```
{flow_cell_id}_{run_id}_{model_hash}_{kit_hash}_{file_count}.fastq
```

flow_cell_id – From ONT metadata
run_id – First 8 characters of the run identifier
model_hash – Short hash of the Dorado model used
kit_hash – Short hash identifying the barcoding kit
file_count – Incremental count to avoid conflicts

This scheme ensures clarity, uniqueness, and traceability of all output files.
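The naming template can be reproduced in a few lines of Python. The template and the 8-character run ID come from the list above. The document does not say how `model_hash` and `kit_hash` are computed, so the hash algorithm (SHA-1), the 8-character hash length, and starting `file_count` at 0 are all assumptions. The helper names are hypothetical, and the values in the usage line are placeholders.

```python
import hashlib
from pathlib import Path


def short_hash(text: str, length: int = 8) -> str:
    """Assumed scheme: first `length` hex characters of a SHA-1 digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


def output_name(flow_cell_id: str, run_id: str, model: str, kit: str,
                file_count: int, ext: str = "fastq") -> str:
    """Build {flow_cell_id}_{run_id}_{model_hash}_{kit_hash}_{file_count}.{ext}."""
    return (f"{flow_cell_id}_{run_id[:8]}_{short_hash(model)}_"
            f"{short_hash(kit)}_{file_count}.{ext}")


def next_free_path(directory: Path, flow_cell_id: str, run_id: str,
                   model: str, kit: str, ext: str = "fastq") -> Path:
    """Increment file_count (starting at 0, an assumption) until the name is unused."""
    count = 0
    while True:
        candidate = Path(directory) / output_name(
            flow_cell_id, run_id, model, kit, count, ext)
        if not candidate.exists():
            return candidate
        count += 1


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(output_name("FC001", "0123456789abcdef",
                      "dna_r10.4.1_e8.2_400bps_hac@v4.3.0", "SQK-NBD114-24", 0))
```

Because the hashes are deterministic, files from the same flow cell, run, model, and kit share a prefix. Only `file_count` changes between them.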

Troubleshooting

Installation Issues
- Compiler errors: conda install -c conda-forge gcc_linux-64 gxx_linux-64
- Missing Dorado: Run nanogo install-dorado or manually install. Check your PATH.
Runtime Errors
- No CUDA device: Ensure NVIDIA drivers are installed, or use --device cpu.
- Memory errors: Reduce --chunk-size or add more system RAM.
- PySAM wheel issues: Try pip install --only-binary=:all: pysam.
Path or Permission Problems
- Use --user or run with appropriate permissions (sudo) if installing system-wide.
- Update your PATH and LD_LIBRARY_PATH if installing Dorado manually.

License

NanoGO Basecaller is distributed under the GNU General Public License v3.0. Refer to the GNU GPL v3.0 for the full terms and conditions.

Support and Contact

Primary Contact: Gurasis Osahan, National Microbiology Laboratory
Issue Tracking: Use the GitHub Issues page for bug reports or feature requests
Documentation: Additional references and usage examples are in the docs/ directory

Maintained by the National Microbiology Laboratory, Public Health Agency of Canada.
Ensuring public health through advanced genomics.

Thank you for using NanoGO Basecaller!
We continuously improve our tools to deliver efficient and robust ONT data processing. Feel free to reach out with any feedback or suggestions.

Release History

0.1.7 (this version) – May 7, 2025
0.1.6 – Apr 7, 2025
0.1.5 – Mar 31, 2025
0.1.4 – Mar 31, 2025
0.1.3 – Mar 19, 2025
0.1.2 – Mar 19, 2025
0.1.1 – Mar 19, 2025
0.1.0 – Mar 19, 2025

Download Files

Source distribution: nanogo_basecaller-0.1.7.tar.gz (82.8 kB), uploaded May 7, 2025
Built distribution: nanogo_basecaller-0.1.7-py3-none-any.whl (90.9 kB, Python 3), uploaded May 7, 2025

Both files were uploaded via twine/6.1.0 CPython/3.10.17 without Trusted Publishing.

File Hashes

nanogo_basecaller-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`688bd57e6540053c6b54ecf022ef0d871fbcf701d5ae191941f131db2fd2782a`
MD5	`1f44ae345d56aa8ed765ecefe88d1750`
BLAKE2b-256	`36d9d1ae0296c853e565dd5e7142343cf9c8268a320a866afc8bac375538db8c`

nanogo_basecaller-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b1b1e71d7bbc6e6c10f5933c0db8efd81bfaf9985365573ddb527a673ca08aea`
MD5	`634e31a6903d0e42a4082453acb80609`
BLAKE2b-256	`b059a5feffa6e0de72d4e8b63e407d188e5db2e227fcc115847b71fcdb18790e`
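To check a downloaded 0.1.7 distribution against the SHA256 digests listed above, hash the file locally and compare the result. This is a minimal Python sketch using only `hashlib`. The expected digests are copied from this page, and `verify_download` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# SHA256 digests for the 0.1.7 release, as listed above.
EXPECTED_SHA256 = {
    "nanogo_basecaller-0.1.7.tar.gz":
        "688bd57e6540053c6b54ecf022ef0d871fbcf701d5ae191941f131db2fd2782a",
    "nanogo_basecaller-0.1.7-py3-none-any.whl":
        "b1b1e71d7bbc6e6c10f5933c0db8efd81bfaf9985365573ddb527a673ca08aea",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can also enforce this check itself in hash-checking mode (`--require-hashes`), with each requirement pinned alongside a `--hash=sha256:...` entry.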
