Comprehensive genotyping tool for targeted long-read sequencing analysis

DAJIN2 is a genotyping tool for genome-edited samples using nanopore-targeted sequencing.

DAJIN2 takes its name from the Japanese phrase 一網打尽 (Ichimou DAJIN in Japanese; Yīwǎng Dǎjìn in Chinese),
meaning “to capture everything in a single sweep.”
This reflects the tool’s design philosophy: comprehensive detection of both intended and unintended genome editing outcomes in one analysis.

🌟 Features

Comprehensive Mutation Detection
DAJIN2 can detect a wide range of genome editing events in nanopore-targeted regions, from point mutations to structural variants.
It is particularly effective at identifying unexpected mutations and complex mutations, such as insertions within deleted regions.
Highly Sensitive Allele Classification
Supports classification of mosaic alleles, capable of detecting minor alleles present at approximately 1%.
Intuitive Visualization
Genome editing results are visualized in an intuitive manner, enabling rapid and easy identification of mutations.
Multi-Sample Support
Batch processing of multiple samples is supported, allowing efficient execution of large-scale experiments and comparative studies.
Simple Installation and Operation
Requires no specialized computing environment and runs smoothly on a standard laptop.
Easily installable via Bioconda or PyPI, and usable via the command line.

🛠 Installation

System Requirements

Hardware

Runs on a standard laptop
Recommended memory: 16 GB or more

[!NOTE] DAJIN2 is the successor to DAJIN, which required a GPU for efficient computation due to its use of deep learning.
In contrast, DAJIN2 does not use deep learning and does not require a GPU.
Therefore, it runs smoothly on typical laptops.

Software

Python 3.10-3.12
Unix-based environment (Linux, macOS, WSL2, etc.)

[!IMPORTANT] For Windows Users
DAJIN2 is designed to run in a Linux environment.
If you are using Windows, please use WSL2 (Windows Subsystem for Linux 2).
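
The Python requirement above can be checked before installing. The following is a minimal sketch; `python_version_supported` is an illustrative helper name, not something DAJIN2 provides, and the range it tests is simply the 3.10-3.12 range stated above.

```shell
# Return 0 when a "major.minor[.patch]" version string falls within the
# 3.10-3.12 range listed in the software requirements.
# python_version_supported is a hypothetical helper, not part of DAJIN2.
python_version_supported() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -eq 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 12 ]
}

# Report on the interpreter found on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python_version_supported "$current"; then
        echo "Python $current is within the supported range"
    else
        echo "Python $current is outside the supported range (3.10-3.12)"
    fi
fi
```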

From Bioconda (Recommended)

# Setting up Bioconda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority flexible

# Install DAJIN2
conda create -n env-dajin2 python=3.12 DAJIN2 -y
conda activate env-dajin2

From PyPI

pip install DAJIN2

[!IMPORTANT] DAJIN2 is actively being developed and improved.
Please make sure you are using the latest version to take advantage of the newest features.

🔍 To check your current version:
DAJIN2 --version
➡️ Check the latest version:
https://github.com/akikuno/DAJIN2/releases

🔄 To update to the latest version:
conda update DAJIN2 -y
or
pip install -U DAJIN2

[!CAUTION] If you encounter any issues during the installation, please refer to the Troubleshooting Guide

💻 Usage

Required Files

1. FASTQ/FASTA/BAM Files for Sample and Control

In DAJIN2, a control that has not undergone genome editing is necessary to detect genome-editing-specific mutations. Specify a directory containing the FASTQ/FASTA (both gzip compressed and uncompressed) or BAM files of the genome editing sample and control.

Basecalling with Dorado

After basecalling and demultiplexing with Dorado (dorado demux), the following file structure is produced:

bam_pass
├── barcode01
│   └── EXP-PBC096_barcode01.bam
├── barcode02
│   └── EXP-PBC096_barcode02.bam
├── ...
└── unclassified
    └── EXP-PBC096_unclassified.bam

[!IMPORTANT] Store each BAM file in a separate directory. The directory names can be set arbitrarily.
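
If your BAM files currently sit together in one flat directory, a small helper can place each one in its own directory. This is only a sketch: `organize_bams` is a hypothetical function, and naming each directory after its file is an arbitrary choice, since the directory names can be anything.

```shell
# Move every *.bam file found directly in "$1" into its own subdirectory of "$2".
# Each subdirectory is named after the file without the .bam suffix.
# organize_bams is a hypothetical helper, not a DAJIN2 or Dorado command.
organize_bams() {
    src=$1
    dest=$2
    for bam in "$src"/*.bam; do
        [ -e "$bam" ] || continue   # skip when the glob matches nothing
        name=$(basename "$bam" .bam)
        mkdir -p "$dest/$name"
        mv "$bam" "$dest/$name/"
    done
}

# Example call (paths are placeholders):
# organize_bams ./flat_bams ./bam_pass
```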

Similarly, store the FASTA files output after sequence error correction with dorado correct in separate directories.

dorado_correct
├── barcode01
│   └── EXP-PBC096_barcode01.fasta
└── barcode02
    └── EXP-PBC096_barcode02.fasta

[!NOTE] For detailed Dorado usage, see DORADO_HANDLING.md.

Basecalling with Guppy

After basecalling with Guppy, the following file structure will be output:

fastq_pass
├── barcode01
│   ├── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_0_0.fastq.gz
│   ├── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_10_0.fastq.gz
│   └── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_11_0.fastq.gz
└── barcode02
    ├── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_0_0.fastq.gz
    ├── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_10_0.fastq.gz
    └── fastq_runid_b347657c88dced2d15bf90ee6a1112a3ae91c1af_11_0.fastq.gz

[!CAUTION] Although DAJIN2 can process Guppy-generated data, Guppy is no longer supported by Oxford Nanopore Technologies.
Please use Dorado for basecalling and demultiplexing.

2. FASTA File Including Anticipated Allele Sequences

The FASTA file should contain descriptions of the alleles anticipated as a result of genome editing.

[!IMPORTANT] A header name >control and its sequence are necessary.

If there are anticipated alleles (e.g., knock-ins or knock-outs), include their sequences in the FASTA file too. These anticipated alleles can be named arbitrarily.

Below is an example of a FASTA file:

>control
ACGTACGTACGTACGT
>knock-in
ACGTACGTCCCCACGTACGT
>knock-out
ACGTACGT

Here, >control represents the sequence of the control allele, while >knock-in and >knock-out represent the sequences of the anticipated knock-in and knock-out alleles, respectively.

[!IMPORTANT] Ensure that both ends of the FASTA sequence match those of the amplicon sequence.
If the FASTA sequence is longer or shorter than the amplicon, the difference may be recognized as an indel.
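
Both requirements above can be checked quickly before a run. The sketch below uses two hypothetical helpers (neither is part of DAJIN2): one confirms that a header named exactly >control is present, and the other prints the length of each allele so it can be compared with the expected amplicon length.

```shell
# Succeed only if the FASTA file has a header line that is exactly ">control"
# (trailing whitespace is tolerated).
has_control_header() {
    grep -q '^>control[[:space:]]*$' "$1"
}

# Print each header name and the total sequence length of its record,
# separated by a tab. Multi-line sequences are summed.
list_allele_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example calls (the path is a placeholder):
# has_control_header alleles.fa || echo "no >control record found" >&2
# list_allele_lengths alleles.fa
```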

Single Sample Analysis

DAJIN2 supports single-sample analysis (one sample vs one control).

DAJIN2 <-c|--control> <-s|--sample> <-a|--allele> <-n|--name> \
  [-g|--genome] [-b|--bed] [-t|--threads] [--no-filter] [-h|--help] [-v|--version]

Options:
-c, --control            Specify the path to the directory containing control FASTQ/FASTA/BAM files.
-s, --sample             Specify the path to the directory containing sample FASTQ/FASTA/BAM files.
-a, --allele             Specify the path to the FASTA file.
-n, --name (Optional)    Set the output directory name. Default: 'Results'.
-b, --bed (Optional)     Specify the path to BED6 file containing genomic coordinates. Default: '' (empty string).
-g, --genome (Optional)  Specify the reference UCSC genome ID (e.g., hg38, mm39). Default: '' (empty string).
-t, --threads (Optional) Set the number of threads. Default: 1.
--no-filter (Optional)   Disable minor allele filtering (keep alleles below 0.5%). Default: False.
-h, --help               Display this help message and exit.
-v, --version            Display the version number and exit.

Example

# Download the example dataset
curl -LJO https://github.com/akikuno/DAJIN2/raw/main/examples/example_single.tar.gz
tar -xf example_single.tar.gz

# Run DAJIN2
DAJIN2 \
    --control example_single/control \
    --sample example_single/sample \
    --allele example_single/stx2_deletion.fa \
    --name stx2_deletion \
    --bed example_single/stx2_deletion.bed \
    --threads 4
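
Because a run can take a while, it may help to confirm that the three required inputs exist before starting. The following is a sketch only; `check_inputs` is a hypothetical helper and does not inspect the contents of the files.

```shell
# Check that the three required inputs exist before starting a run.
# check_inputs is a hypothetical helper, not part of DAJIN2.
# Arguments: control_dir sample_dir allele_fasta
check_inputs() {
    status=0
    [ -d "$1" ] || { echo "control directory not found: $1" >&2; status=1; }
    [ -d "$2" ] || { echo "sample directory not found: $2" >&2; status=1; }
    [ -f "$3" ] || { echo "allele FASTA not found: $3" >&2; status=1; }
    return "$status"
}

# Example: only start the analysis when every input is present.
# check_inputs example_single/control example_single/sample example_single/stx2_deletion.fa &&
#     DAJIN2 --control example_single/control --sample example_single/sample \
#         --allele example_single/stx2_deletion.fa --name stx2_deletion
```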

Using BED Files for Genomic Coordinates

If the reference genome is not from UCSC, or if the external servers that DAJIN2 depends on (UCSC Genome Browser and GGGenome) are unavailable, you can specify a BED file using the -b/--bed option to run offline.

[!IMPORTANT] Access to the UCSC Genome Browser or GGGenome servers may occasionally be unavailable. Therefore, we generally recommend using -b/--bed instead of --genome.

When using the -b/--bed option with a BED file, please ensure:

Use BED6 format (6 columns required):

chr1    1000000    1001000    mm39    195154279    +

Column descriptions:

Column 1: Chromosome name (e.g., chr1, chr2)
Column 2: Start position (0-indexed)
Column 3: End position (0-indexed)
Column 4: Name (genome ID)
Column 5: Score (chromosome size for proper IGV visualization)
Column 6: Strand (+ or -, must match FASTA allele orientation)

[!NOTE]
For the score field (column 5), please enter the size of the chromosome specified in column 1.
While the original BED format limits scores to 1000, DAJIN2 accepts chromosome sizes without any issue.

[!NOTE] Chromosome sizes can be found at:
https://hgdownload.soe.ucsc.edu/goldenPath/[genome]/bigZips/[genome].chrom.sizes
(e.g., https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.chrom.sizes)

[!IMPORTANT]
Strand orientation must match. The strand field (column 6: + or -) in your BED file must match the strand orientation of your FASTA allele sequences.

If your FASTA allele sequence is on the forward strand (5' to 3'), use + in the BED file

If your FASTA allele sequence is on the reverse strand (3' to 5'), use - in the BED file

[!NOTE] For detailed BED file usage, see BED_COORDINATE_USAGE.md.
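
To avoid typing the chromosome size by hand, the BED6 line can be assembled from a chrom.sizes file downloaded from the UCSC URL above. This is a minimal sketch: `make_bed6` is a hypothetical helper, and it assumes the chrom.sizes file has two tab-separated columns (chromosome name, size).

```shell
# Print one tab-separated BED6 line in the column layout described above,
# taking the column 5 value (chromosome size) from a local chrom.sizes file.
# Exits with a non-zero status when the chromosome is not listed.
# make_bed6 is a hypothetical helper, not part of DAJIN2.
# Arguments: chrom start end genome strand sizes_file
make_bed6() {
    awk -v chrom="$1" -v start="$2" -v end="$3" -v genome="$4" -v strand="$5" '
        BEGIN { OFS = "\t" }
        $1 == chrom { print chrom, start, end, genome, $2, strand; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$6"
}

# Example calls (the coordinates and output name are placeholders):
# curl -O https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.chrom.sizes
# make_bed6 chr1 1000000 1001000 mm39 + mm39.chrom.sizes > coords.bed
```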

Rare Mutation Detection with `--no-filter`

By default, DAJIN2 filters out alleles supported by fewer than 0.5% of the downsampled reads to reduce noise and improve accuracy. However, when analyzing rare mutations or somatic mosaicism, where minor alleles may be present at very low frequencies, you can disable this filtering with the --no-filter option.

When to use --no-filter:

Detecting rare somatic mutations (< 0.5% frequency)
Analyzing samples with suspected low-level mosaicism
Research requiring detection of all possible alleles regardless of frequency

Usage:

DAJIN2 \
    --control example_single/control \
    --sample example_single/sample \
    --allele example_single/stx2_deletion.fa \
    --name stx2_deletion \
    --bed example_single/stx2_deletion.bed \
    --threads 4 \
    --no-filter

[!CAUTION] Using --no-filter may increase noise and false positives in the results. It is recommended to validate rare alleles through additional experimental methods.

Batch Processing

By using the batch subcommand, you can process multiple samples simultaneously.
For this purpose, a CSV or Excel file consolidating the sample information is required.

[!NOTE] For guidance on how to compile sample information, please refer to this document.

Required columns: sample, control, allele, name
Optional columns: genome, bed (or genome_coordinate), and any custom columns

Example CSV with BED files:

sample,control,allele,name,bed
/path/to/sample1,/path/to/control1,/path/to/allele1.fa,experiment1,/path/to/coords1.bed
/path/to/sample2,/path/to/control2,/path/to/allele2.fa,experiment2,/path/to/coords2.bed

[!TIP] It is recommended to use the same value in the name column for samples that belong to the same experiment.
Using identical names enables parallel processing, thereby improving efficiency.
Here's an example 👉 batch.csv
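
Before launching a batch run, the CSV header can be checked for the four required columns. The following is a sketch; `check_batch_header` is a hypothetical helper, it handles CSV files only (not Excel), and it assumes the header fields are unquoted.

```shell
# Succeed only if the first line of the CSV contains all four required columns.
# check_batch_header is a hypothetical helper, not part of DAJIN2.
check_batch_header() {
    header=$(head -n 1 "$1" | tr -d '\r')   # drop a Windows carriage return, if any
    for col in sample control allele name; do
        case ",$header," in
            *",$col,"*) ;;
            *) echo "missing column: $col" >&2; return 1 ;;
        esac
    done
}

# Example call (the path is a placeholder):
# check_batch_header batch.csv && DAJIN2 batch --file batch.csv --threads 4
```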

DAJIN2 batch <-f|--file> [-t|--threads] [--no-filter] [-h]

Options:
  -f, --file                Specify the path to the CSV or Excel file.
  -t, --threads (Optional)  Set the number of threads. Default: 1.
  --no-filter (Optional)    Disable minor allele filtering (keep alleles below 0.5%). Default: False.
  -h, --help                Display this help message and exit.

Example

# Download the example dataset
curl -LJO https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
tar -xf example_batch.tar.gz

# Run DAJIN2 batch
DAJIN2 batch --file example_batch/batch.csv --threads 4

GUI (Graphical User Interface) Mode

DAJIN2 provides a web interface that can be launched with a single command:

DAJIN2 gui

When executed, your default web browser will open and display the GUI at http://localhost:{PORT}/.

[!NOTE] If the browser does not launch automatically, please open your browser manually and navigate to http://localhost:{PORT}/.

Single Sample Analysis via GUI

1. Launch GUI
   Run DAJIN2 gui to open the web interface.
2. Project Setup
   - Project Name: Enter any analysis name
   - Directory Upload: Select directories containing sample or control FASTQ/FASTA/BAM files
   - Allele FASTA: Upload FASTA file containing expected allele sequences
   - BED File (optional): Upload BED6 format file to specify genomic coordinates
3. Parameter Configuration
   - Reference Genome (optional): Specify UCSC genome ID (e.g., hg38, mm39)
   - Threads: Set the number of CPU threads to use
   - No Filter: Enable to detect rare mutations below 0.5% frequency
4. Run Analysis
   Click "Start Analysis" and the progress will be displayed in real-time.
5. View Results
   After completion, the output folder path will be displayed for accessing result files.

Batch Processing via GUI

1. Prepare Batch File
   Create a CSV or Excel file with columns: sample, control, allele, name.
2. Upload Batch File
   Use the "Batch Processing" tab to upload your configuration file.
3. Configure Global Settings
   Set threads and filtering options for all samples at once.
4. Monitor Progress
   The analysis status for each sample is displayed with detailed log output.
5. View Results
   Results are saved in the DAJIN_Results/ folder with subdirectories for each sample.

📈 Reports

Upon completion of DAJIN2 processing, a directory named DAJIN_Results/{NAME} is generated.
Inside the DAJIN_Results/{NAME} directory, the following files can be found:

DAJIN_Results/tyr-substitution
├── BAM
│   ├── control
│   ├── tyr_c230gt_01
│   ├── tyr_c230gt_10
│   └── tyr_c230gt_50
├── DAJIN2_log_20260127_140954_076887.txt
├── FASTA
│   ├── tyr_c230gt_01
│   ├── tyr_c230gt_10
│   └── tyr_c230gt_50
├── HTML
│   ├── tyr_c230gt_01
│   ├── tyr_c230gt_10
│   └── tyr_c230gt_50
├── MUTATION_INFO
│   ├── tyr_c230gt_01.csv
│   ├── tyr_c230gt_10.csv
│   └── tyr_c230gt_50.csv
├── VCF
│   ├── tyr_c230gt_01
│   ├── tyr_c230gt_10
│   └── tyr_c230gt_50
├── launch_report_mac.command
├── launch_report_windows.bat
└── read_summary.xlsx
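
For scripting against this layout, the sample names can be recovered from the MUTATION_INFO directory. This is a sketch only: `list_result_samples` is a hypothetical helper, and it assumes one CSV per sample as in the tree above.

```shell
# Print the sample names found under MUTATION_INFO in one result directory,
# one per line, by stripping the .csv suffix from each file name.
# list_result_samples is a hypothetical helper, not part of DAJIN2.
list_result_samples() {
    for csv in "$1"/MUTATION_INFO/*.csv; do
        [ -e "$csv" ] || continue   # skip when the glob matches nothing
        basename "$csv" .csv
    done
}

# Example call:
# list_result_samples DAJIN_Results/tyr-substitution
```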

1. launch_report_windows.bat / launch_report_mac.command

On Windows, double-click launch_report_windows.bat.
On macOS, double-click launch_report_mac.command.
Your browser will open and display the report.

Demo video:

https://github.com/user-attachments/assets/e2de7b56-94c8-4361-a9d3-54c30d53720c

[!TIP] Clicking an allele of interest in the stacked bar chart displays detailed information on its mutations (shown in the right panel of the report and in the demo video).

In the report, Allele type indicates the allele category, and Percent of reads shows the proportion of reads.

Allele type categories:

{Allele name}: Alleles that perfectly match a user-defined allele in the FASTA file
{Allele name} with indels: Alleles similar to a user-defined allele but with a few-base substitution, deletion, insertion, or inversion
unassigned insertion/deletion/inversion: Alleles with deletions, insertions, or inversions of 10 bases or more that are not defined by the user

[!WARNING]
In PCR amplicon sequencing, Percent of reads may not match the true allele proportions due to amplification bias.
This effect can be pronounced when large deletions are present, potentially distorting the actual allele ratios.

2. read_summary.xlsx

read_summary.xlsx lists the read counts and proportions for each allele.
The stacked bar chart in the report is a visualization of read_summary.xlsx.
Use it as a reference when preparing figures for publications.

3. BAM and VCF

The BAM and VCF directories contain BAM and VCF files classified by allele.

[!NOTE]
If --bed or --genome is not specified, reads are aligned to the control allele in the input FASTA file.

4. FASTA and HTML

The FASTA directory stores FASTA files for each allele.
The HTML directory stores per-allele HTML files with color-highlighted mutations.
An example of a Tyr point mutation (green) is shown below:

DAJIN2 also extracts representative SV alleles (Insertion, Deletion, Inversion) in the sample and underlines SV regions.
Below is an example where a deletion (light blue) and an insertion (red) are observed at both ends of an inversion (purple underline).

5. MUTATION_INFO

The MUTATION_INFO directory stores tables describing mutation sites for each allele.
An example of a Tyr point mutation is shown below:

Each table lists the chromosomal position and the mutation type.

📣 Feedback and Support

We welcome your questions, bug reports, and feedback.
Please use the following Google Form to submit your report:
👉 Google Form

If you have a GitHub account, you can also submit reports via
👉 GitHub Issues

Please refer to CONTRIBUTING for how to contribute and how to verify your contributions.

[!NOTE] For frequently asked questions, please refer to this page.

🤝 Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.

📄 References

For more information, please refer to the following publication:

Kuno A, et al. (2022) DAJIN enables multiplex genotyping to simultaneously validate intended and unintended target genome editing outcomes. PLoS Biology 20(1): e3001507.

