Analyze haplotypes from Illumina paired-end amplicon sequencing


CloneArmy

CloneArmy is a modern Python package for analyzing haplotypes from Illumina paired-end amplicon sequencing data. It provides a streamlined workflow for processing FASTQ files, aligning reads, identifying sequence variants, and performing comparative analyses between samples.

Features

Fast paired-end read processing using BWA-MEM
Quality-based filtering of bases and alignments
Haplotype identification and frequency analysis
Statistical comparison between samples with FDR correction
Interactive visualization of mutation frequencies
Rich command-line interface with progress tracking and tabular output
Comprehensive HTML reports
Multi-threading support
Support for full-length sequence analysis
Real-time progress monitoring with progress bars
Automatic downsampling of large FASTQ files
Exportable results in multiple formats (CSV, JSON, Excel)

Installation

pip install clonearmy

Requirements

Python ≥ 3.8
BWA (must be installed and available in PATH)
Samtools (must be installed and available in PATH)
Seqtk (must be installed and available in PATH)

You can install the required tools using conda:

conda install -c bioconda bwa samtools seqtk
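
Because BWA, Samtools, and Seqtk must be on PATH, a quick pre-flight check can save a failed run. The sketch below uses only the standard library (`shutil.which`); the `missing_tools` helper is hypothetical and not part of CloneArmy.

```python
import shutil
from typing import Iterable, List

# External tools listed under Requirements (executable names as installed by bioconda)
REQUIRED_TOOLS = ("bwa", "samtools", "seqtk")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```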

Usage

Command Line Interface

Basic Analysis

# Basic usage with progress tracking
clonearmy run /path/to/fastq/directory reference.fasta

# With all options
# --max-file-size: target size for downsampling, in bytes (100 MB here)
# --report: generate an HTML report (default: true)
clonearmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --min-read-count 10 \
    --max-file-size 100000000 \
    --report
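
Both quality thresholds are presumably Phred-scaled, as is standard for FASTQ base qualities and BWA mapping qualities: a score Q corresponds to an error probability of 10^(-Q/10), so 20 means a 1% chance of error and 30 means 0.1%. How CloneArmy treats a base that fails `--min-base-quality` is not documented here; the sketch below assumes such bases are masked as `N`, which is only one plausible policy (a tool could also drop the base or the whole read). The function names are hypothetical.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33, Illumina 1.8+) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(phred: int) -> float:
    """Probability that a base call (or alignment) with this Phred score is wrong."""
    return 10 ** (-phred / 10)


def mask_low_quality(sequence: str, quality: str, min_base_quality: int = 20) -> str:
    """Replace bases below the threshold with 'N' (assumed masking policy)."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have the same length")
    return "".join(
        base if score >= min_base_quality else "N"
        for base, score in zip(sequence, phred_scores(quality))
    )
```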

The --max-file-size option sets the target size, in bytes, for downsampling large FASTQ files (for example, 100000000 for 100 MB). Input files larger than this are downsampled automatically, with R1 and R2 mates kept in sync so read pairs stay intact. This is useful for quick tests or for very large datasets.
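
One plausible way to implement pair-preserving downsampling, assumed here rather than taken from CloneArmy's source, is to derive a keep-fraction from the file size and run `seqtk sample` on both mates with the same random seed; seqtk's own documentation notes that a shared seed keeps the pairing. The helper names are hypothetical, and because compressed file size is only a rough proxy for read count, the output size is approximate.

```python
import os
from typing import List, Optional, Tuple


def sampling_fraction(file_size: int, max_file_size: int) -> float:
    """Fraction of reads to keep so the output is roughly max_file_size bytes."""
    if max_file_size <= 0:
        raise ValueError("max_file_size must be positive")
    if file_size <= max_file_size:
        return 1.0  # already small enough: keep everything
    return max_file_size / file_size


def seqtk_sample_commands(
    r1: str, r2: str, fraction: float, seed: int = 11
) -> Tuple[List[str], List[str]]:
    """Build seqtk commands for both mates; the shared -s seed keeps pairs in sync."""
    def command(path: str) -> List[str]:
        return ["seqtk", "sample", f"-s{seed}", path, str(fraction)]
    return command(r1), command(r2)


def plan_downsampling(
    r1: str, r2: str, max_file_size: int, seed: int = 11
) -> Optional[Tuple[List[str], List[str]]]:
    """Return the seqtk commands to run, or None if no downsampling is needed."""
    size = max(os.path.getsize(r1), os.path.getsize(r2))
    fraction = sampling_fraction(size, max_file_size)
    if fraction >= 1.0:
        return None
    return seqtk_sample_commands(r1, r2, fraction, seed)
```

`seqtk sample` writes to stdout, so each command would be run with its output redirected, for example `subprocess.run(cmd, stdout=handle, check=True)`.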

Comparative Analysis

# Compare two samples
# --max-file-size: target size for downsampling, in bytes (100 MB here)
# --full-length-only: only consider full-length sequences
clonearmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --min-read-count 10 \
    --max-file-size 100000000 \
    --full-length-only
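
The README does not say which statistical test the comparison uses. A common choice for comparing how often a mutation appears in two samples is Fisher's exact test on a 2x2 table of read counts (rows are samples; columns are reads with and without the mutation). The standard-library sketch below is offered as an assumption, not as CloneArmy's confirmed method; `fisher_exact_two_sided` is a hypothetical name.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: reads with / without the mutation in sample 1
    c, d: reads with / without the mutation in sample 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutant reads in sample 1 with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance treats floating-point near-ties as ties
    p = sum(p_x for p_x in probs if p_x <= p_observed * (1 + 1e-7))
    return min(p, 1.0)
```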

Output Examples

Sample Analysis Results

╒════════════════╤══════════╤════════════╤══════════════╕
│ Sample         │ Reads    │ Haplotypes │ Mutations    │
╞════════════════╪══════════╪════════════╪══════════════╡
│ sample1        │ 10000    │ 45         │ 2.3 avg      │
│ sample2        │ 12000    │ 52         │ 1.8 avg      │
╘════════════════╧══════════╧════════════╧══════════════╛

Comparative Analysis Results

╒══════════╤════════════╤════════════╤═══════════╤═══════════╕
│ Mutation │ Sample 1 % │ Sample 2 % │ P-value   │ FDR       │
╞══════════╪════════════╪════════════╪═══════════╪═══════════╡
│ 123 A>T  │ 45.2       │ 12.3       │ 0.001     │ 0.003     │
│ 456 G>C  │ 33.1       │ 28.9       │ 0.042     │ 0.063     │
╘══════════╧════════════╧════════════╧═══════════╧═══════════╛
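
The FDR column holds p-values corrected for testing many mutations at once. The README does not name the correction; the Benjamini-Hochberg procedure is the most common FDR method and is sketched here as an assumption. For m tests sorted by p-value, the adjusted value at rank k is the smallest p·m/j over all ranks j ≥ k, so adjusted values never decrease with rank. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value to the smallest, carrying the running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```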

Python API

from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor with automatic downsampling
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30,
    min_read_count=10,
    max_file_size=100_000_000  # 100MB target size
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4
)

results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4
)

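# Load the reference sequence as a plain string. This assumes a single-record
# FASTA file and that run_comparative_analysis accepts the sequence as a str.
ref_seq = "".join(
    line.strip()
    for line in Path("reference.fasta").read_text().splitlines()
    if not line.startswith(">")
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1=results1,
    results2=results2,
    reference_seq=ref_seq,
    output_path="comparison_results.csv",
    full_length_only=False
)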
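
The `run` command takes a whole directory of FASTQ files, while `process_sample` takes one R1/R2 pair. To drive the API over a directory yourself, the mates have to be paired first. The sketch below assumes the common Illumina naming convention (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, optionally with a trailing `_001`); the pattern and the `pair_fastqs` helper are hypothetical, not CloneArmy's own logic.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Assumed naming: <sample>_R1[_001].fastq[.gz] or .fq[.gz]
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<mate>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def pair_fastqs(directory) -> Dict[str, Tuple[Path, Path]]:
    """Map each sample name to its (R1, R2) paths; unpaired files are skipped."""
    mates: Dict[str, Dict[str, Path]] = {}
    for path in sorted(Path(directory).iterdir()):
        match = MATE_PATTERN.match(path.name)
        if match:
            mates.setdefault(match["sample"], {})[match["mate"]] = path
    return {
        sample: (found["1"], found["2"])
        for sample, found in mates.items()
        if "1" in found and "2" in found
    }
```

Each pair could then be passed to `processor.process_sample(fastq_r1=str(r1), fastq_r2=str(r2), output_dir=f"results/{sample}", threads=4)` in a loop over `pair_fastqs(...).items()`.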

Output Files

Single Sample Analysis

Sorted BAM file with alignments
{sample}_haplotypes.csv containing:
- Sequence
- Read count
- Frequency
- Number of mutations
- Full-length status
- Quality metrics
Interactive HTML report with:
- Summary statistics
- Mutation frequency plots
- Position-based mutation diversity plots
- Mutation spectrum analysis
Console output with summary statistics
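
The haplotype table columns suggest a simple model: identical sequences are grouped, counted, filtered by `--min-read-count`, and compared with the reference. A minimal sketch under those assumptions follows. `call_haplotypes` and `Haplotype` are hypothetical names; reads are assumed to be already aligned and trimmed to reference coordinates with no indels, frequency is taken relative to all input reads (CloneArmy may normalize differently), and the quality-metrics column is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Haplotype:
    sequence: str
    read_count: int
    frequency: float   # fraction of all input reads
    n_mutations: int   # mismatches against the reference
    full_length: bool  # same length as the reference


def call_haplotypes(
    reads: Iterable[str], reference: str, min_read_count: int = 10
) -> List[Haplotype]:
    """Group identical reference-aligned sequences into haplotypes."""
    counts = Counter(reads)
    total = sum(counts.values())
    haplotypes = []
    for sequence, n in counts.most_common():
        if n < min_read_count:
            continue  # too rare to report
        # Position-by-position comparison; assumes no indels relative to the reference
        mismatches = sum(1 for base, ref in zip(sequence, reference) if base != ref)
        haplotypes.append(
            Haplotype(sequence, n, n / total, mismatches, len(sequence) == len(reference))
        )
    return haplotypes
```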

Comparative Analysis

comparison_results.csv with statistical comparisons:
- Mutation positions and types
- Frequencies in each sample
- Statistical significance (p-values)
- FDR-corrected p-values
Interactive HTML plots:
- Mutation frequency comparison
- Position-based mutation diversity
Console output with significant mutations

Release history

0.2.32 (Apr 1, 2025)
0.2.31 (Apr 1, 2025)
0.2.30 (Mar 31, 2025)
0.2.29 (Mar 30, 2025)
0.2.28 (Mar 30, 2025)
0.2.27 (Mar 28, 2025)
0.2.26 (Mar 28, 2025)
0.2.25 (Mar 28, 2025)
0.2.24 (Mar 27, 2025)
0.2.23 (Mar 27, 2025)
0.2.22 (Mar 27, 2025)
0.2.21 (Mar 26, 2025)
0.2.20 (Mar 26, 2025) - this version
0.2.19 (Mar 26, 2025)
0.2.18 (Mar 25, 2025)
0.2.17 (Mar 24, 2025)
0.2.16 (Mar 20, 2025)
0.2.15 (Mar 20, 2025)
0.2.14 (Mar 20, 2025)
0.2.13 (Mar 20, 2025)
0.2.12 (Mar 20, 2025)
0.2.10 (Mar 20, 2025)
0.2.8 (Mar 20, 2025)
0.2.7 (Mar 20, 2025)
0.2.6 (Mar 20, 2025)
0.2.5 (Mar 20, 2025)
0.2.4 (Jan 24, 2025)
0.2.3 (Jan 22, 2025)
0.2.2 (Jan 22, 2025)
0.2.0 (Nov 24, 2024)
0.1.2 (Nov 24, 2024)
0.1.1 (Nov 22, 2024)

Download files

Source Distribution

clonearmy-0.2.20.tar.gz (21.9 kB)

Uploaded Mar 26, 2025 Source

Built Distribution

clonearmy-0.2.20-py3-none-any.whl (25.4 kB)

Uploaded Mar 26, 2025 Python 3

File details

Details for the file clonearmy-0.2.20.tar.gz.

File metadata

Download URL: clonearmy-0.2.20.tar.gz
Upload date: Mar 26, 2025
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for clonearmy-0.2.20.tar.gz
Algorithm	Hash digest
SHA256	`624d12473af11f2e719ec936bd8b408968a0bf29512c7b661a838c4d0dab9e5e`
MD5	`370fbd66ed2d0bfc4220d8a3a6c9777b`
BLAKE2b-256	`8e7249e147d845e6c2b75e6eb76cd806e2fd1868c8103957cf7de35eb36dc160`

File details

Details for the file clonearmy-0.2.20-py3-none-any.whl.

File metadata

Download URL: clonearmy-0.2.20-py3-none-any.whl
Upload date: Mar 26, 2025
Size: 25.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for clonearmy-0.2.20-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e5d33a5604c87034c830ba76e3b3a8beeb2a24fe88d98a27b3c90944300c96f3`
MD5	`172bde133032677454837ba6d2615775`
BLAKE2b-256	`a94a85034c84798d593ce3a36786ecb11426698916334461b62c243b5fdf8e4d`

clonearmy 0.2.20
