Analyze haplotypes from Illumina paired-end amplicon sequencing

CloneArmy

CloneArmy is a Python package for analyzing haplotypes in Illumina paired-end amplicon sequencing data. Its workflow covers processing FASTQ files, aligning reads, identifying sequence variants, and running comparative analyses between samples.

Features

  • Fast paired-end read processing using BWA-MEM
  • Quality-based filtering of bases and alignments
  • Haplotype identification and frequency analysis
  • Statistical comparison between samples with FDR correction
  • Interactive visualization of mutation frequencies
  • Rich command-line interface with progress tracking and tabular output
  • Comprehensive HTML reports
  • Multi-threading support
  • Support for full-length sequence analysis
  • Real-time progress monitoring with progress bars
  • Exportable results in multiple formats (CSV, JSON, Excel)

Installation

pip install clonearmy

Requirements

  • Python ≥ 3.8
  • BWA (must be installed and available in PATH)
  • Samtools (must be installed and available in PATH)
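Because BWA and Samtools are external binaries rather than Python dependencies, it can help to confirm they are discoverable before starting a run. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is illustrative and not part of CloneArmy.

```python
import shutil


def find_missing_tools(tools=("bwa", "samtools")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```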

Usage

Command Line Interface

Basic Analysis

# Basic usage with progress tracking
clonearmy run /path/to/fastq/directory reference.fasta

# With all options
# --format accepts csv, json, or excel
# --no-report skips HTML report generation
clonearmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --no-report

Comparative Analysis

# Compare two samples
# --format accepts csv, json, or excel
# --full-length-only restricts the comparison to full-length sequences
clonearmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --full-length-only
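
To make the `--min-base-quality` threshold concrete: Illumina FASTQ files normally encode each base's Phred score as an ASCII character offset by 33. The sketch below masks bases under the threshold with `N`. Whether CloneArmy masks, trims, or discards such bases is not stated here, so treat the masking as an assumption and `mask_low_quality` as a hypothetical helper.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def mask_low_quality(sequence, quality_string, min_base_quality=20):
    """Replace bases whose Phred score is below the threshold with 'N'."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    scores = phred_scores(quality_string)
    return "".join(
        base if score >= min_base_quality else "N"
        for base, score in zip(sequence, scores)
    )


# 'I' encodes Q40, '#' encodes Q2, '5' encodes Q20
print(mask_low_quality("ATCG", "II#5"))  # prints ATNG
```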

Output Examples

Sample Analysis Results

╒════════════════╤══════════╤════════════╤══════════════╕
│ Haplotype      │ Count    │ Frequency  │ Mutations    │
╞════════════════╪══════════╪════════════╪══════════════╡
│ ATCG...        │ 1000     │ 0.45       │ 2            │
│ ATTG...        │ 800      │ 0.36       │ 1            │
│ ATCC...        │ 420      │ 0.19       │ 3            │
╘════════════════╧══════════╧════════════╧══════════════╛
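
The frequencies in the table above are consistent with each haplotype's read count divided by the total count across all haplotypes. A minimal sketch of that arithmetic, using the counts from the table (the truncated sequences serve only as labels, and `haplotype_frequencies` is an illustrative name, not a CloneArmy function):

```python
def haplotype_frequencies(counts):
    """Map each haplotype to its share of the total read count."""
    total = sum(counts.values())
    if total == 0:
        return {haplotype: 0.0 for haplotype in counts}
    return {haplotype: count / total for haplotype, count in counts.items()}


counts = {"ATCG...": 1000, "ATTG...": 800, "ATCC...": 420}
for haplotype, freq in haplotype_frequencies(counts).items():
    print(f"{haplotype}\t{counts[haplotype]}\t{freq:.2f}")
```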

Comparative Analysis Results

╒══════════╤════════════╤════════════╤═══════════╤═══════════╕
│ Position │ Sample 1 % │ Sample 2 % │ P-value   │ FDR       │
╞══════════╪════════════╪════════════╪═══════════╪═══════════╡
│ 123 A>T  │ 45.2       │ 12.3       │ 0.001     │ 0.003     │
│ 456 G>C  │ 33.1       │ 28.9       │ 0.042     │ 0.063     │
╘══════════╧════════════╧════════════╧═══════════╧═══════════╛
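
The FDR method is not named here. One common choice is the Benjamini-Hochberg procedure, sketched below in plain Python as an assumption rather than a description of CloneArmy's internals. The first two p-values come from the table above; the third is a hypothetical placeholder added so the example covers three tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.001, 0.042, 0.5]  # third value is hypothetical
print([round(q, 3) for q in benjamini_hochberg(p_values)])
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, then capped so that adjusted values never decrease as raw p-values increase.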

Python API

from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor with progress tracking
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30,
    show_progress=True  # Enable progress bars
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4,
    output_format="csv"  # or "json" or "excel"
)

results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4,
    output_format="csv"
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False,
    show_progress=True  # Enable progress tracking
)

# Results are returned as pandas DataFrames
# (DataFrame.to_markdown() requires the optional tabulate package)
print(results1.to_markdown())  # Pretty print sample 1 haplotypes
print(comparison_results.to_markdown())  # Pretty print comparison
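
`process_sample` takes explicit R1 and R2 paths, so processing a whole directory from Python needs some way to pair the files. A minimal sketch, assuming the common Illumina convention that mates differ only by `_R1`/`_R2` in the file name; `find_fastq_pairs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def find_fastq_pairs(directory):
    """Return {sample_name: (r1_path, r2_path)} for paired FASTQ files.

    Assumes file names such as sample1_R1.fastq.gz / sample1_R2.fastq.gz.
    R1 files without a matching R2 file are skipped.
    """
    pairs = {}
    for r1 in sorted(Path(directory).glob("*_R1*.fastq*")):
        # Swap only the last "_R1" so sample names containing "_R1" survive
        head, _, tail = r1.name.rpartition("_R1")
        r2 = r1.with_name(f"{head}_R2{tail}")
        if r2.exists():
            pairs[head] = (r1, r2)
    return pairs
```

Each pair can then be passed to `processor.process_sample` as `fastq_r1=str(r1)` and `fastq_r2=str(r2)`, following the call shown above.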

Output Files

Single Sample Analysis

  • Sorted BAM file with alignments
  • Results file in chosen format (CSV/JSON/Excel) containing:
    • Sequence
    • Read count
    • Frequency
    • Number of mutations
    • Full-length status
    • Quality metrics
  • Interactive HTML report (optional)
  • Console output with summary statistics and progress bars

Comparative Analysis

  • Results file in chosen format with statistical comparisons:
    • Mutation positions and types
    • Frequencies in each sample
    • Statistical significance (p-values)
    • FDR-corrected p-values
    • Effect sizes
  • Interactive HTML plot showing mutation frequency differences
  • Console output with significant mutations in tabular format
  • Progress tracking for long-running operations

License

MIT License - See LICENSE file for details

Citation

If you use CloneArmy in your research, please cite: [Citation information to be added]
