Analyze haplotypes from Illumina paired-end amplicon sequencing

CloneArmy

CloneArmy is a Python package for analyzing haplotypes from Illumina paired-end amplicon sequencing data. It provides a single workflow that processes FASTQ files, aligns reads with BWA-MEM, identifies sequence variants, and statistically compares samples.

Features

  • Fast paired-end read processing using BWA-MEM
  • Quality-based filtering of bases and alignments
  • Haplotype identification and frequency analysis
  • Statistical comparison between samples with FDR correction
  • Interactive visualization of mutation frequencies
  • Rich command-line interface with tabular output
  • Comprehensive HTML reports
  • Multi-threading support
  • Support for full-length sequence analysis
  • Real-time progress monitoring with progress bars
  • Exportable results in multiple formats (CSV, JSON, Excel)

Installation

pip install clonearmy

Requirements

  • Python ≥ 3.8
  • BWA (must be installed and available in PATH)
  • Samtools (must be installed and available in PATH)
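
Both external tools can be checked before a run is started. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of CloneArmy.

```python
import shutil

# External tools listed under Requirements.
REQUIRED_TOOLS = ("bwa", "samtools")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("bwa and samtools were found on PATH")
```

An empty list means both executables are visible to the current shell environment.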

Usage

Command Line Interface

Basic Analysis

# Basic usage with progress tracking
clone-army run /path/to/fastq/directory reference.fasta

# With all options
# --format accepts csv, json, or excel; --no-report skips HTML report generation
clone-army run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --no-report
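
To illustrate what a base-quality threshold such as `--min-base-quality 20` means, here is a minimal sketch of Phred-based filtering. It assumes Phred+33 quality encoding and assumes that low-quality bases are masked with `N`; the README does not say how CloneArmy treats such bases internally, and both function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def mask_low_quality(sequence, quality_string, min_base_quality=20):
    """Replace bases whose Phred score is below the threshold with 'N'."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    scores = phred_scores(quality_string)
    return "".join(
        base if score >= min_base_quality else "N"
        for base, score in zip(sequence, scores)
    )
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100 for that base call.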

Comparative Analysis

# Compare two samples
# --format accepts csv, json, or excel
# --full-length-only restricts the comparison to full-length sequences
clone-army compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --full-length-only

Output Examples

Sample Analysis Results

╒════════════════╤══════════╤════════════╤══════════════╕
│ Haplotype      │ Count    │ Frequency  │ Mutations    │
╞════════════════╪══════════╪════════════╪══════════════╡
│ ATCG...        │ 1000     │ 0.45       │ 2            │
│ ATTG...        │ 800      │ 0.36       │ 1            │
│ ATCC...        │ 420      │ 0.19       │ 3            │
╘════════════════╧══════════╧════════════╧══════════════╛

Comparative Analysis Results

╒══════════╤════════════╤════════════╤═══════════╤═══════════╕
│ Position │ Sample 1 % │ Sample 2 % │ P-value   │ FDR       │
╞══════════╪════════════╪════════════╪═══════════╪═══════════╡
│ 123 A>T  │ 45.2       │ 12.3       │ 0.001     │ 0.003     │
│ 456 G>C  │ 33.1       │ 28.9       │ 0.042     │ 0.063     │
╘══════════╧════════════╧════════════╧═══════════╧═══════════╛
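
The FDR column holds p-values adjusted for multiple testing. The README does not say which procedure CloneArmy applies; the sketch below assumes the widely used Benjamini-Hochberg method, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

As an illustration, adding a hypothetical third p-value of 0.5 to the two shown above (`benjamini_hochberg([0.001, 0.042, 0.5])`) gives adjusted values of about 0.003, 0.063, and 0.5.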

Python API

from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor with progress tracking
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30,
    show_progress=True  # Enable progress bars
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4,
    output_format="csv"  # or "json" or "excel"
)

results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4,
    output_format="csv"
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False,
    show_progress=True  # Enable progress tracking
)

# Results are returned as pandas DataFrames
# (DataFrame.to_markdown() requires the optional "tabulate" package)
print(results1.to_markdown())  # Pretty print sample 1 haplotypes
print(comparison_results.to_markdown())  # Pretty print comparison
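
`process_sample` takes explicit R1 and R2 paths, so a directory of FASTQ files has to be paired up first. The following sketch assumes the common `_R1`/`_R2` naming convention with a `.fastq.gz` extension; the helper `pair_fastq_files` is hypothetical and not part of the CloneArmy API.

```python
from pathlib import Path


def pair_fastq_files(directory):
    """Map sample name -> (R1 path, R2 path) for files named *_R1*.fastq.gz."""
    pairs = {}
    for r1 in sorted(Path(directory).glob("*_R1*.fastq.gz")):
        # Swap only the last "_R1" so sample names containing "_R1" survive.
        head, _, tail = r1.name.rpartition("_R1")
        r2 = r1.with_name(f"{head}_R2{tail}")
        if r2.exists():
            pairs[head] = (r1, r2)
    return pairs
```

Each resulting pair can then be passed to `processor.process_sample` as `fastq_r1=str(r1)` and `fastq_r2=str(r2)`. R1 files without a matching R2 file are skipped.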

Output Files

Single Sample Analysis

  • Sorted BAM file with alignments
  • Results file in chosen format (CSV/JSON/Excel) containing:
    • Sequence
    • Read count
    • Frequency
    • Number of mutations
    • Full-length status
    • Quality metrics
  • Interactive HTML report (optional)
  • Console output with summary statistics and progress bars

Comparative Analysis

  • Results file in chosen format with statistical comparisons:
    • Mutation positions and types
    • Frequencies in each sample
    • Statistical significance (p-values)
    • FDR-corrected p-values
    • Effect sizes
  • Interactive HTML plot showing mutation frequency differences
  • Console output with significant mutations in tabular format
  • Progress tracking for long-running operations
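
The README does not name the statistical test behind the per-mutation p-values. As one possible illustration, the sketch below computes a two-sided Fisher's exact test on a 2x2 table of mutant and non-mutant read counts in two samples. The choice of test and the function name are assumptions, not a description of CloneArmy's internals.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: mutant and non-mutant reads in sample 1
    c, d: mutant and non-mutant reads in sample 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x mutant reads in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )
```

The resulting p-values, one per mutation, are what a multiple-testing correction would then adjust.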

License

MIT License - See LICENSE file for details

Citation

If you use CloneArmy in your research, please cite: [Citation information to be added]
