
Analyze haplotypes from Illumina paired-end amplicon sequencing


CloneArmy

CloneArmy is a Python package for analyzing haplotypes from Illumina paired-end amplicon sequencing data. It provides a single workflow that processes FASTQ files, aligns reads, identifies sequence variants, and compares samples.

Features

  • Fast paired-end read processing using BWA-MEM
  • Quality-based filtering of bases and alignments
  • Haplotype identification and frequency analysis
  • Statistical comparison between samples with FDR correction
  • Interactive visualization of mutation frequencies
  • Rich command-line interface with progress tracking
  • Comprehensive HTML reports
  • Multi-threading support
  • Support for full-length sequence analysis

Installation

pip install cloneArmy

Requirements

  • Python ≥ 3.8
  • BWA (must be installed and available in PATH)
  • Samtools (must be installed and available in PATH)
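
Because both external tools must be on PATH, it can help to check for them before starting a run. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of CloneArmy, and the executable names `bwa` and `samtools` are assumed to match your installation.

```python
import shutil

# Executable names assumed from the requirements list above.
REQUIRED_TOOLS = ("bwa", "samtools")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```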

Usage

Command Line Interface

Basic Analysis

# Basic usage
cloneArmy run /path/to/fastq/directory reference.fasta

# With all options
cloneArmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --no-report  # Skip HTML report generation

Comparative Analysis

# Compare two samples
cloneArmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --full-length-only  # Only consider full-length sequences

Python API

from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4
)

results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False
)

# Results are returned as pandas DataFrames
print(results1)  # Sample 1 haplotypes
print(comparison_results)  # Statistical comparison
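
The `reference_seq` argument above takes the reference as a plain string. One way to obtain it from the FASTA file is sketched below, assuming the file holds a single amplicon record (only the first record is read). The helper `read_single_fasta` is hypothetical and not part of the CloneArmy API.

```python
from pathlib import Path


def read_single_fasta(path):
    """Return the sequence of the first FASTA record, uppercased."""
    sequence_parts = []
    seen_header = False
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; stop reading
            seen_header = True
            continue
        if seen_header:
            sequence_parts.append(line)
    if not sequence_parts:
        raise ValueError(f"No sequence found in {path}")
    return "".join(sequence_parts).upper()
```

The returned string could then be passed as `reference_seq=read_single_fasta("reference.fasta")`.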

Output Files

Single Sample Analysis

  • Sorted BAM file with alignments
  • CSV file containing haplotype information:
    • Sequence
    • Read count
    • Frequency
    • Number of mutations
    • Full-length status
  • Interactive HTML report (optional)
  • Console output with summary statistics
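
To illustrate how the haplotype fields listed above relate to each other, here is a simplified sketch that tabulates them from a list of read sequences. It is not CloneArmy's implementation: the function name and dictionary keys are hypothetical, reads are assumed to be already aligned to the start of the reference, and mutations are counted as plain mismatches with no indel handling.

```python
from collections import Counter


def summarize_haplotypes(read_sequences, reference_seq):
    """Tabulate distinct sequences with count, frequency, and mismatch count."""
    counts = Counter(read_sequences)
    total = sum(counts.values())
    rows = []
    # most_common() yields haplotypes from most to least abundant.
    for sequence, count in counts.most_common():
        # Mismatches over the overlapping positions only (a simplification).
        mutations = sum(1 for a, b in zip(sequence, reference_seq) if a != b)
        rows.append({
            "sequence": sequence,
            "count": count,
            "frequency": count / total,
            "mutations": mutations,
            "full_length": len(sequence) == len(reference_seq),
        })
    return rows
```

Each returned dictionary corresponds to one row of a haplotype table; the real CSV column names may differ.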

Comparative Analysis

  • CSV file with statistical comparisons:
    • Mutation positions and types
    • Frequencies in each sample
    • Statistical significance (p-values)
    • FDR-corrected p-values
  • Interactive HTML plot showing mutation frequency differences
  • Console output with significant mutations
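
The description above does not name the FDR procedure. As an illustration of what FDR-corrected p-values are, the sketch below applies the Benjamini-Hochberg adjustment, a common choice; treating it as the method CloneArmy uses is an assumption. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        candidate = p_values[index] * m / rank
        # Enforce monotonicity: an adjusted value never exceeds that of a larger p-value.
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted
```

A mutation would typically be reported as significant when its adjusted p-value falls below the chosen FDR threshold, for example 0.05.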


