Analyze haplotypes from Illumina paired-end amplicon sequencing
CloneArmy
CloneArmy is a Python package for analyzing haplotypes in Illumina paired-end amplicon sequencing data. It provides a single workflow that processes FASTQ files, aligns reads to a reference, identifies sequence variants, and compares samples statistically.
Features
- Fast paired-end read processing using BWA-MEM
- Quality-based filtering of bases and alignments
- Haplotype identification and frequency analysis
- Statistical comparison between samples with FDR correction
- Interactive visualization of mutation frequencies
- Rich command-line interface with progress tracking
- Comprehensive HTML reports
- Multi-threading support
- Support for full-length sequence analysis
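To illustrate what quality-based base filtering typically means, the sketch below decodes FASTQ quality strings and masks bases that fall below a Phred threshold. It is not CloneArmy's own code. The helper names `phred_scores` and `mask_low_quality`, the Phred+33 encoding, and masking with `N` (rather than discarding the base) are all assumptions made for the example.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def mask_low_quality(seq: str, quality: str, min_base_quality: int = 20) -> str:
    """Replace every base whose Phred score is below the threshold with 'N'."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    return "".join(
        base if score >= min_base_quality else "N"
        for base, score in zip(seq, phred_scores(quality))
    )


# A threshold of Q20 means an estimated error probability of 10 ** (-20 / 10) = 1%.
print(mask_low_quality("ACGT", "II#I"))  # 'I' is Q40, '#' is Q2 → ACNT
```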
Installation
pip install cloneArmy
Requirements
- Python ≥ 3.8
- BWA (must be installed and available in PATH)
- Samtools (must be installed and available in PATH)
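You can confirm that both external requirements are on PATH before running CloneArmy by using the standard library's `shutil.which`. The `missing_tools` helper below is a hypothetical convenience function and is not part of the package.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("bwa", "samtools")) -> List[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("bwa and samtools are available")
```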
Usage
Command Line Interface
Basic Analysis
# Basic usage
cloneArmy run /path/to/fastq/directory reference.fasta

# With all options
cloneArmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --no-report  # Skip HTML report generation
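The `--min-mapping-quality` threshold applies to the MAPQ field of each alignment. The sketch below applies a filter of this kind to SAM text lines. The name `filter_sam` is hypothetical. Skipping unmapped, secondary, and supplementary records is an assumption of this sketch and is not documented CloneArmy behavior.

```python
from typing import Iterable, Iterator

# SAM FLAG bits (SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def filter_sam(lines: Iterable[str], min_mapping_quality: int = 30) -> Iterator[str]:
    """Yield header lines and primary mapped alignments with MAPQ >= threshold."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ column 5
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # assumption: only primary mapped records are kept
        # MAPQ 255 means "unavailable" in SAM; this sketch does not special-case it.
        if mapq >= min_mapping_quality:
            yield line


records = [
    "@SQ\tSN:amplicon\tLN:300\n",
    "r1\t99\tamplicon\t1\t60\t4M\t=\t151\t154\tACGT\tIIII\n",
    "r2\t147\tamplicon\t151\t12\t4M\t=\t1\t-154\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_sam(records))
print(len(kept))  # header + r1 → 2
```

For a BAM file, `samtools view -b -q 30 -F 0x904 in.bam > filtered.bam` applies the same MAPQ threshold and flag exclusions from the command line.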
Comparative Analysis
# Compare two samples
cloneArmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --full-length-only  # Only consider full-length sequences
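The README does not say which statistical test `compare` applies to each mutation. One common choice, assumed here purely for illustration, is to build a 2×2 table of reads with and without the mutation in each sample and run a two-sided Fisher's exact test on it. The pure-Python sketch below does this. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples; columns are (reads with the mutation, reads without it).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x mutant reads in sample 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance treats floating-point near-ties as ties.
    p = sum(pr for pr in (table_prob(x) for x in range(lo, hi + 1))
            if pr <= observed * (1 + 1e-7))
    return min(1.0, p)


# Sample 1: 3 of 3 reads carry the mutation; sample 2: 0 of 3.
print(fisher_exact_two_sided(3, 0, 0, 3))  # ≈ 0.1
```

If SciPy is available, `scipy.stats.fisher_exact` computes the same two-sided test and is the better choice in practice.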
Python API
from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis

# Initialize processor
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30
)

# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4
)
results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4
)

# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False
)

# Results are returned as pandas DataFrames
print(results1)  # Sample 1 haplotypes
print(comparison_results)  # Statistical comparison
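The `reference_seq` argument takes the reference as a plain string, which is elided above as `"ATCG..."`. If `reference.fasta` holds a single amplicon, a small reader like the one below can load it. The helper names `first_fasta_sequence` and `read_reference` are hypothetical and are not part of CloneArmy.

```python
from pathlib import Path


def first_fasta_sequence(text: str) -> str:
    """Return the first record's sequence from FASTA-formatted text, uppercased."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of the second record
            seen_header = True
        elif line:
            parts.append(line)
    return "".join(parts).upper()


def read_reference(fasta_path) -> str:
    """Load the first sequence in a FASTA file, e.g. reference.fasta."""
    return first_fasta_sequence(Path(fasta_path).read_text())


print(first_fasta_sequence(">amplicon\nacgt\nTTGA\n>other\nCCCC\n"))  # → ACGTTTGA
```

With this helper you could pass `reference_seq=read_reference("reference.fasta")` to `run_comparative_analysis`.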
Output Files
Single Sample Analysis
- Sorted BAM file with alignments
- CSV file containing haplotype information:
  - Sequence
  - Read count
  - Frequency
  - Number of mutations
  - Full-length status
- Interactive HTML report (optional)
- Console output with summary statistics
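The haplotype CSV columns listed above can be illustrated with the simplified sketch below. It assumes that reads have already been aligned and trimmed to reference coordinates, so each haplotype is a plain string. The column names, the mismatch-based mutation count, and the full-length rule are assumptions and are not CloneArmy's documented definitions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_mismatches(seq: str, reference: str) -> int:
    """Count mismatches position by position; positions past the shorter string are ignored."""
    return sum(1 for s, r in zip(seq, reference) if s != r)


def haplotype_table(reads: Iterable[str], reference: str) -> List[Dict[str, object]]:
    """Collapse identical sequences into haplotypes with count, frequency, and mutations."""
    counts = Counter(reads)
    total = sum(counts.values())
    rows = []
    for seq, n in counts.most_common():  # most abundant haplotype first
        rows.append({
            "sequence": seq,
            "read_count": n,
            "frequency": n / total,
            "mutations": count_mismatches(seq, reference),
            # Assumption: "full length" means reference length with no ambiguous N bases.
            "full_length": len(seq) == len(reference) and "N" not in seq,
        })
    return rows


for row in haplotype_table(["ACGT", "ACGT", "ACTT", "ACG"], reference="ACGT"):
    print(row)
```

Because CloneArmy returns pandas DataFrames, `pandas.DataFrame(rows)` would turn a list like this into a comparable table.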
Comparative Analysis
- CSV file with statistical comparisons:
  - Mutation positions and types
  - Frequencies in each sample
  - Statistical significance (p-values)
  - FDR-corrected p-values
- Interactive HTML plot showing mutation frequency differences
- Console output with significant mutations
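The README does not name the FDR procedure. Benjamini-Hochberg is the most common choice and is assumed here. The pure-Python sketch below shows how that adjustment turns per-mutation p-values into FDR-corrected values. The function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # ≈ [0.04, 0.0533, 0.0533, 0.2]
```

In practice, `statsmodels.stats.multitest.multipletests(p_values, method="fdr_bh")` provides the same correction.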
Download files
Source Distribution
clonearmy-0.2.0.tar.gz (15.2 kB)
File details
Details for the file clonearmy-0.2.0.tar.gz.
File metadata
- Download URL: clonearmy-0.2.0.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 220f37d88e07b90fb7bcd9277138a28fadcac89c06b86145ce9b6e6e40a26e27 |
| MD5 | 923a9b6acdcaa97057300020e333e3ae |
| BLAKE2b-256 | aa5a7e77eeecc894f2abd3aba243139ee31e056bb56f550b73a069ebdec0f2dc |