Analyze haplotypes from Illumina paired-end amplicon sequencing
CloneArmy
CloneArmy is a modern Python package for analyzing haplotypes from Illumina paired-end amplicon sequencing data. It provides a streamlined workflow for processing FASTQ files, aligning reads, identifying sequence variants, and performing comparative analyses between samples.
Features
- Fast paired-end read processing using BWA-MEM
- Quality-based filtering of bases and alignments
- Haplotype identification and frequency analysis
- Statistical comparison between samples with FDR correction
- Interactive visualization of mutation frequencies
- Rich command-line interface with real-time progress bars and tabular output
- Comprehensive HTML reports
- Multi-threading support
- Support for full-length sequence analysis
- Exportable results in multiple formats (CSV, JSON, Excel)
Installation
pip install clonearmy
Requirements
- Python ≥ 3.8
- BWA (must be installed and available in PATH)
- Samtools (must be installed and available in PATH)
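Because BWA and Samtools are external programs rather than Python dependencies, `pip` will not install them. A minimal pre-flight check might look like the sketch below. It assumes the executables are named `bwa` and `samtools`; the helper name `missing_tools` is hypothetical and not part of CloneArmy.

```python
import shutil

# Executable names are an assumption based on the requirements listed above.
REQUIRED_TOOLS = ("bwa", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("bwa and samtools found")
```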
Usage
Command Line Interface
Basic Analysis
# Basic usage with progress tracking
clonearmy run /path/to/fastq/directory reference.fasta
# With all options
# --format accepts csv, json, or excel; --no-report skips HTML report generation
clonearmy run /path/to/fastq/directory reference.fasta \
    --threads 8 \
    --output results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --no-report
Comparative Analysis
# Compare two samples
# --format accepts csv, json, or excel; --full-length-only restricts the comparison to full-length sequences
clonearmy compare \
    /path/to/sample1/fastq \
    /path/to/sample2/fastq \
    reference.fasta \
    --threads 8 \
    --output comparison_results \
    --min-base-quality 20 \
    --min-mapping-quality 30 \
    --format csv \
    --full-length-only
Output Examples
Sample Analysis Results
╒════════════════╤══════════╤════════════╤══════════════╕
│ Haplotype      │    Count │  Frequency │    Mutations │
╞════════════════╪══════════╪════════════╪══════════════╡
│ ATCG...        │     1000 │       0.45 │            2 │
│ ATTG...        │      800 │       0.36 │            1 │
│ ATCC...        │      420 │       0.19 │            3 │
╘════════════════╧══════════╧════════════╧══════════════╛
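The Count and Frequency columns are related in the obvious way: each frequency is the haplotype's read count divided by the total. The sketch below illustrates that relationship with toy sequences standing in for the truncated haplotypes; `haplotype_frequencies` is a hypothetical helper, not CloneArmy's own code, which may filter or normalize differently.

```python
from collections import Counter


def haplotype_frequencies(sequences):
    """Count identical sequences and return (haplotype, count, frequency) rows, most common first."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return [(hap, n, n / total) for hap, n in counts.most_common()]


# Toy reads standing in for the truncated haplotypes in the table above.
reads = ["ATCG"] * 1000 + ["ATTG"] * 800 + ["ATCC"] * 420
for hap, count, freq in haplotype_frequencies(reads):
    print(f"{hap}\t{count}\t{freq:.2f}")
```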
Comparative Analysis Results
╒══════════╤════════════╤════════════╤═══════════╤═══════════╕
│ Position │ Sample 1 % │ Sample 2 % │   P-value │       FDR │
╞══════════╪════════════╪════════════╪═══════════╪═══════════╡
│ 123 A>T  │       45.2 │       12.3 │     0.001 │     0.003 │
│ 456 G>C  │       33.1 │       28.9 │     0.042 │     0.063 │
╘══════════╧════════════╧════════════╧═══════════╧═══════════╛
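The FDR column holds p-values adjusted for multiple testing. This README does not say which procedure CloneArmy applies, so the following is only a sketch of one common choice, the Benjamini-Hochberg adjustment, written in plain Python. The function name and the third p-value are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Two p-values taken from the table above plus a hypothetical third test.
p_values = [0.001, 0.042, 0.5]
print([round(q, 3) for q in benjamini_hochberg(p_values)])
```

A position is then typically called significant when its adjusted value falls below the chosen FDR threshold, so in the table above only `123 A>T` would pass at 0.05.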
Python API
from pathlib import Path
from clone_army.processor import AmpliconProcessor
from clone_army.comparison import run_comparative_analysis
# Initialize processor with progress tracking
processor = AmpliconProcessor(
    reference_path="reference.fasta",
    min_base_quality=20,
    min_mapping_quality=30,
    show_progress=True  # Enable progress bars
)
# Process samples
results1 = processor.process_sample(
    fastq_r1="sample1_R1.fastq.gz",
    fastq_r2="sample1_R2.fastq.gz",
    output_dir="results/sample1",
    threads=4,
    output_format="csv"  # or "json" or "excel"
)
results2 = processor.process_sample(
    fastq_r1="sample2_R1.fastq.gz",
    fastq_r2="sample2_R2.fastq.gz",
    output_dir="results/sample2",
    threads=4,
    output_format="csv"
)
# Perform comparative analysis
comparison_results = run_comparative_analysis(
    results1={"sample1": results1},
    results2={"sample2": results2},
    reference_seq="ATCG...",  # Reference sequence string
    output_path="comparison_results.csv",
    full_length_only=False,
    show_progress=True  # Enable progress tracking
)
# Results are returned as pandas DataFrames
print(results1.to_markdown()) # Pretty print sample 1 haplotypes
print(comparison_results.to_markdown()) # Pretty print comparison
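Since `process_sample` takes the R1 and R2 files explicitly, processing a whole directory from Python requires pairing the files first. The sketch below assumes the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming used in the example above; `find_fastq_pairs` is a hypothetical helper, and how `clonearmy run` discovers pairs internally is not documented here.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"  # assumed naming convention, taken from the example filenames


def find_fastq_pairs(directory):
    """Map sample name -> (R1 path, R2 path) for files following the assumed naming."""
    pairs = {}
    for r1 in sorted(Path(directory).glob(f"*{R1_SUFFIX}")):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(f"{sample}_R2.fastq.gz")
        if r2.exists():  # skip R1 files that have no mate file
            pairs[sample] = (r1, r2)
    return pairs


if __name__ == "__main__":
    for sample, (r1, r2) in find_fastq_pairs(".").items():
        print(sample, r1.name, r2.name)
```

Each pair can then be passed to `processor.process_sample(fastq_r1=..., fastq_r2=..., ...)` as in the example above.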
Output Files
Single Sample Analysis
- Sorted BAM file with alignments
- Results file in chosen format (CSV/JSON/Excel) containing:
  - Sequence
  - Read count
  - Frequency
  - Number of mutations
  - Full-length status
  - Quality metrics
- Interactive HTML report (optional)
- Console output with summary statistics and progress bars
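To make the "Number of mutations" field concrete, the sketch below counts substitutions between a haplotype and the reference and formats them in the `POS REF>ALT` style seen in the comparison table. It assumes equal-length sequences, substitutions only, and 1-based positions; none of these conventions is confirmed by this README, and `list_substitutions` is a hypothetical name.

```python
def list_substitutions(haplotype, reference):
    """Return substitutions as 'POS REF>ALT' strings, using 1-based positions."""
    if len(haplotype) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences (no indels)")
    return [
        f"{pos} {ref}>{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, haplotype), start=1)
        if ref != alt
    ]


reference = "ATCGATCG"  # toy reference, not from the CloneArmy documentation
haplotype = "ATTGATCC"  # toy haplotype differing at two positions
mutations = list_substitutions(haplotype, reference)
print(len(mutations), mutations)
```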
Comparative Analysis
- Results file in chosen format with statistical comparisons:
  - Mutation positions and types
  - Frequencies in each sample
  - Statistical significance (p-values)
  - FDR-corrected p-values
  - Effect sizes
- Interactive HTML plot showing mutation frequency differences
- Console output with significant mutations in tabular format
- Progress tracking for long-running operations
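The README does not name the statistical test behind the per-mutation p-values. As one plausible illustration only, a two-sided Fisher's exact test on mutant versus non-mutant read counts in the two samples could be sketched as follows. The function name and the counts are hypothetical, and CloneArmy may use a different test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(x):
        # Number of tables with x in the top-left cell and the same margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(total, col1)


# Hypothetical read counts at one position:
# sample 1 has 8 mutant and 2 wild-type reads, sample 2 has 1 mutant and 5 wild-type.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p:.4f}")
```

Comparing integer table weights rather than floating-point probabilities avoids tie-breaking errors when two tables are equally likely.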
License
MIT License - See LICENSE file for details
Citation
If you use CloneArmy in your research, please cite: [Citation information to be added]