Download SNP matrices from NCBI Pathogen Detection
Pathofetch
Lightweight CLI and Python tool for downloading SNP cluster data from NCBI Pathogen Detection (PD).
Installation
From PyPI
pip install pathofetch
From source
git clone https://github.com/erinyoung/pathofetch.git
cd pathofetch
pip install -e .
Usage
Pathofetch requires two arguments: an organism group and a SNP cluster ID (as defined by NCBI Pathogen Detection).
It will download the corresponding tarball for the SNP tree and generate a pairwise SNP distance matrix.
By default:
- The SNP matrix is saved to {cluster_id}.snp_distance_matrix.csv.
- The raw tarball is saved to {cluster_id}.tar.gz.
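When pathofetch is called from a wrapper script, it can help to know where the default outputs will land. The sketch below only rebuilds the two default names listed above; `default_outputs` and `missing_outputs` are hypothetical helpers, not part of pathofetch, and the assumption that defaults are written relative to the working directory is not confirmed here.

```python
from pathlib import Path


def default_outputs(cluster_id: str, directory: str = ".") -> dict:
    """Build the default output paths for a SNP cluster ID (hypothetical helper)."""
    base = Path(directory)
    return {
        "matrix": base / f"{cluster_id}.snp_distance_matrix.csv",
        "tarball": base / f"{cluster_id}.tar.gz",
    }


def missing_outputs(cluster_id: str, directory: str = ".") -> list:
    """List the expected default outputs that are not present on disk."""
    paths = default_outputs(cluster_id, directory)
    return [name for name, path in paths.items() if not path.exists()]


paths = default_outputs("PDS000254123.2")
print(paths["matrix"].name)
print(paths["tarball"].name)
```

After a run, an empty list from `missing_outputs` suggests both files were written.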
Example
pathofetch -o Salmonella -c PDS000254123.2
Custom Output Filename
# write the matrix to a custom CSV path
pathofetch -o Salmonella -c PDS000254123.2 -f results/matrix.csv
# also write a QC file (auto-named when no path is given)
pathofetch -o Salmonella -c PDS000254123.2 -q
Python API Usage
Pathofetch can also be imported into Python scripts, for example as one step in a larger bioinformatics pipeline.
import pathofetch
# -----------------------------
# Download and process a single SNP cluster
# -----------------------------
# Returns True on success, False on failure
success = pathofetch.fetch_snp_matrix(
    organism="Listeria_monocytogenes",    # Organism group on NCBI PD
    cluster_id="PDS000000123.5",          # SNP cluster ID
    out_file="results/matrix.csv",        # Where to save the SNP distance matrix
    tar_file="archives/raw_data.tar.gz",  # Optional: save tarball to custom location
    qc_file_arg="AUTO",                   # Generate QC file alongside output
)
if success:
    print("Matrix generated successfully!")
    print("Matrix CSV: results/matrix.csv")
    print("Tarball: archives/raw_data.tar.gz")
    print("QC stats: PDS000000123.5.qc.csv (auto-named)")
else:
    print("Failed to generate SNP matrix. Check organism and cluster ID.")
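Because `fetch_snp_matrix` returns True or False, it is easy to loop over several clusters and collect the outcomes. The following is a minimal sketch: `fetch_clusters` is a hypothetical wrapper, the keyword arguments mirror the call shown above, and creating the output directory up front is a precaution, since it is not stated whether pathofetch creates missing directories itself. The `fetch` parameter exists only so the loop can be tried with a stand-in function.

```python
from pathlib import Path


def fetch_clusters(organism, cluster_ids, out_dir="results", fetch=None):
    """Fetch several SNP clusters for one organism group.

    Returns a dict mapping each cluster ID to True (success) or False (failure).
    """
    if fetch is None:
        # Imported lazily so the wrapper can be exercised without pathofetch
        import pathofetch
        fetch = pathofetch.fetch_snp_matrix

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    results = {}
    for cluster_id in cluster_ids:
        ok = fetch(
            organism=organism,
            cluster_id=cluster_id,
            out_file=str(out_path / f"{cluster_id}.snp_distance_matrix.csv"),
            tar_file=str(out_path / f"{cluster_id}.tar.gz"),
            qc_file_arg="AUTO",
        )
        results[cluster_id] = bool(ok)
    return results
```

A call such as `fetch_clusters("Salmonella", ["PDS000254123.2"])` would then return a dict that can be filtered for the clusters that failed.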
Output Format
The output is a symmetric matrix of pairwise SNP distances in CSV format. The first row and the first column hold the sample identifiers.
-,PDT003080107.1,PDT002963418.1,PDT003087591.1
PDT003080107.1,0,12,5
PDT002963418.1,12,0,8
PDT003087591.1,5,8,0
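The matrix can be read back with Python's standard csv module. The sketch below parses the example above into a nested dict and lists the pairs at or below a distance threshold; `read_snp_matrix` and `pairs_within` are hypothetical helper names, the threshold of 10 is arbitrary, and the parser assumes integer distances as in the example.

```python
import csv
import io


def read_snp_matrix(handle):
    """Parse a pathofetch-style CSV matrix into {row_id: {col_id: distance}}."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = header[1:]  # the first cell is only a corner placeholder
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = {col: int(val) for col, val in zip(columns, row[1:])}
    return matrix


def pairs_within(matrix, threshold):
    """Return each unordered pair once if its distance is <= threshold."""
    names = sorted(matrix)
    return [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if matrix[a][b] <= threshold
    ]


sample = """-,PDT003080107.1,PDT002963418.1,PDT003087591.1
PDT003080107.1,0,12,5
PDT002963418.1,12,0,8
PDT003087591.1,5,8,0
"""

matrix = read_snp_matrix(io.StringIO(sample))
close_pairs = pairs_within(matrix, 10)
for a, b, distance in close_pairs:
    print(a, b, distance)
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.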
QC Statistics File
With the -q flag, pathofetch also writes a CSV file of QC statistics with the following structure:
key,value
pathofetch_version,0.1.0
download_time,2.21s
organism_group,Aeromonas_salmonicida
cluster,PDS000097767.14
cluster_create_date,Feb 4 11:28
num_sample,87
snp_alignment_length,612
min_pairwise_distance,0
max_pairwise_distance,104
avg_pairwise_distance,50.46
tarball_file,PDS000097767.14.tar.gz
tarball_filesize,0.81 MB
snp_matrix_file,PDS000097767.14.snp_distance_matrix.csv
qc_file,PDS000097767.14.qc.csv
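Since the QC file is a two-column key,value table, it loads naturally into a dict. This is a minimal sketch using a subset of the rows shown above; `read_qc` is a hypothetical helper, and the conversions assume the value formats seen in the example (for instance a trailing "s" on download_time).

```python
import csv
import io


def read_qc(handle):
    """Load a key,value QC file into a dict of strings."""
    return {row["key"]: row["value"] for row in csv.DictReader(handle)}


sample_qc = """key,value
pathofetch_version,0.1.0
download_time,2.21s
organism_group,Aeromonas_salmonicida
cluster,PDS000097767.14
num_sample,87
max_pairwise_distance,104
avg_pairwise_distance,50.46
"""

qc = read_qc(io.StringIO(sample_qc))

# All values arrive as strings; convert the ones needed for checks
num_samples = int(qc["num_sample"])
avg_distance = float(qc["avg_pairwise_distance"])
download_seconds = float(qc["download_time"].rstrip("s"))

print(qc["cluster"], num_samples, avg_distance, download_seconds)
```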
Arguments
usage: pathofetch [-h] [--version] [--organism ORGANISM] [--cluster CLUSTER] [--out-file OUT_FILE] [--tar-file TAR_FILE]
                  [--qc-file [QC_FILE]]

Download SNP Cluster Distance Matrices

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit
  --organism ORGANISM, -o ORGANISM
                        Organism Group (e.g. Salmonella)
  --cluster CLUSTER, -c CLUSTER
                        Cluster ID (e.g. PDS000254123.2)
  --out-file OUT_FILE, -f OUT_FILE
                        Path to save output CSV
  --tar-file TAR_FILE, -t TAR_FILE
                        Path to save tar.gz
  --qc-file [QC_FILE], -q [QC_FILE]
                        Save QC metrics to file
License
GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007