Download SNP matrices from NCBI Pathogen Detection

Pathofetch

A lightweight command-line tool and Python library for downloading SNP cluster data from NCBI Pathogen Detection (PD) and converting it to pairwise SNP distance matrices.

Installation

From PyPI

pip install pathofetch

From source

git clone https://github.com/erinyoung/pathofetch.git
cd pathofetch
pip install -e .

Usage

Pathofetch requires two arguments: an organism group (-o) and a SNP cluster ID (-c), both as defined by NCBI Pathogen Detection.
It downloads the tarball containing the cluster's SNP tree and generates a pairwise SNP distance matrix from it.

By default:

  • The SNP matrix is saved to {cluster_id}.snp_distance_matrix.csv.
  • The raw tarball is saved to {cluster_id}.tar.gz.

Example

pathofetch -o Salmonella -c PDS000254123.2

Custom Output Filename

# Write the SNP matrix to a custom CSV path
pathofetch -o Salmonella -c PDS000254123.2 -f results/matrix.csv
# Also write a QC file (auto-named when no path is given)
pathofetch -o Salmonella -c PDS000254123.2 -q
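The commands above can also be assembled from a script. The following is a minimal sketch, not part of pathofetch itself: the helper names build_pathofetch_command and run_pathofetch are hypothetical, and only the flags documented in this README (-o, -c, -f, -t, -q) are used. Whether pathofetch returns a non-zero exit status on failure is not stated here, so the use of check=True is an assumption.

```python
import shlex
import subprocess


def build_pathofetch_command(organism, cluster_id, out_file=None,
                             tar_file=None, qc=False):
    """Assemble a pathofetch argument list from the documented flags.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["pathofetch", "-o", organism, "-c", cluster_id]
    if out_file:
        cmd += ["-f", out_file]   # custom path for the CSV matrix
    if tar_file:
        cmd += ["-t", tar_file]   # custom path for the raw tarball
    if qc:
        cmd.append("-q")          # bare -q, as in the example above
    return cmd


def run_pathofetch(cmd):
    """Run the command; needs pathofetch installed and network access.

    Assumption: a failed download results in a non-zero exit status.
    """
    return subprocess.run(cmd, check=True)


cmd = build_pathofetch_command(
    "Salmonella", "PDS000254123.2", out_file="results/matrix.csv", qc=True
)
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when organism names or paths come from user input.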

Python API Usage

Pathofetch can also be imported into Python scripts, for example as one step in a larger bioinformatics pipeline.

import pathofetch

# -----------------------------
# Download and process a single SNP cluster
# -----------------------------
# Returns True on success, False on failure
success = pathofetch.fetch_snp_matrix(
    organism="Listeria_monocytogenes",    # Organism group on NCBI PD
    cluster_id="PDS000000123.5",          # SNP cluster ID
    out_file="results/matrix.csv",        # Where to save the SNP distance matrix
    tar_file="archives/raw_data.tar.gz",  # Optional: save tarball to custom location
    qc_file_arg="AUTO",                   # Generate an auto-named QC file
)

if success:
    print("Matrix generated successfully!")
    print("Matrix CSV: results/matrix.csv")
    print("Tarball: archives/raw_data.tar.gz")
    print("QC stats: PDS000000123.5.qc.csv (auto-named)")
else:
    print("Failed to generate SNP matrix. Check organism and cluster ID.")
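Because fetch_snp_matrix returns True or False, it is straightforward to loop over several clusters and record which ones succeeded. The sketch below is one way to do that under stated assumptions: fetch_clusters is a hypothetical helper, and it assumes that tar_file and qc_file_arg can be omitted from the call, which this README does not confirm for qc_file_arg. The fetch parameter exists only so the loop can be exercised without network access.

```python
from pathlib import Path


def fetch_clusters(organism, cluster_ids, out_dir, fetch=None):
    """Fetch several SNP clusters and report success per cluster.

    Hypothetical helper. `fetch` defaults to pathofetch.fetch_snp_matrix
    and is called with the keyword arguments shown in the example above.
    """
    if fetch is None:
        # Imported lazily so the helper can be tested with a stand-in.
        import pathofetch
        fetch = pathofetch.fetch_snp_matrix

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    for cluster_id in cluster_ids:
        out_file = out_dir / f"{cluster_id}.snp_distance_matrix.csv"
        ok = fetch(
            organism=organism,
            cluster_id=cluster_id,
            out_file=str(out_file),
        )
        results[cluster_id] = bool(ok)
    return results


# Example call (requires pathofetch and network access):
# status = fetch_clusters("Salmonella", ["PDS000254123.2"], "results")
# failed = [cid for cid, ok in status.items() if not ok]
```

Collecting the results in a dictionary lets a pipeline continue past a failed cluster and report all failures at the end.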

Output Format

The output is a symmetric matrix in CSV format. The first row and first column hold sample identifiers, and each cell holds the pairwise SNP distance between the two samples.

-,PDT003080107.1,PDT002963418.1,PDT003087591.1
PDT003080107.1,0,12,5
PDT002963418.1,12,0,8
PDT003087591.1,5,8,0
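A matrix in this layout can be read with the Python standard library alone. The sketch below parses the example above and lists sample pairs within a distance threshold. The function names read_snp_matrix and close_pairs and the threshold of 8 SNPs are illustrative choices, not part of pathofetch, and the code assumes every distance is an integer.

```python
import csv
import io


def read_snp_matrix(handle):
    """Parse a pathofetch-style matrix into (samples, nested dict).

    Assumes the first header cell is a placeholder and that all
    distances are integers.
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]          # skip the corner cell
    matrix = {}
    for row in reader:
        if not row:               # tolerate blank lines
            continue
        matrix[row[0]] = {s: int(v) for s, v in zip(samples, row[1:])}
    return samples, matrix


def close_pairs(samples, matrix, max_snps):
    """Return (a, b, distance) for each unordered pair within max_snps."""
    pairs = []
    for i, a in enumerate(samples):
        for b in samples[i + 1:]:  # upper triangle only
            if matrix[a][b] <= max_snps:
                pairs.append((a, b, matrix[a][b]))
    return pairs


example = """-,PDT003080107.1,PDT002963418.1,PDT003087591.1
PDT003080107.1,0,12,5
PDT002963418.1,12,0,8
PDT003087591.1,5,8,0
"""

samples, matrix = read_snp_matrix(io.StringIO(example))
for a, b, dist in close_pairs(samples, matrix, max_snps=8):
    print(a, b, dist)
```

To read a file written by pathofetch instead, pass an open file handle, for example open("results/matrix.csv", newline="").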

QC Statistics File

When the -q flag is used, pathofetch also writes a CSV file of key/value pairs with the following structure:

key,value
pathofetch_version,0.1.0
download_time,2.21s
organism_group,Aeromonas_salmonicida
cluster,PDS000097767.14
cluster_create_date,Feb 4 11:28
num_sample,87
snp_alignment_length,612
min_pairwise_distance,0
max_pairwise_distance,104
avg_pairwise_distance,50.46
tarball_file,PDS000097767.14.tar.gz
tarball_filesize,0.81 MB
snp_matrix_file,PDS000097767.14.snp_distance_matrix.csv
qc_file,PDS000097767.14.qc.csv
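Since the QC file is a two-column CSV with a key,value header, it maps naturally onto a dictionary. The sketch below loads a few of the rows shown above. The read_qc name is hypothetical, and every value is kept as a string because fields such as download_time and tarball_filesize carry units.

```python
import csv
import io


def read_qc(handle):
    """Load a pathofetch QC file into a dict of string values."""
    reader = csv.DictReader(handle)   # uses the key,value header row
    return {row["key"]: row["value"] for row in reader}


example = """key,value
pathofetch_version,0.1.0
organism_group,Aeromonas_salmonicida
cluster,PDS000097767.14
num_sample,87
max_pairwise_distance,104
tarball_filesize,0.81 MB
"""

qc = read_qc(io.StringIO(example))

# Convert individual fields only where a number is needed.
num_samples = int(qc["num_sample"])
print(qc["cluster"], num_samples, qc["tarball_filesize"])
```

For a real QC file, pass an open file handle such as open("PDS000097767.14.qc.csv", newline="").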

Arguments

usage: pathofetch [-h] [--version] [--organism ORGANISM] [--cluster CLUSTER] [--out-file OUT_FILE] [--tar-file TAR_FILE]
                  [--qc-file [QC_FILE]]

Download SNP Cluster Distance Matrices

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit
  --organism ORGANISM, -o ORGANISM
                        Organism Group (e.g. Salmonella)
  --cluster CLUSTER, -c CLUSTER
                        Cluster ID (e.g. PDS000254123.2)
  --out-file OUT_FILE, -f OUT_FILE
                        Path to save output CSV
  --tar-file TAR_FILE, -t TAR_FILE
                        Path to save tar.gz
  --qc-file [QC_FILE], -q [QC_FILE]
                        Save QC metrics to file

License

GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007
