sma-finder

A tool for diagnosing spinal muscular atrophy (SMA) using exome or genome sequencing data

SMA Finder

SMA Finder diagnoses spinal muscular atrophy (SMA) from exome or genome sequencing data. The tool takes one or more alignment files (CRAM or BAM) and reports whether each sample likely has SMA. It does not report carrier status or SMN2 copy number.

Install

python3 -m pip install sma-finder

Usage

Example command and output:

/$ sma_finder --verbose --genome-version 38 --reference-fasta /ref/hg38.fa  sample1.cram

Input args:
    --reference-fasta: /ref/hg38.fa
    --genome-version: 38
    --output-tsv: sample1.sma_finder_results.tsv
    CRAMS or BAMS: sample1.cram
----
Output row #1:
        filename_prefix                     sample1
        file_type                           cram
        sample_id                           s1
        sma_status                          has SMA
        confidence_score                    168
        c840_reads_with_smn1_base_C         0
        c840_total_reads                    174
Wrote 1 rows to sample1.sma_finder_results.tsv

Full usage help text:


/$ sma_finder --help

usage: sma_finder [-h] -R REFERENCE_FASTA -g {37,38,T2T} [-o OUTPUT_TSV]
                     [-v]
                     cram_or_bam_path [cram_or_bam_path ...]

positional arguments:
  cram_or_bam_path      One or more CRAM or BAM file paths

optional arguments:
  -h, --help            show this help message and exit
  -R REFERENCE_FASTA, --reference-fasta REFERENCE_FASTA
                        Reference genome FASTA file path
  -g {37,38,T2T}, --genome-version {37,38,T2T}
                        Reference genome version
  -o OUTPUT_TSV, --output-tsv OUTPUT_TSV
                        Optional output tsv file path
  -v, --verbose         Whether to print extra details during the run
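
When the tool is driven from a Python script rather than a shell, the command line can be assembled from the flags listed in the help text above. The following is a minimal sketch, assuming the sma_finder executable is on the PATH. The helper names build_sma_finder_command and run_sma_finder are hypothetical and are not part of the package.

```python
import subprocess
from typing import List, Optional, Sequence

# Genome versions accepted by --genome-version, per the help text.
GENOME_VERSIONS = {"37", "38", "T2T"}


def build_sma_finder_command(
    alignment_paths: Sequence[str],
    reference_fasta: str,
    genome_version: str,
    output_tsv: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an sma_finder argument list (hypothetical helper)."""
    if genome_version not in GENOME_VERSIONS:
        raise ValueError(f"genome_version must be one of {sorted(GENOME_VERSIONS)}")
    if not alignment_paths:
        raise ValueError("at least one CRAM or BAM path is required")

    command = [
        "sma_finder",
        "--reference-fasta", reference_fasta,
        "--genome-version", genome_version,
    ]
    if output_tsv is not None:
        command += ["--output-tsv", output_tsv]
    if verbose:
        command.append("--verbose")
    command.extend(alignment_paths)
    return command


def run_sma_finder(*args, **kwargs) -> None:
    """Run sma_finder and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_sma_finder_command(*args, **kwargs), check=True)


# Example (not executed here):
# run_sma_finder(["sample1.cram"], "/ref/hg38.fa", "38", verbose=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.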

Output Columns

The output .tsv contains the following columns:

filename_prefix              =  the CRAM or BAM filename prefix
file_type                    =  "cram" or "bam"
sample_id                    =  sample id from the CRAM or BAM file header (parsed from the read group metadata)
sma_status                   =  possible values are: "has SMA", "does not have SMA", or "not enough coverage at SMN c.840 position"
confidence_score             =  PHRED-scaled integer confidence score that the sma_status is correct. This is similar to the PL field in GATK HaplotypeCaller genotypes.
c840_reads_with_smn1_base_C  =  number of reads that have a 'C' at the c.840 position in SMN1 or SMN2
c840_total_reads             =  total number of reads overlapping the c.840 position in SMN1 plus SMN2
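
The columns above can be read back with the standard csv module. The sketch below assumes the output file is tab-separated with a header row that uses exactly these column names, and that confidence_score follows the usual PHRED convention Q = -10 * log10(P(error)). Both are assumptions based on the descriptions above rather than confirmed details of the tool. The function names and the added c840_smn1_fraction key are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional


def phred_to_error_probability(score: int) -> float:
    """Convert a PHRED-scaled score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


def read_sma_finder_results(tsv_lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse result rows and add the fraction of c.840 reads carrying 'C'.

    tsv_lines can be an open file handle or any iterable of lines.
    """
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        total = int(row["c840_total_reads"])
        with_c = int(row["c840_reads_with_smn1_base_C"])
        # Leave the fraction undefined when no reads cover the position.
        fraction: Optional[float] = with_c / total if total > 0 else None
        parsed: Dict[str, object] = dict(row)
        parsed["c840_smn1_fraction"] = fraction  # hypothetical derived field
        rows.append(parsed)
    return rows


def samples_with_status(rows: List[Dict[str, object]], status: str) -> List[str]:
    """Return the sample_id of every row whose sma_status equals status."""
    return [str(r["sample_id"]) for r in rows if r["sma_status"] == status]
```

For example, opening sample1.sma_finder_results.tsv, passing the file handle to read_sma_finder_results, and then calling samples_with_status(rows, "has SMA") would list the affected sample ids.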

Release history

1.4.4    Jan 31, 2024
1.4.3    Jan 29, 2024
1.4.2    Jan 16, 2024
1.4.1    Jan 16, 2024
1.4      Jan 16, 2024
1.3      Mar 14, 2023
1.2      Feb 5, 2023
1.1      Oct 30, 2022  (this version)
1.0.2    Oct 17, 2022
1.0.1    Oct 17, 2022
1.0      Sep 20, 2022

Download files

Source Distribution

sma_finder-1.1.tar.gz (6.8 kB)

Uploaded Oct 30, 2022 Source

Built Distribution

sma_finder-1.1-py3-none-any.whl (7.9 kB)

Uploaded Oct 30, 2022 Python 3

Hashes for sma_finder-1.1.tar.gz
Algorithm	Hash digest
SHA256	`dc9b3f329417f6bbd641e2d2395472216cedd32767f5b76e280fff5906f01227`
MD5	`ac0eb40a354345b64ef8c027a85f0215`
BLAKE2b-256	`17b443e44ee1c372c3b0b391798a05bf4717fbf2429e796e13e6dad3ee6732cb`

Hashes for sma_finder-1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3eb585ce1026fb977103e420978dce00e6ea1807758c0cafda3d1692603f751d`
MD5	`86135d218fc30ae0ad26c2ff1f67991a`
BLAKE2b-256	`e426efec4cff59b3e8511ecdb5e5ff1a46973c0bb280d1549d587d10215cf78d`
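
A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. This is a minimal sketch. The function names are hypothetical, and the local file path in the example comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest to an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (not executed here), using the source distribution digest above:
# matches_digest(
#     "sma_finder-1.1.tar.gz",
#     "dc9b3f329417f6bbd641e2d2395472216cedd32767f5b76e280fff5906f01227",
# )
```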