Utility to compare star alleles

Star Alleles Comparator (star_allele_comp)

The comparator compares HLA or KIR alleles between cohorts.

Install

pip install git+https://github.com/linnil1/star_alleles_comparator

Usage

1. Using command line

star_allele_comp hla_result1.csv hla_result2.csv --family hla --save tmp --plot -v

The results are printed to the screen and saved in .txt and .csv format.

The output matches the example shown under "2. Using Python functions" below.

The input CSV should adhere to the following format:

Columns

  • id (required): The sample ID.
  • method (optional): The method name. If not specified, the filename is used.
  • allele* (required): Every column whose name starts with allele stores alleles for that id/sample under the corresponding method. A value may be NULL or empty.

Format 1: Separate Columns for Alleles

For this format, each allele is represented in separate columns:

method,id,allele1,allele2,allele3,allele4
method1,id1,"A*01:02:03:04","A*01:02","B*01:01:01:01"
method1,id2,"A*01:02:03:04","A*01:02","B*01:01:01:01"
method1,id2,"C*03","C*03:03"
method2,id1,"A*01:02:03:04","A*01:02:03","B*01:02:02:01","B*04:01:02"
method2,id2,"A*01:02:03:04","A*01:02:03","B*01:02:02:01","B*04:01:02"
method2,id2,"C*03:03","C*03:02"
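
As a rough illustration of how Format 1 rows map onto the nested dictionary used in "2. Using Python functions" below, the sketch that follows reads such a CSV with the standard library only. It is not the package's own loader: the function name `read_allele_columns`, the `default_method` argument, and the decision to merge rows that share a method and id are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def read_allele_columns(handle, default_method="unknown"):
    """Collect alleles per method and sample from a Format 1 CSV (illustrative only)."""
    cohort = defaultdict(lambda: defaultdict(list))
    # skipinitialspace tolerates a blank after a comma, e.g. `"C*03", "C*03:03"`.
    reader = csv.DictReader(handle, skipinitialspace=True)
    for row in reader:
        # Assumption: fall back to a fixed name when there is no method column.
        method = row.get("method") or default_method
        for column, value in row.items():
            # Short rows leave trailing columns as None; skip those,
            # and skip empty or NULL cells as well.
            if column and column.startswith("allele") and value and value != "NULL":
                cohort[method][row["id"]].append(value.strip())
    # Convert the nested defaultdicts to plain dicts.
    return {method: dict(samples) for method, samples in cohort.items()}


sample_csv = (
    "method,id,allele1,allele2,allele3,allele4\n"
    'method1,id1,"A*01:02:03:04","A*01:02","B*01:01:01:01"\n'
    'method1,id1,"C*03", "C*03:03"\n'
    'method2,id1,"A*01:02",NULL,,\n'
)
parsed = read_allele_columns(io.StringIO(sample_csv))
print(parsed)
```

When reading from disk, opening the file with `newline=""` is the usual recommendation for the `csv` module.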

Format 2: Using "alleles" Column with Underscore as Separator

In this format, the alleles column contains a single string with alleles separated by underscores:

method,id,alleles
method1,id3,"KIR2DL1*0010203_KIR2DL1*001_KIR2DS1*0010101"
method1,id4,"KIR2DL1*0010203_KIR2DL1*00102_KIR2DS1*00101"
method1,id3,"KIR2DL1*03105_KIR2DL1*03:03"
method2,id3,"KIR2DL1*001_KIR2DL1*0030203_KIR2DS1*0010208_KIR2DS1*0040102"
method2,id4,"KIR2DL1*0010203_KIR2DL1*0010203_KIR2DS1*0010202_KIR2DS1*0040302"
method2,id4,"KIR2DL1*00303_KIR2DL1*03002"
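
For Format 2, the only extra step is splitting the alleles string on underscores. The helper below is a minimal sketch of that step, not the package's implementation. The name `split_alleles` is hypothetical, and dropping empty pieces is an assumption.

```python
def split_alleles(field, separator="_"):
    """Split an underscore-separated allele string into a list of alleles."""
    # Empty pieces (for example from a trailing separator) are dropped.
    return [allele for allele in field.split(separator) if allele]


alleles = split_alleles("KIR2DL1*0010203_KIR2DL1*001_KIR2DS1*0010101")
print(alleles)
```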

2. Using Python functions

Run comparison

from star_allele_comp import compare_method, print_all_summary, plot_summary
cohort = {
    "method1": { "sample_id1": [ "A*01:02:03:04", "A*01:02", "B*01:01:01:01", "B*03:01"] },
    "method2": { "sample_id1": [ "A*01:02:03:04", "A*01:02:03", "B*01:02:02:01", "B*04:01:02"] },
}
ground_truth_method = "method1"
result = compare_method(cohort, ground_truth_method, "hla")

Print result allele by allele

print(result)

# Method method2
# Sample sample_id1
# A*01:02:03:04    =4= A*01:02:03:04
# A*01:02          =2= A*01:02:03
# B*01:01:01:01    =1= B*01:02:02:01
# B*03:01          =0= B*04:01:02
# Note:
# The left-hand side is the allele in the reference method/cohort
# The right-hand side is the allele in the other method/cohort
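
One way to read the `=N=` markers in this output is as the number of leading fields on which the two alleles agree. The sketch below reproduces the four values shown for the example pairs under that reading. It handles colon-separated HLA names only and is not taken from the package source. The function names and the choice to return `None` for different genes are assumptions.

```python
def parse_hla(allele):
    """Split 'A*01:02:03' into the gene 'A' and the fields ['01', '02', '03']."""
    gene, _, fields = allele.partition("*")
    return gene, fields.split(":") if fields else []


def matched_fields(ref, other):
    """Count the leading fields shared by two alleles of the same gene."""
    ref_gene, ref_fields = parse_hla(ref)
    other_gene, other_fields = parse_hla(other)
    if ref_gene != other_gene:
        return None  # assumption: different genes are not comparable here
    count = 0
    for ref_field, other_field in zip(ref_fields, other_fields):
        if ref_field != other_field:
            break
        count += 1
    return count


pairs = [
    ("A*01:02:03:04", "A*01:02:03:04"),
    ("A*01:02", "A*01:02:03"),
    ("B*01:01:01:01", "B*01:02:02:01"),
    ("B*03:01", "B*04:01:02"),
]
levels = [matched_fields(ref, other) for ref, other in pairs]
print(levels)
```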

Print summary (i.e. Accuracy vs Resolution, Confusion Matrix)

# details are in star_allele_comp/summary.py:print_all_summary
df_cohort = result.to_dataframe()
print_all_summary(df_cohort)
Accuracy summary
           Accuracy                                num_match                     num_ref
Resolution        0     1    2    3    4   FP   FN         0  1  2  3  4  FP  FN       0  1  2  3  4  FP  FN
method
method1         1.0  1.00  1.0  1.0  1.0  0.0  0.0         4  4  4  2  2   0   0       4  4  4  2  2   0   0
method2         1.0  0.75  0.5  0.5  0.5  0.0  0.0         4  3  2  1  1   0   0       4  4  4  2  2   0   0

# Note: In the accuracy summary table:
# * num_match is the number of alleles that match the ground truth method's alleles at the given `Resolution`.
# * num_ref is the number of reference alleles with resolution >= `Resolution`.
# * Accuracy is the ratio of num_match to num_ref.
# * The Accuracy value in the FP column is the False Discovery Rate (FDR).
# * The Accuracy value in the FN column is the False Negative Rate (FNR).
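
The method2 row of the accuracy table can be recomputed by hand from these definitions. The sketch below does so for resolutions 0 to 4, starting from the reference alleles of method1 and the per-allele match levels printed earlier. It illustrates the arithmetic only. The function name and the `nan` fallback for an empty denominator are assumptions, and FP/FN are left out.

```python
ref_alleles = ["A*01:02:03:04", "A*01:02", "B*01:01:01:01", "B*03:01"]
match_levels = [4, 2, 1, 0]  # the =N= values printed for method2 above


def accuracy_by_resolution(ref_alleles, match_levels, max_resolution=4):
    """Return (resolution, num_match, num_ref, accuracy) for each resolution."""
    # Resolution of a reference allele = number of colon-separated fields.
    ref_resolutions = [len(a.partition("*")[2].split(":")) for a in ref_alleles]
    rows = []
    for resolution in range(max_resolution + 1):
        num_ref = sum(res >= resolution for res in ref_resolutions)
        num_match = sum(level >= resolution for level in match_levels)
        accuracy = num_match / num_ref if num_ref else float("nan")
        rows.append((resolution, num_match, num_ref, accuracy))
    return rows


rows = accuracy_by_resolution(ref_alleles, match_levels)
for row in rows:
    print(row)
```

The resulting num_match values (4, 3, 2, 1, 1) and num_ref values (4, 4, 4, 2, 2) agree with the method2 row of the table.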


Confusion matrix (computed from a different sample set than the accuracy table above)
            Count
 match_res      -1  0  1  2  3
 ref_res
-1              2  0  0  0  0
 1              1  1  0  0  0
 2              1  0  2  6  0
 3              0  0  0  0  1
 4              0  0  0  0  1

 # Note
 # -1 indicates FP or FN


Accuracy summary per resolution per gene
             Accuracy                               num_match                     num_ref
Resolution          0    1    2    3    4   FP   FN         0  1  2  3  4  FP  FN       0  1  2  3  4  FP  FN
method  gene
method1 A         1.0  1.0  1.0  1.0  1.0  0.0  0.0         2  2  2  1  1   0   0       2  2  2  1  1   0   0
        B         1.0  1.0  1.0  1.0  1.0  0.0  0.0         2  2  2  1  1   0   0       2  2  2  1  1   0   0

Plot summary (i.e. Accuracy vs Resolution, gene, methods)

figs = plot_summary(df_cohort)
# You can use Dash to show it
from dash import dcc, html, Dash
app = Dash(__name__)
app.layout = html.Div(children=[dcc.Graph(figure=fig) for fig in figs])
app.run(debug=True)

[Figure: example resolution vs. accuracy plot]

Develop

pip install pdoc
pdoc star_allele_comp --docformat google

Details

[Figure: allele summary]

This version

0.2
