Antigen Receptor Classifier

ARC (Antigen Receptor Classifier)

Authors: Austin Crinklaw, Swapnil Mahajan

Requirements:

  • Linux OS
  • HMMER3
  • NCBI Blast+
  • Python 3+
    • Python packages: Pandas, BioPython

Installation:

We provide a Dockerfile for ease of use.

ARC can also be downloaded through PyPI using the following pip command.

pip install bio-arc

Testing Installation:

To check that the dependencies are present and the installation succeeded, navigate to your pip package install directory (run pip show bio-arc to locate it) and run the following command:

python3 -m arc_test

If all unit tests pass, your system is configured properly and ready to classify protein sequences.
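Before running the unit tests, it can help to confirm that the external tools from the Requirements list are visible on your PATH. The following is a minimal sketch, not part of ARC: the executable names hmmscan (shipped with HMMER3) and blastp (shipped with NCBI BLAST+) are assumptions about which binaries are needed, and check_dependencies is a hypothetical helper name.

```python
import importlib.util
import shutil

# Assumed executable names: hmmscan ships with HMMER3 and blastp with NCBI BLAST+.
# Which binaries ARC actually invokes is not stated in this README.
EXECUTABLES = ("hmmscan", "blastp")
# Import names of the Python packages listed under Requirements.
MODULES = ("pandas", "Bio")


def check_dependencies(executables=EXECUTABLES, modules=MODULES):
    """Return a dict mapping each dependency name to True if it was found."""
    found = {}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        found[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be imported.
        found[mod] = importlib.util.find_spec(mod) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A MISSING entry only means the name was not found in the current environment; it does not replace the unit tests above.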

Usage:

Input

  • A fasta format file with one or more protein sequences.
>1WBZ_A_alpha I H2-Kb
MVPCTLLLLLAAALAPTQTRAGPHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWMEQEGPEYWERETQKAKGNEQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGSDGRLLRGYQQYAYDGCDYIALNEDLKTWTAADMAALITKHKWEQAGEAERLRAYLEGTCVEWLRRYLKNGNATLLRTDSPKAHVTHHSRPEDKVTLRCWALGFYPADITLTWQLNGEELIQDMELVETRPAGDGTFQKWASVVVPLGKEQYYTCHVYHQGLPEPLTLRWEPPPSTVSNMATVAVLVVLGAAIVTGAVVAFVMKMRRRNTGGKGGDYALAPGSQTSDLSLPDCKVMVHDPHSLA
>1WBZ_B_b2m I H2-Kb
MARSVTLVFLVLVSLTGLYAIQKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTEFTPTETDTYACRVKHASMAEPKTVYWDRDM
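
If you want to sanity-check an input file before classification, a small parser is enough. The sketch below is an illustration using only the standard library; read_fasta is a hypothetical helper and is not how ARC itself reads input (BioPython is listed as a dependency and may be used for that).

```python
def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# One record taken from the example input above.
example = [
    ">1WBZ_B_b2m I H2-Kb",
    "MARSVTLVFLVLVSLTGLYAIQKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTEFTPTETDTYACRVKHASMAEPKTVYWDRDM",
]
for description, sequence in read_fasta(example):
    print(description, len(sequence))
```

The description returned here is the full header text after '>', which corresponds to the 'ID' column of the output.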

Commands

  • Using a FASTA file as input:
python -m ARC classify -i /path/to/input.fasta -o /path/to/output.csv
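
The same command can be driven from a Python script, for example when classifying many files. This is a minimal sketch: build_arc_command and run_arc are hypothetical helper names, and the assumption that ARC exits with a non-zero status on failure is not confirmed by this README.

```python
import subprocess
import sys


def build_arc_command(fasta_path, csv_path, python=sys.executable):
    """Build the argument list for the classify command shown above."""
    return [python, "-m", "ARC", "classify", "-i", str(fasta_path), "-o", str(csv_path)]


def run_arc(fasta_path, csv_path):
    """Run the classify command.

    Assumption: a failed run exits with a non-zero status, in which case
    check=True makes subprocess.run raise CalledProcessError.
    """
    subprocess.run(build_arc_command(fasta_path, csv_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.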

Output

  • The output file is in CSV format and has 4 columns.
  • The first column, 'ID', is the description provided in the FASTA file for each sequence.
  • The second column, 'class', is the assigned molecule class for each sequence.
    • e.g. MHC-I, MHC-II, BCR or TCR.
  • The third column, 'chain_type', is the assigned chain type for each sequence.
    • e.g. alpha, beta, heavy, lambda, kappa, scFv, TscFv or construct. Chains are also labelled V for a variable domain or C for a constant domain.
  • The fourth column, 'calc_mhc_allele', is the MHC allele identified using groove domain similarity to MRO alleles.
ID                                     | class  | chain_type | calc_mhc_allele
1WBY_A_alpha I H2-Db                   | MHC-I  | alpha V    |
1WBY_B_b2m I H2-Db                     |        |            |
1HQR_A_alpha II HLA-DRA01:01/DRB501:01 | MHC-II | alpha C    | HLA-DRA*01:01
1HQR_B_beta II HLA-DRA01:01/DRB501:01  | MHC-II | beta C     | HLA-DRB5*01:01
2CMR_H_heavy                           | BCR    | heavy V    |
2CMR_L_light                           | BCR    | kappa C    |
4RFO_L_light                           | BCR    | lambda V   |
3UZE_A_heavy                           | BCR    | scFv       |
1FYT_D_alpha                           | TCR    | alpha V    |
1FYT_E_beta                            | TCR    | beta C     |
3TF7_C_alpha                           | TCR    | TscFv      |
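
The output file can be post-processed with any CSV reader. The sketch below uses the standard library and assumes comma-separated output with a header row containing the four column names described above; load_results and count_by_class are hypothetical helpers, and the inline rows are a hand-written excerpt of the example table, not real ARC output.

```python
import csv
import io
from collections import Counter


def load_results(handle):
    """Read output rows as dicts keyed by the four column names."""
    return list(csv.DictReader(handle))


def count_by_class(rows):
    """Count sequences per assigned class; unclassified rows fall under ''."""
    return Counter(row["class"] for row in rows)


# Hypothetical excerpt, assuming comma-separated output with a header row.
example = io.StringIO(
    "ID,class,chain_type,calc_mhc_allele\n"
    "2CMR_H_heavy,BCR,heavy V,\n"
    "1FYT_D_alpha,TCR,alpha V,\n"
    "1FYT_E_beta,TCR,beta C,\n"
)
rows = load_results(example)
print(count_by_class(rows))
```

For a real run, replace the StringIO object with open("/path/to/output.csv", newline="").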

How it works:

  • BCR and TCR chains are identified using HMMs. A given protein sequence is searched against HMMs built using BCR and TCR chain sequences from IMGT. HMMER is used to align an input sequence to the HMMs.
  • MHC class I (alpha1-alpha2 domains) and MHC class II alpha and beta chain HMMs were downloaded from the Pfam website. An input protein sequence is searched against these HMMs. A HMMER bit score threshold of 25 is used to identify MHC chain sequences.
  • To identify MHC alleles, groove domains (G-domains) are assigned based on the MRO repository.
  • IgNAR sequences are identified by querying a custom BLAST database.
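
The bit score threshold mentioned above can be pictured as a simple filter over HMM hits. This is an illustrative sketch only, not ARC's implementation: best_hit is a hypothetical function, the model names and scores are made up, and treating a score exactly equal to the threshold as passing is an assumption.

```python
BIT_SCORE_THRESHOLD = 25.0  # threshold quoted above for MHC HMM hits


def best_hit(hits, threshold=BIT_SCORE_THRESHOLD):
    """Return the highest-scoring (model_name, bit_score) pair, or None.

    Assumption: a score equal to the threshold counts as passing.
    """
    passing = [hit for hit in hits if hit[1] >= threshold]
    if not passing:
        return None
    return max(passing, key=lambda hit: hit[1])


# Hypothetical model names and scores, for illustration only.
hits = [("hmm_A", 112.4), ("hmm_B", 18.9), ("hmm_C", 7.2)]
print(best_hit(hits))
```

A sequence with no hit at or above the threshold returns None here, which would correspond to a row with an empty 'class' column in the output.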

References:

Several methods for HMMER result parsing were sourced from ANARCI.

Dunbar J and Deane CM. ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics (2016)
