Standardise TCR/MHC gene names to IMGT nomenclature.
tidytcells
DISCLAIMER: This package currently only supports parsing of human TCR and MHC gene data. Support for more species is planned.
`tidytcells` is a lightweight Python package written for bioinformaticians who
work with T cell receptor (TCR) data. The main purpose of the package is to
solve the problem of parsing and collating non-standardised TCR datasets. It
is often difficult to compile TCR data from multiple sources because each
dataset encodes TCR and MHC gene names in a slightly different format or
nomenclature, and some datasets are not even internally consistent.

`tidytcells` attempts to ameliorate this issue by providing simple functions
that can, among other things, standardise TCR and MHC gene names to be
IMGT-compliant.
The package is currently in the alpha stage of development. More thorough documentation will follow.
Installation
| Installation method | Comments |
|---|---|
| PyPI (recommended) | `pip install tidytcells` |
| Source | Download the source code and run `pip install .` from the project root directory |
API reference
`tidytcells` currently comes with two submodules that can be accessed directly
from the parent module: `tidytcells.mhc` and `tidytcells.tcr`.
`.mhc` submodule
`.standardise(gene_name: str, species: str)`
Attempt to standardise `gene_name` for `species` to be IMGT-compliant.
| Parameter | Type | Description |
|---|---|---|
| `gene_name` | `str` | Potentially non-standard name for an MHC gene |
| `species` | `str` | Species to which the MHC gene belongs (see Species names below) |

| Return type | Description |
|---|---|
| `Tuple[Union[str, None], Union[str, None]]` | If `species` is supported and `gene_name` could be standardised, return a tuple containing the standardised gene name decomposed into two parts: 1) the name of the gene specific to the level of the protein, and 2) any further valid specifier fields (`None` if there are none). If `species` is unsupported, return a tuple with `gene_name` as is for the first element and `None` for the second. Otherwise, return `(None, None)`. See the example usage. |
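Because the MHC result is split into two parts, reassembling the full allele name is a common next step. The sketch below assumes, based on the `('HLA-DRA*01:01', ':01')` example, that the second element is a suffix that can be appended directly to the first; `join_mhc_parts` is a hypothetical helper, not part of `tidytcells`.

```python
from typing import Optional, Tuple

# Shape of the value documented for tidytcells.mhc.standardise:
# (name to the protein level, further specifier fields or None).
MhcParts = Tuple[Optional[str], Optional[str]]


def join_mhc_parts(parts: MhcParts) -> Optional[str]:
    """Reassemble the full gene name from a standardise() result.

    Hypothetical helper. Assumes the second element is a suffix such as
    ':01'. Returns None if standardisation failed, i.e. (None, None).
    """
    protein_level, extra_fields = parts
    if protein_level is None:
        return None
    return protein_level + (extra_fields or "")


print(join_mhc_parts(("HLA-DRA*01:01", ":01")))  # HLA-DRA*01:01:01
print(join_mhc_parts(("HLA-B*07", None)))        # HLA-B*07
print(join_mhc_parts((None, None)))              # None
```

For an unsupported species the first element is the input name and the second is `None`, so the helper simply returns the input unchanged.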
`.get_chain(gene_name: str)`
NOTE: This function currently only supports HLA gene names.
Given an IMGT-compliant MHC gene name `gene_name`, detect whether it codes for
an alpha chain or a beta chain.

| Parameter | Type | Description |
|---|---|---|
| `gene_name` | `str` | IMGT-compliant MHC gene name |

| Return type | Description |
|---|---|
| `Union[str, None]` | `'alpha'` or `'beta'` if `gene_name` is recognised, else `None` |
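A minimal sketch of sorting a list of already-standardised names by chain. To keep it runnable without the package installed, the lookup function is passed in as a parameter; the dictionary stand-in only mimics the documented return values (`'alpha'`, `'beta'`, or `None`). `group_by_chain` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_by_chain(
    gene_names: Iterable[str],
    get_chain: Callable[[str], Optional[str]],
) -> Dict[Optional[str], List[str]]:
    """Group IMGT-compliant MHC gene names by the chain they code for.

    Hypothetical helper. `get_chain` should return 'alpha', 'beta', or
    None (unrecognised), as documented for tidytcells.mhc.get_chain.
    """
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for name in gene_names:
        groups[get_chain(name)].append(name)
    return dict(groups)


# Dictionary stand-in for get_chain; unknown names map to None.
fake_get_chain = {"HLA-A": "alpha", "HLA-DRB1*01:01": "beta"}.get

print(group_by_chain(["HLA-A", "HLA-DRB1*01:01", "unknown"], fake_get_chain))
# {'alpha': ['HLA-A'], 'beta': ['HLA-DRB1*01:01'], None: ['unknown']}
```

With the package installed, `tidytcells.mhc.get_chain` can be passed as the `get_chain` argument in place of the stand-in.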
`.classify(gene_name: str)`
NOTE: This function currently only supports HLA gene names.
Given an IMGT-compliant MHC gene name `gene_name`, detect whether it codes for
part of a class I or a class II MHC molecule.

| Parameter | Type | Description |
|---|---|---|
| `gene_name` | `str` | IMGT-compliant MHC gene name |

| Return type | Description |
|---|---|
| `Union[int, None]` | `1` (class I) or `2` (class II) if `gene_name` is recognised, else `None` |
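Since `classify` (like `get_chain`) expects IMGT-compliant input, raw names need to go through `standardise` first. This sketch chains the two steps, assuming the protein-level part (the first tuple element) is what should be classified. The stand-ins cover only the `B07` and `DRA*01:01:01` inputs from the example usage, with classes set to the known values (HLA-B is class I, HLA-DR is class II); `classify_raw_names` is a hypothetical helper.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Signatures assumed from the documented tidytcells.mhc API.
StandardiseFn = Callable[[str, str], Tuple[Optional[str], Optional[str]]]
ClassifyFn = Callable[[str], Optional[int]]


def classify_raw_names(
    raw_names: Iterable[str],
    species: str,
    standardise: StandardiseFn,
    classify: ClassifyFn,
) -> Dict[str, Optional[int]]:
    """Map each raw MHC gene name to its class (1 or 2), or None.

    Hypothetical helper. Each name is standardised first because
    classify() expects IMGT-compliant input; names that cannot be
    standardised map to None.
    """
    classes: Dict[str, Optional[int]] = {}
    for raw in raw_names:
        protein_level, _extra = standardise(raw, species)
        classes[raw] = classify(protein_level) if protein_level is not None else None
    return classes


# Stand-ins for tidytcells.mhc.standardise and tidytcells.mhc.classify.
def fake_standardise(name: str, species: str) -> Tuple[Optional[str], Optional[str]]:
    known = {"B07": ("HLA-B*07", None), "DRA*01:01:01": ("HLA-DRA*01:01", ":01")}
    return known.get(name, (None, None))


fake_classify = {"HLA-B*07": 1, "HLA-DRA*01:01": 2}.get

print(classify_raw_names(["B07", "DRA*01:01:01", "unknown"], "HomoSapiens",
                         fake_standardise, fake_classify))
# {'B07': 1, 'DRA*01:01:01': 2, 'unknown': None}
```

With the package installed, `tidytcells.mhc.standardise` and `tidytcells.mhc.classify` would take the place of the two stand-ins.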
`.tcr` submodule
`.standardise(gene_name: str, species: str)`
Attempt to standardise `gene_name` for `species` to be IMGT-compliant.

| Parameter | Type | Description |
|---|---|---|
| `gene_name` | `str` | Potentially non-standard name for a TCR gene |
| `species` | `str` | Species to which the TCR gene belongs (see Species names below) |

| Return type | Description |
|---|---|
| `Union[str, None]` | If `species` is supported and `gene_name` could be standardised, return the standardised gene name. If `species` is unsupported, return `gene_name` as is. Otherwise, return `None`. |
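Because a failed standardisation yields `None`, it can be useful to keep track of which input names could not be parsed. A minimal sketch with the standardising function passed in; the stand-in reproduces only the documented `'TCRAV32S1'` → `'TRAV25'` case, and `standardise_all` is a hypothetical helper.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def standardise_all(
    gene_names: Iterable[str],
    species: str,
    standardise: Callable[[str, str], Optional[str]],
) -> Tuple[List[Optional[str]], List[str]]:
    """Standardise each name and also collect the names that failed.

    Hypothetical helper. `standardise` should behave like
    tidytcells.tcr.standardise, returning None on failure.
    """
    cleaned: List[Optional[str]] = []
    failed: List[str] = []
    for name in gene_names:
        result = standardise(name, species)
        if result is None:
            failed.append(name)
        cleaned.append(result)
    return cleaned, failed


# Stand-in mimicking the documented example: 'TCRAV32S1' -> 'TRAV25'.
def fake_standardise(name: str, species: str) -> Optional[str]:
    return {"TCRAV32S1": "TRAV25"}.get(name)


cleaned, failed = standardise_all(["TCRAV32S1", "foo"], "HomoSapiens", fake_standardise)
print(cleaned)  # ['TRAV25', None]
print(failed)   # ['foo']
```

Names passed with an unsupported species come back unchanged rather than as `None`, so they will not appear in `failed`.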
Species names
For all functions that expect a species to be specified via a string, the
species should be referred to by its binomial name (genus followed by species),
CamelCased, with no space between the two parts (e.g. `'HomoSapiens'`).
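If species names arrive in their usual written form (e.g. `Homo sapiens`), a small conversion step produces the expected string. A sketch; `to_species_string` is a hypothetical helper, not part of `tidytcells`.

```python
def to_species_string(binomial: str) -> str:
    """Turn a binomial name like 'Homo sapiens' into 'HomoSapiens'.

    Hypothetical helper. Expects exactly two whitespace-separated parts
    (genus and species) and raises ValueError otherwise.
    """
    genus, species = binomial.split()
    # capitalize() upper-cases the first letter and lower-cases the rest.
    return genus.capitalize() + species.capitalize()


print(to_species_string("Homo sapiens"))  # HomoSapiens
```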
Example usage
```python
import tidytcells

# --- MHC parsing ---
tidytcells.mhc.standardise('HLA-A', 'HomoSapiens')
# > ('HLA-A', None)
tidytcells.mhc.standardise('B07', 'HomoSapiens')
# > ('HLA-B*07', None)
tidytcells.mhc.standardise('DRA*01:01:01', 'HomoSapiens')
# > ('HLA-DRA*01:01', ':01')
tidytcells.mhc.get_chain('HLA-A')
# > 'alpha'
tidytcells.mhc.classify('HLA-DRB1*01:01')
# > 2

# --- TCR parsing ---
tidytcells.tcr.standardise('TCRAV32S1', 'HomoSapiens')
# > 'TRAV25'
```
Contributions
Please feel free to contribute by submitting bug reports and pull requests.