ARD reduction for HLA with Python
py-ard
Swiss army knife of HLA Nomenclature
Note: With py-ard>=2.0.0, the dependency on the Pandas library has been removed.
py-ard performs ARD reduction for HLA in Python.
Human leukocyte antigen (HLA) genes encode cell surface proteins that are important for immune regulation. Exons
encoding the Antigen Recognition Domain (ARD) are the most polymorphic region of HLA genes and are important for
donor/recipient HLA matching.
The history of allele typing methods has played a major role in determining the resolution and ambiguity of reported HLA
values. Although HLA nomenclature has not always conformed to the same standard, it is now defined by the WHO
Nomenclature Committee for Factors of the HLA System. py-ard is aware of the variation in historical resolutions and
groupings, and it can translate from one representation to another based on the alleles published quarterly by
IPD-IMGT/HLA.
Installation
py-ard supports Python 3.8 through 3.13. Python 3.9 or higher is recommended.
Install from PyPI
pip install py-ard
Install With Homebrew
On macOS, py-ard can be installed using the Homebrew package manager.
This is handy for using the command line versions of the tool without having to create virtual environments.
The first time, you need to tap the nmdp-bioinformatics tap.
brew tap nmdp-bioinformatics/tap
Install py-ard
brew install py-ard
Homebrew will notify you as new versions of py-ard are released.
Install from source
Check out the py-ard source code.
git clone https://github.com/nmdp-bioinformatics/py-ard.git
cd py-ard
Create and activate a virtual environment, then install the py-ard dependencies.
make venv
source venv/bin/activate
make install
See our Contribution Guide for how to contribute to py-ard.
Using py-ard
Using py-ard from Python code
py-ard can be used in a program to reduce/expand HLA GL String representations. If py-ard encounters an invalid allele,
it raises an Invalid exception rather than silently returning an empty result.
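Because invalid input raises an exception instead of returning an empty result, a batch job often needs to record failures and keep going. The sketch below assumes only that `ard.redux(typing, redux_type)` raises on invalid input. It catches `Exception` broadly because this README does not list the exact exception classes, so in real code you should narrow the `except` clause to py-ard's own exception types. `reduce_all` is a hypothetical helper and is not part of py-ard.

```python
from typing import Callable, Dict, Iterable, Tuple


# Hypothetical helper (not part of py-ard): reduce many typings, keeping failures.
def reduce_all(
    redux: Callable[[str, str], str],
    typings: Iterable[str],
    redux_type: str,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    reduced: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for hla_typing in typings:
        try:
            reduced[hla_typing] = redux(hla_typing, redux_type)
        except Exception as exc:  # broad on purpose: py-ard's exception classes are not assumed here
            failed[hla_typing] = f"{type(exc).__name__}: {exc}"
    return reduced, failed
```

After you initialize `ard` as shown below, you could call it as `reduced, failed = reduce_all(ard.redux, typings, "lgx")`.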
Initialize py-ard
Import and initialize pyard package.
By default, py-ard uses the latest version of the IPD-IMGT/HLA database.
import pyard
ard = pyard.init()
Initialize py-ard with a particular version of the IPD-IMGT/HLA database.
import pyard
ard = pyard.init('3510')
When processing a large number of typings, it's helpful to have a cache of previously calculated reductions so that
repeated typings reduce faster. The cache size of pre-computed reductions can be changed from the default of 1,000 by
setting the cache_size argument. A larger cache increases the memory footprint but can significantly reduce processing
time for large numbers of reductions.
import pyard
max_cache_size = 1_000_000
ard = pyard.init('3510', cache_size=max_cache_size)
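The `cache_size` argument trades memory for speed. The sketch below illustrates that trade-off generically by wrapping an arbitrary reduction callable in `functools.lru_cache`. It is not py-ard's internal implementation, and because py-ard already caches reductions, you would not normally need it. `with_reduction_cache` is a hypothetical name.

```python
import functools
from typing import Callable


# Hypothetical illustration of an LRU cache of reductions (not py-ard's internals).
def with_reduction_cache(
    redux: Callable[[str, str], str], maxsize: int = 1_000
) -> Callable[[str, str], str]:
    @functools.lru_cache(maxsize=maxsize)
    def cached(hla_typing: str, redux_type: str) -> str:
        # Only runs on a cache miss; repeated (typing, type) pairs are served from memory.
        return redux(hla_typing, redux_type)

    return cached
```

Once `maxsize` distinct entries are stored, the least recently used entry is evicted. Memory therefore grows with the cache size, while the hit rate for repetitive input improves.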
By default, the IPD-IMGT/HLA data is stored locally in $TMPDIR/pyard-$USER/. This temporary location may be removed when your computer restarts.
Alternatively, you can specify a different, more permanent directory for the cached data.
import pyard
ard = pyard.init('3510', data_dir='~/.py-ard/')
# Creating ~/.py-ard/pyard-3510.sqlite3 as cache.
# Version: 3510
Because MAC data changes frequently, you can refresh the MAC codes for the current IPD-IMGT/HLA database version.
ard.refresh_mac_codes()
You can check the current version of the IPD-IMGT/HLA database.
ard.get_db_version()
If you don't need MAC codes, you can skip loading them by passing load_mac=False during initialization. This improves initialization time.
import pyard
ard = pyard.init('3510', load_mac=False)
Configure Reduction Behavior
Customize reduction behavior by passing a config dictionary to pyard.init().
import pyard
config = {
    'reduce_serology': True,  # Reduce serology typings (default: True)
    'reduce_v2': True,  # Reduce V2 alleles (default: True)
    'reduce_3field': True,  # Reduce 3-field alleles (default: True)
    'reduce_P': True,  # Reduce P group alleles (default: True)
    'reduce_XX': True,  # Reduce XX codes (default: True)
    'reduce_MAC': True,  # Reduce MAC codes (default: True)
    'reduce_shortnull': True,  # Reduce short nulls (default: True)
    'ping': True,  # Use ping mode (default: True)
    'verbose_log': False,  # Enable verbose logging (default: False)
    'ARS_as_lg': False,  # Treat ARS as lg (default: False)
    'strict': True,  # Strict validation mode (default: True)
    'ignore_allele_with_suffixes': ()  # Tuple of suffixes to ignore (default: ())
}
ard = pyard.init('3510', config=config)
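This README does not say whether py-ard rejects misspelled config keys, so it is an assumption that a typo could go unnoticed. A small helper that starts from the defaults listed above and validates overrides can catch such typos early. `DEFAULT_CONFIG` and `make_config` are hypothetical names; the default values are copied from the options above.

```python
from typing import Any, Dict

# Defaults copied from the options listed above; make_config is a hypothetical helper.
DEFAULT_CONFIG: Dict[str, Any] = {
    "reduce_serology": True,
    "reduce_v2": True,
    "reduce_3field": True,
    "reduce_P": True,
    "reduce_XX": True,
    "reduce_MAC": True,
    "reduce_shortnull": True,
    "ping": True,
    "verbose_log": False,
    "ARS_as_lg": False,
    "strict": True,
    "ignore_allele_with_suffixes": (),
}


def make_config(**overrides: Any) -> Dict[str, Any]:
    # Reject misspelled option names instead of passing them through silently.
    unknown = sorted(set(overrides) - set(DEFAULT_CONFIG))
    if unknown:
        raise KeyError(f"unknown py-ard config keys: {unknown}")
    return {**DEFAULT_CONFIG, **overrides}
```

Usage: `ard = pyard.init('3510', config=make_config(strict=False, verbose_log=True))`.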
Reduce Typings
Note: The redux method in ARD object handles both GL Strings and individual alleles.
Reduce a single-locus HLA typing by passing the allele/MAC/XX code and the reduction method to redux.
allele = "A*01:01:01"
ard.redux(allele, 'G')
# >>> 'A*01:01:01G'
ard.redux(allele, 'lg')
# >>> 'A*01:01g'
ard.redux(allele, 'lgx')
# >>> 'A*01:01'
Reduce an ambiguous GL String
# Reduce a GL String
ard.redux("A*01:01/A*01:01N+A*02:AB^B*07:02+B*07:AB", "G")
# 'B*07:02:01G+B*07:02:01G^A*01:01:01G+A*02:01:01G/A*02:02'
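A GL String nests its delimiters from loosest to tightest binding: `^` separates loci, `|` separates genotype alternatives, `+` separates the two gene copies of a genotype, `~` joins the alleles of a haplotype, and `/` lists ambiguous alleles. The simplified sketch below walks that hierarchy and applies a per-allele function while keeping the structure intact. It is an illustration, not py-ard's implementation. `ard.redux` performs real ARD reduction and also collapses duplicates and orders the output, as the result above shows. `first_two_fields` is a hypothetical stand-in that only truncates.

```python
from typing import Callable

# GL String delimiters from loosest to tightest binding:
# "^" loci, "|" genotype alternatives, "+" gene copies, "~" haplotype, "/" allele ambiguity.
GL_DELIMITERS = ("^", "|", "+", "~", "/")


def map_gl_string(gl: str, fn: Callable[[str], str], level: int = 0) -> str:
    """Apply fn to every allele in a GL String, preserving its delimiter structure."""
    if level == len(GL_DELIMITERS):
        return fn(gl)  # no delimiters left: gl is a single allele name
    delimiter = GL_DELIMITERS[level]
    return delimiter.join(
        map_gl_string(part, fn, level + 1) for part in gl.split(delimiter)
    )


def first_two_fields(allele: str) -> str:
    # Stand-in for a real reduction: plain truncation, NOT ARD reduction.
    return ":".join(allele.split(":")[:2])


print(map_gl_string("A*01:01:01/A*01:02+A*02:01:01^B*07:02:01", first_two_fields))
# A*01:01/A*01:02+A*02:01^B*07:02
```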
You can also reduce serology-based typings.
ard.redux('B14', 'lg')
# >>> 'B*14:01g/B*14:02g/B*14:03g/B*14:04g/B*14:05g/B*14:06g/B*14:08g/B*14:09g/B*14:10g/B*14:11g/B*14:12g/B*14:13g/B*14:14g/B*14:15g/B*14:16g/B*14:17g/B*14:18g/B*14:19g/B*14:20g/B*14:21g/B*14:22g/B*14:23g/B*14:24g/B*14:25g/B*14:26g/B*14:27g/B*14:28g/B*14:29g/B*14:30g/B*14:31g/B*14:32g/B*14:33g/B*14:34g/B*14:35g/B*14:36g/B*14:37g/B*14:38g/B*14:39g/B*14:40g/B*14:42g/B*14:43g/B*14:44g/B*14:45g/B*14:46g/B*14:47g/B*14:48g/B*14:49g/B*14:50g/B*14:51g/B*14:52g/B*14:53g/B*14:54g/B*14:55g/B*14:56g/B*14:57g/B*14:58g/B*14:59g/B*14:60g/B*14:62g/B*14:63g/B*14:65g/B*14:66g/B*14:68g/B*14:70Qg/B*14:71g/B*14:73g/B*14:74g/B*14:75g/B*14:77g/B*14:82g/B*14:83g/B*14:86g/B*14:87g/B*14:88g/B*14:90g/B*14:93g/B*14:94g/B*14:95g/B*14:96g/B*14:97g/B*14:99g/B*14:102g'
Valid Reduction Types
| Reduction Type | Description |
|---|---|
| G | Reduce to G Group Level |
| P | Reduce to P Group Level |
| lg | Reduce to 2 field ARD level (append g) |
| lgx | Reduce to 2 field ARD level |
| W | Reduce/Expand to full field (4, 3, 2) WHO nomenclature level |
| exon | Reduce/Expand to 3 field level |
| U2 | Reduce to 2 field unambiguous level |
| S | Reduce to Serological level |
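The reduction levels above are defined in terms of fields, which are the colon-separated numbers after the locus. To make that concrete, here is a hypothetical parser for V3 molecular allele names. It is an illustration, not py-ard's parser, and it handles only names of the form `LOCUS*NN[:NN...]` with an optional single uppercase suffix such as `N` or `G`. MAC codes, XX codes, and serology are rejected.

```python
import re
from typing import NamedTuple, Tuple


class AlleleName(NamedTuple):
    locus: str               # e.g. "A", "DRB1", or "HLA-A"
    fields: Tuple[str, ...]  # colon-separated numeric fields
    suffix: str              # e.g. "N" (null) or "G" (G group); "" if none


# Hypothetical pattern: LOCUS*NN[:NN...] plus an optional uppercase suffix.
_ALLELE_RE = re.compile(
    r"^(?P<locus>[A-Za-z0-9-]+)\*(?P<fields>\d+(?::\d+)*)(?P<suffix>[A-Z]?)$"
)


def parse_allele(name: str) -> AlleleName:
    match = _ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not a V3 molecular allele name: {name!r}")
    return AlleleName(match["locus"], tuple(match["fields"].split(":")), match["suffix"])
```

For instance, `parse_allele("A*01:01:01:01").fields` has four fields (the full WHO level), while the `lgx` result `A*01:01` has two.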
Perform DRB1 blending with DRB3, DRB4 and DRB5
import pyard
pyard.dr_blender(drb1='HLA-DRB1*03:01+DRB1*04:01', drb3='DRB3*01:01', drb4='DRB4*01:03')
# >>> 'DRB3*01:01+DRB4*01:03'
MAC Codes
In addition to reducing to various types, py-ard can expand MAC codes and look up the MAC representation of an
allele list. See the MAC Service UI for details.
Expand MAC
Use the expand_mac method on ard to expand a MAC code.
ard.expand_mac('HLA-A*01:BC')
# 'HLA-A*01:02/HLA-A*01:03'
Lookup MAC
Find the corresponding MAC code for an allele list GL String.
ard.lookup_mac('A*01:02/A*01:01/A*01:03')
# A*01:MN
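Conceptually, a MAC code stands for a list of allele-field values under a shared locus and first field. The sketch below uses a tiny hypothetical table whose entries mirror the examples in this README. Real MAC data is much larger, and some codes encode full multi-field values, which this sketch ignores. The `expand_mac` and `lookup_mac` functions here are standalone illustrations, not py-ard's methods.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical MAC table: code -> second-field values (mirrors examples in this README).
MAC_TABLE: Dict[str, Tuple[str, ...]] = {
    "AB": ("01", "02"),
    "BC": ("02", "03"),
    "MN": ("01", "02", "03"),
}
# Reverse index for lookup: set of values -> code.
_CODE_FOR: Dict[FrozenSet[str], str] = {frozenset(v): k for k, v in MAC_TABLE.items()}


def expand_mac(mac_typing: str) -> str:
    prefix, code = mac_typing.rsplit(":", 1)  # "HLA-A*01:BC" -> ("HLA-A*01", "BC")
    return "/".join(f"{prefix}:{value}" for value in MAC_TABLE[code])


def lookup_mac(allele_list: str) -> str:
    pairs = [allele.rsplit(":", 1) for allele in allele_list.split("/")]
    prefixes = {prefix for prefix, _ in pairs}
    if len(prefixes) != 1:
        raise ValueError("all alleles must share the same locus and first field")
    values = frozenset(value for _, value in pairs)  # order-insensitive match
    return f"{prefixes.pop()}:{_CODE_FOR[values]}"
```

Using a `frozenset` as the reverse key makes lookup independent of allele order, which matches the example above, where `A*01:02/A*01:01/A*01:03` maps to a single code.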
CWD (Version 2) Reduction
Reduce a MAC code or an allele list GL String to CWD reduced list.
ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27")
# => B*15:01/B*15:07
These two methods can be chained to get back the MAC code for the CWD-reduced version of an allele list.
ard.lookup_mac(ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27"))
# 'B*15:AH'
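The CWD reduction above can be pictured in three steps: truncate each allele to two fields, keep only names in the common and well-documented list, and drop duplicates. This is an assumption-based sketch with a hypothetical two-entry CWD set chosen to match the example. py-ard uses its own CWD data (see the `cwd2` table in the `pyard-status` output) and may apply additional rules. `cwd_filter` is a hypothetical name.

```python
from typing import List, Set

HYPOTHETICAL_CWD = {"B*15:01", "B*15:07"}


def cwd_filter(allele_list: str, cwd_alleles: Set[str]) -> str:
    kept: List[str] = []
    for allele in allele_list.split("/"):
        two_field = ":".join(allele.split(":")[:2])  # "B*15:01:03" -> "B*15:01"
        # Keep each CWD two-field name once, in input order.
        if two_field in cwd_alleles and two_field not in kept:
            kept.append(two_field)
    return "/".join(kept)
```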
Additional Methods
Validate a GL String:
ard.validate('A*01:01+A*02:01^B*07:02+B*08:01')
# Returns True if valid; raises an exception if invalid
Expand XX codes:
ard.expand_xx('A*01:XX')
# Returns all alleles matching the XX code
Find similar alleles:
ard.similar_alleles('A*01:AB')
# Returns list of similar allele names
Check allele types:
ard.is_mac('A*01:AB') # Check if MAC code
ard.is_serology('A1') # Check if serology
ard.is_v2('A*0101') # Check if V2 allele
ard.is_XX('A*01:XX') # Check if XX code
ard.is_shortnull('A*01:01N') # Check if short null
ard.is_null('A*01:01N') # Check if null allele
Find serology relationships:
ard.find_broad_splits('A10') # Find broad/split relationships
ard.find_associated_antigen('Bw4') # Find associated antigens
Convert V2 to V3:
ard.v2_to_v3('A*0101') # Convert V2 allele to V3 format
Using py-ard from R code
py-ard works well from R, too. See the Using py-ard from R language
page for a detailed walkthrough.
Command Line Tools
Several command line interface (CLI) tools are available for managing the local IPD-IMGT/HLA cache database, running impromptu reduction queries, and batch processing CSV files.
For all tools, use --ipd-version (-i) and --data-dir (-d) to specify the IPD-IMGT/HLA database version and the directory
where the SQLite files are created.
pyard-import Import the latest IPD-IMGT/HLA database
pyard-import imports and reinstalls prepared IPD-IMGT/HLA and MAC data.
Use pyard-import -h to see all the options available.
$ pyard-import -h
usage: pyard-import [-h] [--list] [-i IPD_VERSION] [-d DATA_DIR]
[--v2-to-v3-mapping V2_V3_MAPPING] [--refresh-mac]
[--re-install] [--skip-mac]
py-ard tool to generate reference SQLite database. Allows updating db with
custom V2 to V3 mappings. Displays the list of available IPD/IMGT-HLA database
versions.
options:
-h, --help show this help message and exit
--list Show Versions of available IPD/IMGT-HLA Databases
-i, --ipd-version IPD_VERSION
Import supplied IPD/IMGT-HLA DB Version
-d, --data-dir DATA_DIR
Data directory to store imported data
--v2-to-v3-mapping V2_V3_MAPPING
V2 to V3 mapping CSV file
--refresh-mac Only refresh MAC data
--re-install reinstall a fresh version of database
--skip-mac Skip creating MAC mapping
Run pyard-import without any option to download and prepare the latest version of IPD-IMGT/HLA and MAC data.
$ pyard-import
Created Latest py-ard database
Import a particular version of the IPD-IMGT/HLA database
$ pyard-import --ipd-version 3.29.0
Created py-ard version 3290 database
Import a particular version of the IPD-IMGT/HLA database and replace the V2 to V3 mapping table with one from a CSV file.
$ pyard-import --ipd-version 3.29.0 --v2-to-v3-mapping map2to3.csv
Created py-ard version 3290 database
Updated v2_mapping table with 'map2to3.csv' mapping file.
Reinstall a particular IPD-IMGT/HLA database
$ pyard-import --ipd-version 3340 --re-install
Update the latest IPD-IMGT/HLA database with V2 mappings from a CSV file
$ pyard-import --v2-to-v3-mapping map2to3.csv
Refresh the MAC data for the specified version
$ pyard-import --ipd-version 3450 --refresh-mac
Skip MAC loading
If you don't need MAC data, skip loading it with --skip-mac.
$ pyard-import --ipd-version 3150 --skip-mac
pyard-status Show database status
Show the statuses of all py-ard databases
pyard-status goes through all available databases and checks that every expected table is present. It shows each
database, the number of rows in each table, any missing tables, and the stored IPD-IMGT/HLA version.
$ pyard-status
Use --data-dir to specify an alternate directory for cached database files.
$ pyard-status --data-dir ~/.pyard/
=============================================
IPD/IMGT-HLA DB Version: Latest (3530)
There is a newer IPD/IMGT-HLA release than version 3530
Upgrade to latest version '3630' with 'pyard-import --re-install'
File: /Users/pbashyal-nmdp/.pyard/pyard-Latest.sqlite3
Size: 577.42MB
---------------------------------------------
|Table Name | Rows|
|-------------------------------------------|
|alleles | 39,977|
|cwd2 | 336|
|dup_g | 70|
|exon_group | 13,406|
|exp_alleles | 91|
|g_group | 14,736|
|lgx_group | 14,736|
|mac_codes | 1,138,229|
|p_group | 21,534|
|p_not_g | 1,709|
|serology_broad_split_mapping | 23|
|serology_mapping | 131|
|shortnulls | 176|
|v2_mapping | 11|
|who_alleles | 37,619|
|who_group | 36,576|
|xx_codes | 2,019|
---------------------------------------------
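If you monitor several cache databases, you may want the row counts in machine-readable form. The sketch below parses table rows in the format shown above, such as `|alleles | 39,977|`. Parsing human-readable output is fragile, so treat this as an assumption tied to this output format. `parse_status_rows` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches table rows such as "|alleles | 39,977|" in the pyard-status output.
_ROW_RE = re.compile(r"^\|(?P<table>[A-Za-z0-9_]+)\s*\|\s*(?P<rows>[\d,]+)\s*\|$")


def parse_status_rows(output: str) -> Dict[str, int]:
    rows: Dict[str, int] = {}
    for line in output.splitlines():
        match = _ROW_RE.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            rows[match["table"]] = int(match["rows"].replace(",", ""))
    return rows
```

For example, you could capture the output with `subprocess.run(["pyard-status"], capture_output=True, text=True).stdout` and compare the parsed keys against the tables you expect.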
pyard Redux quickly
The pyard command can be used for quick reductions from the command line. Use the --help option to see all the available
options.
$ pyard --help
usage: pyard [-h] [-v] [-d DATA_DIR] [-i IPD_VERSION] [-g GL_STRING]
[-r {G,P,lg,lgx,W,exon,U2,S}] [--splits SPLITS] [--validate]
[--cwd CWD] [--expand-mac EXPAND_MAC] [--lookup-mac LOOKUP_MAC]
[--expand-xx EXPAND_XX] [--expand EXPAND]
[--similar SIMILAR_ALLELE] [--non-strict] [--verbose]
py-ard tool to redux GL String
options:
-h, --help show this help message and exit
-v, --version IPD-IMGT/HLA DB Version number
-d, --data-dir DATA_DIR
Data directory to store imported data
-i, --ipd-version IPD_VERSION
IPD-IMGT/HLA db to use for redux
-g, --gl GL_STRING GL String to reduce
-r, --redux-type {G,P,lg,lgx,W,exon,U2,S}
Reduction Method
--splits SPLITS Find Broad and Splits
--validate Validate the provided GL String
--cwd CWD Perform CWD redux
--expand-mac EXPAND_MAC
Expand MAC to Allele List
--lookup-mac LOOKUP_MAC
Lookup MAC for an Allele List
--expand-xx EXPAND_XX
Expand XX code to Allele List
--expand EXPAND Expand MAC or XX code to Allele List
--similar SIMILAR_ALLELE
Find Similar Alleles with given prefix
--non-strict Use non-strict mode
--verbose Use verbose mode
Reduce from the command line by passing any typing with the -g (--gl) option and the reduction method with the
-r (--redux-type) option.
$ pyard -g 'A*01:AB' -r lgx
A*01:01/A*01:02
$ pyard --gl 'DRB1*08:XX' -r G
DRB1*08:01:01G/DRB1*08:02:01G/DRB1*08:03:02G/DRB1*08:04:01G/DRB1*08:05/ ...
$ pyard -i 3290 --gl 'A1' -r lgx # For a particular version of DB
A*01:01/A*01:02/A*01:03/A*01:06/A*01:07/A*01:08/A*01:09/A*01:10/A*01:12/ ...
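The `-g`/`-r` invocations above can also be driven from a Python script through `subprocess`. This sketch uses only the options shown in the `pyard --help` output (`-i`, `-d`, `-g`, `-r`). `build_pyard_argv` and `run_pyard` are hypothetical helpers, and `run_pyard` requires the `pyard` CLI to be on your `PATH`.

```python
import subprocess
from typing import List, Optional

# Reduction methods accepted by `pyard -r`, per the --help output above.
REDUX_TYPES = {"G", "P", "lg", "lgx", "W", "exon", "U2", "S"}


def build_pyard_argv(
    gl: str,
    redux_type: str,
    ipd_version: Optional[str] = None,
    data_dir: Optional[str] = None,
) -> List[str]:
    if redux_type not in REDUX_TYPES:
        raise ValueError(f"unsupported reduction type: {redux_type!r}")
    argv = ["pyard"]
    if ipd_version:
        argv += ["-i", ipd_version]
    if data_dir:
        argv += ["-d", data_dir]
    # An argument list (no shell) avoids quoting issues with * and ^ in GL Strings.
    argv += ["-g", gl, "-r", redux_type]
    return argv


def run_pyard(argv: List[str]) -> str:
    # Raises subprocess.CalledProcessError if pyard exits with a non-zero status.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Usage: `run_pyard(build_pyard_argv("A*01:AB", "lgx", ipd_version="3290"))`.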
If the -r option is left out, pyard will print out the result of all reduction methods.
$ pyard -g 'A*01:01:01:01'
Reduction Method: G
-------------------
A*01:01:01G
Reduction Method: P
-------------------
A*01:01P
Reduction Method: lg
--------------------
A*01:01g
Reduction Method: lgx
---------------------
A*01:01
Reduction Method: W
-------------------
A*01:01:01:01
Reduction Method: exon
----------------------
A*01:01:01
Reduction Method: U2
--------------------
A*01:01
py-ard knows the broad/split relationships of serology and DNA typings. Use the --splits option of the pyard command to find them.
$ pyard --splits "A*10"
A*10 = A*25/A*26/A*34/A*66
$ pyard --splits B14
B14 = B64/B65
Validate a GL String:
$ pyard -g 'A*01:01+A*02:01' --validate
Perform CWD reduction:
$ pyard --cwd 'B*15:01:01/B*15:01:03/B*15:04'
B*15:01
Expand MAC or XX codes:
$ pyard --expand-mac 'A*01:AB'
A*01:01/A*01:02
$ pyard --expand-xx 'A*01:XX'
A*01:01/A*01:02/A*01:03/...
Lookup MAC code:
$ pyard --lookup-mac 'A*01:01/A*01:02'
A*01:AB
Find similar alleles:
$ pyard --similar 'A*01:AB'
A*01:AB
A*01:AC
pyard-reduce-csv Batch Reduce a CSV file
pyard-reduce-csv can be used to batch process a CSV file of HLA typings. See the documentation for
detailed information about all the options.
Generate sample configuration and CSV files:
$ pyard-reduce-csv --generate-sample
Created reduce_conf.json
Created sample.csv
Created reduce_conf_glstring.json
Created sample_glstring.csv
Reduce a CSV file using a configuration:
$ pyard-reduce-csv -c reduce_conf.json
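For small jobs done directly in Python, you can apply `ard.redux` to one CSV column yourself. The sketch below is not how `pyard-reduce-csv` works, and it does not read its JSON configuration, whose format is not shown here. It is a hypothetical helper, `reduce_csv_column`, that copies every row and appends the reduced value as a new column.

```python
import csv
from typing import Callable, TextIO


def reduce_csv_column(
    src: TextIO,
    dst: TextIO,
    column: str,
    redux: Callable[[str, str], str],
    redux_type: str,
    out_column: str,
) -> None:
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        raise ValueError(f"column {column!r} not found in {fieldnames}")
    writer = csv.DictWriter(dst, fieldnames=[*fieldnames, out_column])
    writer.writeheader()
    for row in reader:
        # Keep the original typing and add the reduced value as a new column.
        row[out_column] = redux(row[column], redux_type)
        writer.writerow(row)
```

Open the files with `newline=""`, as the `csv` module recommends, and pass `ard.redux` as the `redux` argument, for example `reduce_csv_column(src, dst, "a1", ard.redux, "lgx", "a1_lgx")`.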
py-ard REST Web Service
Run py-ard as a service so that it can be accessed as a REST service endpoint.
To start in debug mode, run the app.py script. The endpoint should then be available
at localhost:8080.
$ python3 app.py
py-ard version: 2.0.0
IMGT version: 3631
`ConnexionMiddleware.run` is optimized for development. For production, run using a dedicated ASGI server.
INFO: Started server process [5344]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
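Once the service is running, any HTTP client can call it. The sketch below uses only the Python standard library. The endpoint path `/api/v1/redux` and the JSON field names `gl_string` and `reduction_method` are **assumptions** not documented in this README, so verify them against the service's API specification before relying on this. `build_redux_request` and `post_redux` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8080"
# ASSUMPTION: endpoint path and JSON field names; check the service's API spec.
REDUX_PATH = "/api/v1/redux"


def build_redux_request(
    gl_string: str, reduction_method: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    body = json.dumps(
        {"gl_string": gl_string, "reduction_method": reduction_method}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + REDUX_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_redux(
    gl_string: str, reduction_method: str, base_url: str = BASE_URL
) -> Dict[str, Any]:
    # Sends the request; requires the service to be running.
    request = build_redux_request(gl_string, reduction_method, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```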
Docker deployment of py-ard REST Web Service
For production deployments, build a Docker image and deploy that image to a server.
Build the docker image:
make docker-build
This builds a Docker image named nmdpbioinformatics/pyard-service:2.0.0.linux-amd64
Build the Docker image and run it with:
make docker
The endpoint should then be available at localhost:8080
File details
Details for the file py_ard-2.0.0rc0.tar.gz.
File metadata
- Download URL: py_ard-2.0.0rc0.tar.gz
- Upload date:
- Size: 83.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ebec1f4c9dc247628d8d3e78dc29e360063990aa664b8909cd7d76e450384e7 |
| MD5 | 1c1471db9e8b119e0d19593c23c005ca |
| BLAKE2b-256 | 5eb79fd8ed763416f773beee7aa3c6805bee268e0f692b9eda432574ce59b1cb |
File details
Details for the file py_ard-2.0.0rc0-py2.py3-none-any.whl.
File metadata
- Download URL: py_ard-2.0.0rc0-py2.py3-none-any.whl
- Upload date:
- Size: 105.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1993ae7a2eaac81d697dcae2263baa0a9097b77ea8d5fb1a7c616a1159fd9a71 |
| MD5 | cee85df6a4996ce912553d3c084958f2 |
| BLAKE2b-256 | 77312d58b53438443bff888487c651a3ae2ef6511afb9760651b5619b4c24ec5 |
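To check a downloaded file against the digests listed above, you can hash it locally with `hashlib`. This is a minimal sketch. `file_digest` and `matches` are hypothetical helper names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_digest(
    path: Union[str, Path], algorithm: str = "sha256", chunk_size: int = 1 << 16
) -> str:
    """Hex digest of a file, read in chunks so large files need little memory."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to 256 bits
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: Union[str, Path], expected_hex: str, algorithm: str = "sha256") -> bool:
    # Compare case-insensitively against a published hex digest.
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Usage: `matches("py_ard-2.0.0rc0.tar.gz", "3ebec1f4c9dc247628d8d3e78dc29e360063990aa664b8909cd7d76e450384e7")`.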