Download and output sequences from IMGT/HLA

hladl

(HLA downloader)

JH @ MGH, 2025

This is a simple CLI that makes grabbing specific HLA allele sequences easier. It aims to be similar to hladownload, but without the more advanced features that tool offers (although that script appears to be out of action due to Biopython version changes since its last update).

Effectively, this script will spit out a cDNA nucleotide or protein amino acid sequence, given an allele identifier and a digit resolution. Sequences are grabbed from the IMGTHLA GitHub repo and stored locally in a gzipped JSON file, so they can be output later without internet connectivity.
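To illustrate the local storage idea, here is a minimal sketch of writing and reading an allele-to-sequence mapping as gzipped JSON. The function names, the file name, the flat dictionary layout, and the toy sequences are all assumptions made for this example; the source does not describe how hladl actually lays out its file.

```python
import gzip
import json
import os
import tempfile


def save_sequences(path, sequences):
    """Write an allele -> sequence mapping as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(sequences, handle)


def load_sequences(path):
    """Read an allele -> sequence mapping back from gzip-compressed JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Toy placeholder data, not real HLA sequences.
example = {"A*02:01": "ATGGCC", "B*07:02": "ATGCTG"}

# Hypothetical file name inside a throwaway directory.
cache_path = os.path.join(tempfile.mkdtemp(), "example-nuc.json.gz")
save_sequences(cache_path, example)
restored = load_sequences(cache_path)
print(restored["A*02:01"])
```

Once such a file exists on disk, lookups need no network access, which is the property described above.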

Installation

hladl was made with poetry and typer. It can be installed from PyPI:

pip install hladl

Usage

Sequences can be downloaded to the installed data directory using hladl init. Users specify the sequence type (nucleotide, protein, or both) with the -s flag, and the HLA allele digit resolution (i.e. 2, 4, 6, or 8 digit, following the pattern HLA-X*22:44:66:88) with the -d flag, like so:

# Download nucleotide (cDNA) sequences for 4 digit alleles
hladl init -s nuc -d 4
 
# Download protein (AA) sequences for 2 digit alleles
hladl init -s prot -d 2
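The digit resolution maps onto the colon-separated fields of an allele name: each field counts as two digits. As a rough sketch of that relationship (the helper name truncate_allele is hypothetical and this is not hladl's own code), an allele name can be cut down to a requested resolution like this. Expression suffixes such as a trailing N are not handled here.

```python
def truncate_allele(allele, digits):
    """Trim an allele name to the given digit resolution (2, 4, 6, or 8).

    Each colon-separated field after the '*' counts as two digits, so a
    4 digit name keeps the first two fields.
    """
    if digits not in (2, 4, 6, 8):
        raise ValueError("digits must be 2, 4, 6, or 8")
    gene, sep, fields = allele.partition("*")
    if not sep:
        raise ValueError(f"no '*' separator in allele name: {allele!r}")
    kept = fields.split(":")[: digits // 2]
    return f"{gene}*{':'.join(kept)}"


print(truncate_allele("A*02:01:01:01", 4))
print(truncate_allele("DRA*01:01:01", 2))
```

A name that already has fewer fields than requested is returned unchanged, since there is nothing further to trim.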

Sequences can then be output to stdout using the seq command:

hladl seq -a DRA*01:01
hladl seq -a A*02 -s prot -d 2

Class I MHC protein sequences can also be automatically trimmed to remove leader and transmembrane/intracellular domains, yielding the extracellular domain, by specifying this in the mode option:

hladl seq -a A*02:01 -m ecd -s prot
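Conceptually, this trimming is a slice of the full-length protein: drop the leader at the N-terminus, then keep a fixed number of residues before the transmembrane region begins. The sketch below shows that idea only. The function name, both length parameters, and the toy input are assumptions for illustration; the source does not state which coordinates hladl uses.

```python
def trim_to_ecd(protein, leader_len, ecd_len):
    """Return the extracellular stretch of a full-length protein sequence.

    leader_len and ecd_len are caller-supplied placeholders, not values
    taken from hladl.
    """
    if len(protein) < leader_len + ecd_len:
        # Entries that are too short (e.g. truncated sequences) cannot be trimmed.
        raise ValueError("sequence is shorter than leader plus extracellular domain")
    return protein[leader_len : leader_len + ecd_len]


# Toy sequence: 3 'L' leader residues, 5 'E' extracellular, 4 'T' remainder.
toy = "LLL" + "EEEEE" + "TTTT"
print(trim_to_ecd(toy, leader_len=3, ecd_len=5))
```

The length check mirrors the kind of failure one would expect for unusually short entries, where a fixed-coordinate slice has nothing sensible to return.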

Users can instead produce a FASTA file of the designated allele using the -om / --output_mode flag; the file is saved to the current directory:

hladl seq -a B*07:02 -om fasta
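For readers unfamiliar with the format, a FASTA record is a header line starting with > followed by the sequence, usually wrapped to a fixed width. A minimal sketch of building one is shown here; the function name, the 60-character line width, and the header content are assumptions and may differ from what hladl writes.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    lines = [sequence[i : i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Toy 80-residue placeholder sequence, not a real HLA sequence.
record = to_fasta("B*07:02", "ACGT" * 20)
print(record, end="")
```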

The location of the data directory can be determined using the dd command:

hladl dd

# Will produce something like
/path/to/where/its/saving/stuff

Notes

  • If you run hladl seq without first running the appropriate hladl init, it will try to download the required sequences on the fly.

  • While the IMGTHLA repo does also store unspliced genomic DNA files, these are handled slightly differently, are much larger files, and frankly I don't need them in my pipelines right now, so they're not yet catered to.

  • Pseudogenes and other aberrant-length entries in the dataset cannot be used for ecd mode.
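The on-the-fly behaviour in the first note follows a common cache-or-fetch pattern: use the local file if it exists, otherwise download, save, and return. The sketch below illustrates that pattern under stated assumptions. The function name load_or_fetch, the file name, and the injected fetch callable are all hypothetical, and a stand-in function replaces any real download so the example runs offline.

```python
import gzip
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached sequences if present, otherwise fetch and cache them."""
    if os.path.exists(path):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return json.load(handle)
    sequences = fetch()  # in a real tool this step would hit the network
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(sequences, handle)
    return sequences


calls = []


def fake_fetch():
    """Stand-in for a download; records each call and returns toy data."""
    calls.append(1)
    return {"A*02:01": "ATGGCC"}


path = os.path.join(tempfile.mkdtemp(), "example-cache.json.gz")
first = load_or_fetch(path, fake_fetch)
second = load_or_fetch(path, fake_fetch)
print(len(calls))  # prints 1: the second call was served from the file
```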
