Python library for parsing UniProt XML data

uniprotlib

Note: This library was vibe coded with Claude. It works, it's tested, but review accordingly.

Python library for parsing UniProt XML files. Handles both single-entry downloads and multi-GB gzip-compressed database dumps with bounded memory usage.

Installation

pip install uniprotlib

Or with uv:

uv add uniprotlib

Usage

from uniprotlib import parse_xml

# single file
for entry in parse_xml("Q9Y261.xml"):
    print(entry.primary_accession, entry.protein_name)

# gzipped bulk download
for entry in parse_xml("uniprot_sprot.xml.gz"):
    print(entry.gene.primary, entry.organism.scientific_name)

# multiple files
for entry in parse_xml("human.xml.gz", "mouse.xml.gz"):
    print(entry.primary_accession)

parse_xml() returns an iterator that yields UniProtEntry objects. Gzip detection is automatic based on the .gz extension. Memory stays bounded regardless of file size.
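For readers curious how bounded-memory parsing of a large XML dump can work in general, here is a minimal standard-library sketch. It is not uniprotlib's actual implementation: `iter_accessions` and `local_name` are hypothetical names invented for this illustration, and the only assumption about the input is that each record is an `<entry>` element containing one or more `<accession>` children.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_accessions(*paths: str) -> Iterator[str]:
    """Yield the first <accession> of every <entry>, file by file.

    Hypothetical helper for illustration only, not part of uniprotlib.
    """
    for path in paths:
        # Pick the opener from the file extension, as the README describes.
        opener = gzip.open if Path(path).suffix == ".gz" else open
        with opener(path, "rb") as handle:
            context = ET.iterparse(handle, events=("start", "end"))
            _, root = next(context)  # first event is the start of the root element
            for event, elem in context:
                if event == "end" and local_name(elem.tag) == "entry":
                    for child in elem:
                        if local_name(child.tag) == "accession":
                            yield child.text or ""
                            break
                    # Detach finished entries from the root so the tree
                    # never holds more than one entry at a time.
                    root.clear()
```

The key step is `root.clear()` after each completed entry: without it, `iterparse` would still build the whole tree in memory while iterating.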

Parsed fields

Model         Fields
UniProtEntry  primary_accession, accessions, entry_name, dataset, protein_name, gene, organism, sequence, keywords, db_references
Gene          primary, synonyms, ordered_locus_names, orf_names
Organism      scientific_name, common_name, tax_id, lineage
Sequence      value, length, mass, checksum
DbReference   type, id, molecule, properties

All model classes are dataclasses with full type annotations and py.typed support.
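Because the models are dataclasses, standard-library helpers such as `dataclasses.asdict` should apply to them. The sketch below uses a stand-in `DbReference` class so that it runs without the library installed: the field names come from the table above, but the types, defaults, and the `ids_by_type` helper are assumptions made for illustration, and the sample values are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DbReference:
    """Stand-in model: field names from the README table, types assumed."""

    type: str
    id: str
    molecule: Optional[str] = None
    properties: dict[str, str] = field(default_factory=dict)


def ids_by_type(refs: list[DbReference]) -> dict[str, list[str]]:
    """Group cross-reference ids by database type (hypothetical helper)."""
    grouped: dict[str, list[str]] = {}
    for ref in refs:
        grouped.setdefault(ref.type, []).append(ref.id)
    return grouped


# Invented sample data, for demonstration only.
refs = [
    DbReference("PDB", "1ABC", properties={"method": "X-ray"}),
    DbReference("GO", "GO:0005515"),
    DbReference("PDB", "2XYZ"),
]

print(ids_by_type(refs))
# asdict() converts a dataclass to plain dicts, which json can serialize.
print(json.dumps(asdict(refs[0])))
```

With the real library, the same pattern would presumably be applied to `entry.db_references`, though the actual field types may differ from the ones assumed here.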

Development

Requires Python >= 3.12 and uv.

uv sync
uv run pytest tests/ -v

License

MIT
