Sequence-based programmatic access to the AlphaFold Protein Structure Database

Project description

afdb-query

Sequence-based programmatic access to the AlphaFold Protein Structure Database (AFDB). Query a protein by its amino-acid sequence, then retrieve its per-residue pLDDT scores (or just the first n values) without hand-rolling URL construction and JSON fetching.

Install

pip install afdb-query

Quickstart

from afdb_query import AlphaFold

# `sequence`: your protein as a one-letter amino-acid string (at least 20 residues)
with AlphaFold() as af:
    hits = af.search(sequence)        # Tier 1: list[Structure], in AFDB's returned order
    s = hits[0]

    s.global_plddt        # mean pLDDT for the model (cheap, from the summary)
    s.sequence_identity   # 1.0 == exact match, < 1.0 == near hit
    s.uniprot_accession   # e.g. "P12345", or None

    p = s.plddt()         # Tier 2: per-residue pLDDT (fetched once, then cached)
    p.scores              # full per-residue list[float]
    p.first(50)           # first 50 values — or all of them if the model is shorter

search raises InvalidSequenceError for sequences that cannot be queried (an internal stop symbol *, fewer than 20 residues, or non-standard amino acids) and returns [] when AFDB has no entry for a valid sequence.
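The rejection rules above can be checked locally before a large batch, so obviously invalid records never reach the network. This is a minimal sketch under two assumptions: "standard" means the 20 canonical one-letter codes, and a single trailing * is tolerated (only internal stops count). precheck is a hypothetical helper, not part of afdb-query, and the package's own validation remains authoritative.

```python
from typing import Optional

# Assumption: "standard" means the 20 canonical one-letter amino-acid codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 20  # documented minimum length


def precheck(sequence: str) -> Optional[str]:
    """Return why `sequence` would likely be rejected, or None if it looks queryable.

    Hypothetical local mirror of the documented rules; afdb-query's own
    validation may differ in details.
    """
    # Drop whitespace/newlines (e.g. from FASTA) and normalize case.
    seq = "".join(sequence.split()).upper()
    # Assumption: one trailing stop is tolerated; only internal stops are errors.
    if seq.endswith("*"):
        seq = seq[:-1]
    if "*" in seq:
        return "internal stop (*)"
    if len(seq) < MIN_LENGTH:
        return f"shorter than {MIN_LENGTH} residues"
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        return "non-standard residues: " + "".join(bad)
    return None
```

Records for which precheck returns a reason can be set aside (and logged) before calling search or search_many.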

Results come back in AFDB's returned order (ranked by sequence identity). Note that hits[0] is not guaranteed to be the canonical AF-<accession>-F1 model: for some sequences a multi-chain or AB-INITIO model ranks first. If you need a specific entry, select the hit by its model_identifier rather than taking hits[0].
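One way to select the canonical entry is sketched below. It assumes each Structure exposes model_identifier and uniprot_accession as strings (or None), and that a canonical identifier is AF-<accession>-F1, possibly followed by a further "-..." suffix. pick_canonical is a hypothetical helper, not part of afdb-query.

```python
from typing import Any, Optional, Sequence


def pick_canonical(hits: Sequence[Any], accession: Optional[str] = None) -> Optional[Any]:
    """Return the first hit whose model_identifier is the canonical AF-<accession>-F1 model.

    Hypothetical helper. If `accession` is None, each hit's own
    uniprot_accession is used. Returns None when no hit matches.
    """
    for hit in hits:
        acc = accession or hit.uniprot_accession
        if not acc:
            continue  # e.g. a multi-chain or AB-INITIO model with no accession
        prefix = f"AF-{acc}-F1"
        ident = hit.model_identifier or ""
        # Accept an exact match or a "-..." suffix, but not e.g. "-F10".
        if ident == prefix or ident.startswith(prefix + "-"):
            return hit
    return None
```

Falling back to hits[0] when pick_canonical returns None (for example `s = pick_canonical(hits) or hits[0]` on a non-empty list) keeps AFDB's default ranking as the second choice.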

Batch lookups

search_many searches many sequences concurrently, with resumable on-disk caching:

from afdb_query import AlphaFold

# seq1, seq2: your amino-acid strings
with AlphaFold() as af:
    report = af.search_many(
        [{"id": "rec1", "sequence": seq1}, {"id": "rec2", "sequence": seq2}],
        out_dir="afdb_cache",
        concurrency=6,
        plddt_first_n=50,   # optional: also save the first 50 per-residue pLDDT per hit
    )
# report -> {"total":..., "hits":..., "misses":..., "errors":..., "skipped":..., ...}
  • You supply an arbitrary id for each sequence; it names the cache files and maps results back to your own records.

  • out_dir/summaries/{id}.json stores the result for each record (a miss, i.e. a 404 from AFDB, is stored as {"structures": []}). Existing files are left untouched, so re-runs resume where they stopped.

  • With plddt_first_n set, out_dir/plddt/{id}.json stores the raw first-n per-residue pLDDT array for the best structure.

  • Other HTTP errors are counted but not saved, so those records are retried on the next run.

    Note: resumability keys on the summary file. If you run once without plddt_first_n and again with it, already-cached records are skipped and their pLDDT is not back-filled.
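Because the cache is plain JSON on disk, it can be inspected directly, for instance to find hits that were cached without pLDDT (the back-fill case in the note above). This is a minimal sketch that assumes every summary file is a JSON object with a "structures" list (documented for misses, assumed here for hits) and that pLDDT files share the record id as their name. cache_status is a hypothetical helper, not part of afdb-query.

```python
import json
from pathlib import Path


def cache_status(out_dir: str) -> dict:
    """Classify records cached by search_many (layout assumed from the docs above).

    Hypothetical helper: returns sorted id lists for hits, misses, and hits
    that have a summary file but no pLDDT file.
    """
    root = Path(out_dir)
    status = {"hits": [], "misses": [], "hits_without_plddt": []}
    for path in sorted((root / "summaries").glob("*.json")):
        rec_id = path.stem
        data = json.loads(path.read_text())
        # Assumption: hit files, like miss files, carry a "structures" list.
        if data.get("structures"):
            status["hits"].append(rec_id)
            if not (root / "plddt" / f"{rec_id}.json").exists():
                status["hits_without_plddt"].append(rec_id)
        else:
            status["misses"].append(rec_id)
    return status
```

Since resumability keys on the summary file, deleting the summaries listed under hits_without_plddt and re-running with plddt_first_n set should make search_many fetch those records again.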

Not (yet) supported

  • UniProt-accession lookup (sequence-only for now)
  • PAE (Predicted Aligned Error)
  • Statistics helpers: the package returns raw values, and downstream calculations are up to you.
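As an example of such downstream math, the sketch below computes the mean pLDDT plus the fraction of residues in each of AFDB's published confidence bands (very high above 90, confident 70 to 90, low 50 to 70, very low below 50). summarize_plddt is hypothetical, and assigning a boundary value such as exactly 90 to the higher band is a choice made here.

```python
from statistics import mean
from typing import Dict, Sequence

# AFDB confidence bands as (name, inclusive lower bound), checked top-down.
# Putting boundary values in the higher band is a choice made in this sketch.
BANDS = (
    ("very_high", 90.0),
    ("confident", 70.0),
    ("low", 50.0),
    ("very_low", float("-inf")),
)


def summarize_plddt(scores: Sequence[float]) -> Dict[str, float]:
    """Mean pLDDT plus the fraction of residues in each AFDB confidence band."""
    if not scores:
        raise ValueError("no pLDDT scores")
    counts = {name: 0 for name, _ in BANDS}
    for s in scores:
        for name, lower in BANDS:
            if s >= lower:
                counts[name] += 1
                break
    out = {"mean": mean(scores)}
    out.update({f"frac_{name}": n / len(scores) for name, n in counts.items()})
    return out
```

It accepts any list of floats, such as p.scores or p.first(50) from the Quickstart (assuming first returns a plain list).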

Download files

Download the file for your platform.

Source Distribution

afdb_query-0.1.0.tar.gz (22.0 kB)

Uploaded Source

Built Distribution

afdb_query-0.1.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file afdb_query-0.1.0.tar.gz.

File metadata

  • Download URL: afdb_query-0.1.0.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for afdb_query-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9a9af5d16bbd56bd7622cd43766664fa538a25e9493eaa4cc35fd1d46edef558
MD5 a1e054533837d4725fbf39d55955ce11
BLAKE2b-256 dd26ee63f70c98ca18397f31948645735c9a1063acd2fd18c763b74a6fb56f65
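To check a downloaded archive against the SHA256 digest above, hash it locally and compare. This is a minimal sketch using only the standard-library hashlib module; sha256_of and verify are hypothetical helper names.

```python
import hashlib

# SHA256 published above for afdb_query-0.1.0.tar.gz
EXPECTED_SHA256 = "9a9af5d16bbd56bd7622cd43766664fa538a25e9493eaa4cc35fd1d46edef558"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest matches `expected` (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check itself in hash-checking mode: pin afdb-query==0.1.0 with --hash=sha256:<digest> in a requirements file and install with --require-hashes.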

File details

Details for the file afdb_query-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: afdb_query-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for afdb_query-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 40e566e06d92d96913344b4bd456b2f198d428e9f4b6b78ba62457660f301b33
MD5 d9a051021ebb3e3fd3a81e5627025e23
BLAKE2b-256 4a0f7d7116d5a3ded058bdb33f3cdffe5119d7d365c07f447fe34834312cd8d1
