Sequence-based programmatic access to the AlphaFold Protein Structure Database
afdb-query
Sequence-based programmatic access to the AlphaFold Protein Structure Database (AFDB). Query a protein by its amino-acid sequence, then pull per-residue pLDDT — including "the first n values" — without hand-rolling URL derivation and JSON fetching.
Install
pip install afdb-query
Quickstart
from afdb_query import AlphaFold

with AlphaFold() as af:
    hits = af.search(sequence)  # Tier 1: list[Structure], in AFDB's returned order
    s = hits[0]
    s.global_plddt              # mean pLDDT for the model (cheap, from the summary)
    s.sequence_identity         # 1.0 == exact match, < 1.0 == near hit
    s.uniprot_accession         # e.g. "P12345", or None

    p = s.plddt()               # Tier 2: per-residue pLDDT (fetched once, then cached)
    p.scores                    # full per-residue list[float]
    p.first(50)                 # first 50 values, or all of them if the model is shorter
search raises InvalidSequenceError for sequences that cannot be queried
(internal stop *, shorter than 20 residues, or non-standard amino acids), and
returns [] when AFDB has no entry for a valid sequence.
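The three rejection rules listed above can also be checked locally before any request is sent. The following is a minimal sketch of those rules only. `precheck_sequence`, `STANDARD_AA`, and `MIN_LENGTH` are hypothetical names and not part of afdb-query, and the library's own validation may differ in detail (for example in how it treats lowercase input or a trailing `*`, which this sketch assumes is a harmless terminal stop).

```python
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MIN_LENGTH = 20  # from the README: shorter sequences cannot be queried


def precheck_sequence(sequence: str) -> list[str]:
    """Return the reasons a sequence would likely be rejected (empty if none)."""
    problems = []
    seq = sequence.strip().upper()
    # Assumption: one trailing '*' is a terminal stop and is ignored here;
    # a '*' anywhere else counts as an internal stop.
    body = seq[:-1] if seq.endswith("*") else seq
    if "*" in body:
        problems.append("internal stop")
    if len(body) < MIN_LENGTH:
        problems.append("shorter than 20 residues")
    # '*' is already reported above, so exclude it from the alphabet check.
    if set(body) - STANDARD_AA - {"*"}:
        problems.append("non-standard amino acids")
    return problems
```

An empty list means the sequence passes this local check; search may still raise InvalidSequenceError if its rules are stricter than the ones sketched here.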
Results come back in AFDB's returned order (ranked by sequence identity). Note that
hits[0] is not guaranteed to be the canonical AF-<accession>-F1 model — for
some sequences a multi-chain or AB-INITIO model ranks first — so pick the hit whose
model_identifier you want if you need a specific entry.
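Selecting the canonical entry from a result list could look like the sketch below. It runs on a stand-in `Hit` dataclass so that it needs neither the package nor network access; only the `model_identifier` and `uniprot_accession` attribute names and the AF-<accession>-F1 pattern come from the text above. `Hit`, `CANONICAL`, and `pick_canonical` are hypothetical names, and the regular expression is an assumption about what an accession can contain.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hit:
    """Stand-in for a search result; only the two fields used below."""
    model_identifier: str
    uniprot_accession: Optional[str] = None


# Assumption: accessions consist of uppercase letters and digits only.
CANONICAL = re.compile(r"AF-[A-Z0-9]+-F1")


def pick_canonical(hits: Sequence[Hit]) -> Optional[Hit]:
    """Return the first hit named AF-<accession>-F1, keeping AFDB's order."""
    for hit in hits:
        if CANONICAL.fullmatch(hit.model_identifier):
            return hit
    return None  # no canonical model among the hits
```

Because the function only reads `model_identifier`, it should work unchanged on the objects returned by search, provided they expose that attribute as described.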
Batch lookups
search_many runs many sequences concurrently with resumable on-disk caching:
report = af.search_many(
    [{"id": "rec1", "sequence": seq1}, {"id": "rec2", "sequence": seq2}],
    out_dir="afdb_cache",
    concurrency=6,
    plddt_first_n=50,  # optional: also save the first 50 per-residue pLDDT per hit
)
# report -> {"total":..., "hits":..., "misses":..., "errors":..., "skipped":..., ...}
- You supply a generic id per sequence; it keys the cache file and maps back to your own records.
- out_dir/summaries/{id}.json stores each hit (a 404 miss stores {"structures": []}); existing files are left untouched, so re-runs resume.
- With plddt_first_n set, out_dir/plddt/{id}.json stores the raw first-n per-residue pLDDT array for the best structure.
- Real HTTP errors are counted but not saved, so they retry on the next run.

Note: resumability keys on the summary file. If you run once without plddt_first_n and again with it, already-cached records are skipped and their pLDDT is not back-filled.
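To see which records are affected by that caveat, you can compare the two cache directories yourself. The sketch below relies only on the layout described above (out_dir/summaries/{id}.json and out_dir/plddt/{id}.json) and on misses being stored as {"structures": []}. `ids_missing_plddt` is a hypothetical helper, not something the package provides, and the shape of a hit summary is deliberately not inspected because it is not documented here.

```python
import json
from pathlib import Path


def ids_missing_plddt(out_dir: str) -> list[str]:
    """List ids that have a cached hit summary but no cached pLDDT file."""
    root = Path(out_dir)
    missing = []
    for summary in sorted((root / "summaries").glob("*.json")):
        data = json.loads(summary.read_text())
        # A miss is stored as {"structures": []}; there is no structure to
        # fetch pLDDT for, so it is not reported as missing.
        if isinstance(data, dict) and data.get("structures") == []:
            continue
        if not (root / "plddt" / summary.name).exists():
            missing.append(summary.stem)
    return missing
```

Since existing summary files are left untouched, re-running those ids against a fresh out_dir with plddt_first_n set is one way to fill the gap; whether deleting the old summary files has the same effect is an assumption worth verifying on a small batch first.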
Not (yet) supported
- UniProt-accession lookup (sequence-only for now)
- PAE (Predicted Aligned Error)
- Statistics helpers: the package returns raw values; downstream math is yours.
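Since the package returns raw values, a small summary over a per-residue list is left to you. A minimal sketch using only the standard library follows. `summarize_plddt` is a hypothetical helper, and the default cutoff of 70 follows the commonly used pLDDT confidence banding; treat it as an assumption and change it to suit your analysis.

```python
from statistics import mean


def summarize_plddt(scores: list[float], confident: float = 70.0) -> dict:
    """Summarize per-residue pLDDT: count, mean, min, fraction >= cutoff."""
    if not scores:
        raise ValueError("scores is empty")
    return {
        "n": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        # Share of residues at or above the chosen confidence cutoff.
        "fraction_confident": sum(s >= confident for s in scores) / len(scores),
    }
```

The input is a plain list[float], so both the full scores list and the result of first(n) from the Quickstart can be passed in directly.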