Toolkit for working with AlphaFold3 results and confidence metadata
FoldKit
A Python toolkit for working with and efficiently storing AlphaFold3 results. It contains both a CLI for importing/exporting results to the foldkit format and a Python API for accessing metrics of folded structures.
Installation
pip install foldkit
Use Cases
There are two main use cases for this package:
(1) A convenient Python interface for accessing AlphaFold3 prediction confidence metrics. This is particularly useful for pulling out inter-chain metrics from predicted protein complexes, since these have been shown to be predictive of binding and are useful criteria for protein design and specificity prediction.
(2) Efficient storage of AlphaFold3 results. The default JSON formats for AF3 confidence results are large and take up a lot of unnecessary space. Foldkit has a CLI that exports the AF3 confidence JSONs to space-efficient .npz files, removes other unnecessary files, and copies over the rest. These .npz files can also be used as alternative inputs to the Python interface in (1).
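To make the size argument concrete, here is a minimal sketch (not foldkit's actual export code) that serializes the same synthetic data two ways: as JSON, the format AF3 uses for its confidence files, and as a compressed .npz archive via numpy.savez_compressed. The array names pae and token_chain_ids mirror keys in AF3's confidences JSON, but the synthetic values, the float32 downcast, and the function name compare_sizes are illustrative assumptions.

```python
import io
import json

import numpy as np


def compare_sizes(n_tokens: int = 300, seed: int = 0) -> tuple[int, int]:
    """Return (json_bytes, npz_bytes) for the same synthetic confidence data."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for AF3 confidences: an N x N PAE-like matrix
    # (rounded to 2 decimals) and one chain ID per token.
    pae = np.round(rng.uniform(0.0, 31.75, size=(n_tokens, n_tokens)), 2)
    half = n_tokens // 2
    chain_ids = np.array(["A"] * half + ["B"] * (n_tokens - half))

    # JSON: every number is written out as decimal text.
    as_json = json.dumps({"pae": pae.tolist(), "token_chain_ids": chain_ids.tolist()}).encode()

    # .npz: binary arrays in a zip archive, compressed by savez_compressed.
    buf = io.BytesIO()
    np.savez_compressed(buf, pae=pae.astype(np.float32), token_chain_ids=chain_ids)
    return len(as_json), len(buf.getvalue())


json_size, npz_size = compare_sizes()
print(f"JSON: {json_size / 1e3:.0f} kB, npz: {npz_size / 1e3:.0f} kB")
```

In this sketch, most of the difference comes from storing numbers as binary instead of decimal text; which dtypes and keys foldkit actually keeps is not described here.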
How much more efficient is "space-efficient"?
Early benchmarking shows that a single AF3 output directory (one seed and sample) for a four-chain protein complex is around 7.8M, while the foldkit-exported version is 1.9M, roughly a 4x reduction. This may seem like a small difference, but it adds up over a large protein design or co-folding campaign. For example, for a parent directory of ~1000 AF3-folded complexes, each with 4 seeds and 5 samples, the total space needed to store the results drops from 157G to 38G.
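The campaign-level numbers follow from simple multiplication. A quick check, assuming the 7.8M and 1.9M figures are per seed/sample directory and treating 1G as 1000M (the function name is hypothetical):

```python
def campaign_storage(per_prediction: float, n_complexes: int, n_seeds: int, n_samples: int) -> float:
    """Total storage, in the same unit as per_prediction, for every seed/sample of every complex."""
    return per_prediction * n_complexes * n_seeds * n_samples


# Per-prediction sizes in M from the benchmark above; divide by 1000 to report G.
raw_g = campaign_storage(7.8, n_complexes=1000, n_seeds=4, n_samples=5) / 1000
exported_g = campaign_storage(1.9, n_complexes=1000, n_seeds=4, n_samples=5) / 1000
print(f"raw: {raw_g:.0f}G, exported: {exported_g:.0f}G, saved: {1 - exported_g / raw_g:.0%}")
```

This gives about 156G versus 38G (roughly 76% less space), in line with the measured 157G to 38G once rounding of the per-directory size is taken into account.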
Python Interface Tutorial
Let's say you have a directory containing the results of an AlphaFold3 prediction for a protein complex. This complex is a TCR-pMHC with the chains ["A", "B", "M", "P"] (TCRa, TCRb, MHCa, and peptide). The results are stored in the directory "structures/tcr_pmhc_1/".
We can load the results:
import foldkit
result_obj = foldkit.AF3Result.load_result("structures/tcr_pmhc_1/")
This object exposes all of the confidence metadata and can compute summary statistics from it.
>>> result_obj.chains
[np.str_('A'), np.str_('B'), np.str_('M'), np.str_('P')]
For example, the structure-wide pTM:
>>> result_obj.get_ptm()
0.81
Or just the average pTM for the TCRa chain:
>>> result_obj.get_ptm("A")
0.82
Here is the average interaction PAE (iPAE) between the TCRb chain and the peptide:
>>> result_obj.get_ipae(chain1="B", chain2="P")
np.float64(6.253699186991869)
By default, these methods compute the mean. To use a different aggregation function, pass it with the agg argument:
>>> import numpy as np
>>> result_obj.get_ipae(chain1="B", chain2="P", agg=np.min)
np.float64(1.3)
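For readers who want to reproduce this kind of metric outside foldkit, here is a minimal sketch of a chain-pair PAE with a pluggable aggregation, computed from a token-level PAE matrix and per-token chain IDs (as in the "pae" and "token_chain_ids" fields of AF3's confidences JSON). This is not necessarily how foldkit implements get_ipae: AF3's PAE matrix is not symmetric, and whether foldkit pools both off-diagonal blocks is an assumption exposed here as the hypothetical symmetric flag.

```python
import numpy as np


def chain_pair_pae(pae, token_chain_ids, chain1, chain2, agg=np.mean, symmetric=True):
    """Aggregate PAE over token pairs between chain1 and chain2.

    pae[i, j] is the expected position error at token j when aligned on token i.
    """
    pae = np.asarray(pae, dtype=float)
    ids = np.asarray(token_chain_ids)
    in1, in2 = ids == chain1, ids == chain2
    values = pae[np.ix_(in1, in2)].ravel()  # rows of chain1, columns of chain2
    if symmetric:
        # Also include the chain2 -> chain1 block, since PAE is not symmetric.
        values = np.concatenate([values, pae[np.ix_(in2, in1)].ravel()])
    if values.size == 0:
        raise ValueError(f"no tokens found for chains {chain1!r} and {chain2!r}")
    return float(agg(values))
```

With symmetric=False only the rows of chain1 against the columns of chain2 are used; any reducer that accepts a 1-D array (np.min, np.median, np.max) works as agg.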
Loading from .npz format
Suppose you previously exported the result of an AF3 run with the CLI, so that the result from before at "structures/tcr_pmhc_1/" is now at "structures_compressed/tcr_pmhc_1/". This second directory contains a .npz file instead of the confidence JSON files. You can load it the same way by passing from_npz=True:
result_obj = foldkit.AF3Result.load_result("structures_compressed/tcr_pmhc_1/", from_npz=True)
foldkit - CLI Tutorial
usage: foldkit [-h] [--verbose] {export-single-result,export-multi-result,batch-export-multi-result} ...
Export AlphaFold3 result directories into compressed format. Converts confidences into npz format and copies over the rest of the data as is (except the _input_data.json which is not kept since it is redundant).
positional arguments:
{export-single-result,export-multi-result,batch-export-multi-result}
export-single-result
Export a single AlphaFold3 result directory to compressed format
export-multi-result
Export multiseed/multisample AlphaFold3 results to compressed format.
batch-export-multi-result
Export multiple AlphaFold3 results to compressed format.
options:
-h, --help show this help message and exit
--verbose, -v Print detailed output.
There are 3 main entry points, depending on the data you are exporting:
- A single prediction directory (i.e. one prediction corresponding to a single seed and sample)
- A multi-seed/multi-sample prediction directory (i.e. N*K predictions corresponding to the same input with N seeds and K samples)
- A directory of prediction directories (i.e. a directory containing many prediction directories like those in (2))
1- Export a single result (i.e. one single structure from a single seed and sample)
foldkit export-single-result /path/to/specific_structure_directory /path/to/outdir
2- Export a single result with multiple seeds and/or samples
foldkit export-multi-result /path/to/specific_structure_parent_directory /path/to/outdir
3- Batch export many results
foldkit -v batch-export-multi-result /path/to/directory_of_subdirectories/ /path/to/outdir
Contributing
Run the tests from the top level:
PYTHONPATH=src pytest tests/ -vvv
Build:
python -m build
Deploy:
twine upload dist/* -u __token__ -p <API TOKEN>
Sphinx Documentation
1️⃣ Make sure you're on the gh-pages branch
git checkout gh-pages
2️⃣ Build the HTML
cd docs
make html
cd ..
3️⃣ Copy the built HTML to the root (overwrite existing)
rsync -av --delete docs/build/html/ .
4️⃣ Add & commit
git add .
git commit -m "Update docs"
5️⃣ Push to GitHub
git push origin gh-pages
File details
Details for the file foldkit-0.1.2.tar.gz.
File metadata
- Download URL: foldkit-0.1.2.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a79a33a2c365062af434836957be8da27817e70bda877e8caeaeafc42a5eb96 |
| MD5 | 1464b22c9c3776dfbdbb1045df530249 |
| BLAKE2b-256 | d3972f5c6a98f096d6b0566599caec6eadfde7c7263539535ba0bdb620246ebd |
File details
Details for the file foldkit-0.1.2-py3-none-any.whl.
File metadata
- Download URL: foldkit-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b9646b25a52c044a7247b5c888a49fbe991eb2c1dc7fd8cc993efec2df490a2 |
| MD5 | f37564e403ed3c5ce9775326431c0efd |
| BLAKE2b-256 | 724034d697d0805d181cff3d8494f2389efccf5720319e9eed325ace7c1c443d |