Unified data processing for AlphaFold3-like models
UniAF3
Prepare inputs and process outputs for AlphaFold3-like models, including AlphaFold3, Boltz, Chai-1, and Protenix-v1.
UniAF3 provides a unified YAML-based input format that serves as a common intermediate representation for converting between different AlphaFold3-family structure prediction models. The format supports specifying molecular sequences, restraints, and inference parameters in a single configuration file.
Feature Support
The following table summarizes feature support across all models:
| Feature | UniAF3 | AlphaFold3 | AF3 Server | Boltz | Chai-1 | Protenix |
|---|---|---|---|---|---|---|
| Sequences | ||||||
| Protein chains | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DNA chains | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RNA chains | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ligands (CCD) | ✅ | ✅ | ✅ (limited set) | ✅ (single CCD only) | ⚠️ (converted to SMILES) | ✅ (multi-CCD supported) |
| Ligands (SMILES) | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Ligands (file path) | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Ligands (user CCD) | ❌ | ✅ (user-provided CCD) | ❌ | ❌ | ❌ | ❌ |
| Multi-CCD ligands | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Glycans | ✅ (Chai notation) | ⚠️ (as multi-CCD ligands with bonds) | ❌ | ⚠️ (single sugar only) | ✅ | ⚠️ (as multi-CCD ligand) |
| Ions | ✅ (as CCD ligand) | ✅ (as CCD ligand) | ✅ (dedicated type) | ✅ (as CCD ligand) | ❌ | ✅ (dedicated type) |
| Homomeric copies | ✅ (via id list) | ✅ (via id list) | ✅ (via count) | ✅ (via id list) | ❌ (separate entities) | ✅ (via count) |
| Modifications | ||||||
| Protein PTMs | ✅ | ✅ | ✅ (limited CCD set) | ✅ | ✅ (inline CCD) | ✅ |
| DNA modifications | ✅ | ✅ | ✅ (limited CCD set) | ✅ | ✅ (inline CCD) | ✅ |
| RNA modifications | ✅ | ✅ | ✅ (limited CCD set) | ✅ | ✅ (inline CCD) | ✅ |
| Cyclic polymers | ✅ (Boltz-specific) | ❌ | ❌ | ✅ | ❌ | ❌ |
| MSA & Templates | ||||||
| Custom MSA | ✅ (via msa_dir) | ✅ (inline or path) | ❌ | ✅ (CSV or A3M) | ✅ (via msa_directory) | ✅ (path) |
| Paired MSA | ✅ | ✅ | ❌ | ✅ (CSV key column) | ✅ | ✅ |
| Structural templates | ✅ | ✅ (mmCIF) | ❌ | ✅ (CIF/PDB) | ✅ (via server) | ✅ (A3M/HHR) |
| Restraints | ||||||
| Covalent bonds | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Contact restraints | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Pocket restraints | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Inference Parameters | ||||||
| Random seeds | ✅ | ✅ | ✅ (can be empty) | ❌ (CLI arg) | ✅ (single seed) | ❌ (CLI arg) |
| Recycling steps | ✅ | ❌ (CLI arg) | ❌ | ❌ (CLI arg) | ✅ | ❌ (CLI arg) |
| Diffusion steps | ✅ | ❌ (CLI arg) | ❌ | ❌ (CLI arg) | ✅ | ❌ (CLI arg) |
| Diffusion samples | ✅ | ❌ (CLI arg) | ❌ | ❌ (CLI arg) | ✅ | ❌ (CLI arg) |
| Affinity prediction | ✅ (Boltz-specific) | ❌ | ❌ | ✅ | ❌ | ❌ |
Legend: ✅ = fully supported, ⚠️ = partially supported / lossy conversion, ❌ = not supported
CLI Usage
Validate a config
Validate an input config file and print its contents:
uniaf3 validate INPUT_CONFIG_FILE [--format FORMAT]
Arguments:
INPUT_CONFIG_FILE: Path to the config file to validate (required).
Options:
--format, -f: Format of the input config file (default: uniaf3). Supported values: uniaf3, alphafold3, alphafold3server, boltz, chai, protenix.
Examples:
# Validate a UniAF3 config
uniaf3 validate input.yaml
# Validate a Boltz config
uniaf3 validate boltz_input.yaml --format boltz
# Validate an AlphaFold3 JSON
uniaf3 validate af3_input.json -f alphafold3
For Chai-1 configs, if a .restraints or .csv file with the same stem exists alongside the FASTA file, it will be loaded automatically.
Convert between formats
Convert an input config file from one format to another:
uniaf3 convert INPUT_CONFIG_FILE OUTPUT_DIR [PREFIX] [--from-format FORMAT] [--to-format FORMAT]
Arguments:
INPUT_CONFIG_FILE: Path to the input config file (required).
OUTPUT_DIR: Directory for the output config file(s) (required).
PREFIX: Prefix for output file name(s). Defaults to the input file name without extension.
Options:
--from-format, -f: Source format (default: uniaf3).
--to-format, -t: Target format (default: alphafold3).
Examples:
# UniAF3 → AlphaFold3
uniaf3 convert input.yaml output_dir/ --from-format uniaf3 --to-format alphafold3
# Boltz → Chai-1
uniaf3 convert boltz_input.yaml output_dir/ --from-format boltz --to-format chai
# AF3 → Protenix
uniaf3 convert af3_input.json output_dir/ --from-format alphafold3 --to-format protenix
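For batch conversion, the command line above can be assembled programmatically. The following is a minimal sketch that only builds the argument list for uniaf3 convert from the documented arguments and options. The helper name build_convert_command is hypothetical and not part of UniAF3, and the sketch assumes that the format names listed for validate are also the ones accepted by --from-format and --to-format.

```python
# Format names listed for `uniaf3 validate --format`; assumed (not confirmed)
# to be the same set accepted by --from-format and --to-format.
SUPPORTED_FORMATS = (
    "uniaf3", "alphafold3", "alphafold3server", "boltz", "chai", "protenix",
)


def build_convert_command(input_file, output_dir,
                          from_format="uniaf3", to_format="alphafold3",
                          prefix=None):
    """Build the argv list for `uniaf3 convert` (hypothetical helper)."""
    for fmt in (from_format, to_format):
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["uniaf3", "convert", str(input_file), str(output_dir)]
    if prefix is not None:
        # PREFIX is the optional third positional argument.
        cmd.append(prefix)
    cmd += ["--from-format", from_format, "--to-format", to_format]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), provided the uniaf3 executable is on PATH.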
Input Format
UniAF3 configs are written in YAML. The top-level structure is:
sequences:
  - # Polymer, Ligand, or Glycan entries
covalent_bonds:  # Optional
  - # CovalentBond entries
contact_restraints:  # Optional
  - # ContactRestraint entries
pocket_restraints:  # Optional
  - # PocketRestraint entries
aux:  # Optional, inference parameters
  seeds:
    - 42
  num_trunk_recycles: 3
  num_diffn_timesteps: 200
  num_diffn_samples: 5
  num_trunk_samples: 1
Sequences
Each entry in the sequences list must be one of the following types: protein, DNA, RNA, ligand, or glycan.
Protein
Proteins use the ProteinSeq schema (which extends Polymer) and support MSA directories and structural templates.
- polymer_type: protein
  id: A  # or [A, B] for homomeric copies
  sequence: MVLSPADKTNVK  # Standard 1-letter amino acid codes
  description: "My protein"  # Optional description
  modifications:  # Optional PTMs
    - ccd: HY3  # CCD code of modification
      position: 1  # 1-based residue index
  msa_dir: path/to/msa/  # Optional, directory containing MSA files
  templates:  # Optional structural templates
    - path: template.cif  # Path to mmCIF or PDB file
      query_idx: [0, 1, 2]  # 0-based query residue indices
      template_idx: [0, 1, 2]  # 0-based template residue indices
      query_chains: [A]  # Optional, chain IDs in query
      template_chains: [A]  # Optional, chain IDs in template
      boltz_enable_force: false  # Boltz-specific: enforce template
      boltz_template_threshold: null  # Boltz-specific: deviation threshold (Å)
  boltz_cyclic: false  # Boltz-specific: cyclic polymer flag
MSA Directory Structure:
The msa_dir field points to a directory with the following expected structure:
msa_dir/
  a3ms/
    {seq_hash}.single.a3m  # Unpaired MSA
    {seq_hash}.pair.a3m  # Paired MSA (optional)
Where {seq_hash} is the SHA-256 hex digest of the protein sequence. This follows the Chai-1 MSA search output convention.
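As an illustration, the expected file locations can be derived from a sequence with the standard library. This is a sketch under assumptions: the helper name msa_paths is hypothetical, and hashing the sequence exactly as written (UTF-8 encoded, with no case or whitespace normalization) is an assumption that is not confirmed here.

```python
import hashlib
from pathlib import Path


def msa_paths(msa_dir, sequence):
    """Return (unpaired, paired) A3M paths for a sequence (hypothetical helper)."""
    # Assumption: the digest is taken over the raw sequence string.
    seq_hash = hashlib.sha256(sequence.encode("utf-8")).hexdigest()
    a3m_dir = Path(msa_dir) / "a3ms"
    return (
        a3m_dir / f"{seq_hash}.single.a3m",  # unpaired MSA
        a3m_dir / f"{seq_hash}.pair.a3m",    # paired MSA (optional)
    )


single, pair = msa_paths("path/to/msa", "MVLSPADKTNVK")
print(single.exists(), pair.exists())
```

Checking that the .single.a3m file exists before running a conversion is a cheap way to catch a mismatch between the sequence in the config and the sequence used for the MSA search.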
DNA
- polymer_type: dna
  id: C
  sequence: GATTACA  # Only A, T, G, C allowed
  modifications:  # Optional
    - ccd: 6OG
      position: 1
RNA
- polymer_type: rna
  id: D
  sequence: AGCU  # Only A, U, G, C allowed
  modifications:  # Optional
    - ccd: 2MG
      position: 1
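The alphabet and position constraints on polymer entries can be checked before a config is handed to the tool. The sketch below is not the UniAF3 validator: check_polymer is a hypothetical helper, and treating the 20 standard amino acid letters as the full protein alphabet is an assumption.

```python
# Allowed one-letter codes per polymer type. The DNA and RNA sets come from
# the comments in the examples; the protein set is an assumption.
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ATGC"),
    "rna": set("AUGC"),
}


def check_polymer(entry):
    """Return a list of problems found in one polymer entry (empty if none)."""
    problems = []
    sequence = entry["sequence"]
    bad = sorted(set(sequence) - ALPHABETS[entry["polymer_type"]])
    if bad:
        problems.append(f"invalid characters: {''.join(bad)}")
    for mod in entry.get("modifications") or []:
        # Modification positions are 1-based and must lie inside the sequence.
        if not 1 <= mod["position"] <= len(sequence):
            problems.append(
                f"modification {mod['ccd']} at position {mod['position']} "
                f"is outside 1..{len(sequence)}"
            )
    return problems
```

For example, an RNA entry with sequence AGCT would be reported as containing the invalid character T.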
Ligand
Ligands must specify exactly one of ccd (a list of CCD codes) or smiles:
# CCD ligand (single or multi-CCD)
- id: E
  ccd:
    - ATP
# Multi-CCD ligand (e.g., glycan as ligand)
- id: F
  ccd:
    - NAG
    - BMA
# SMILES ligand
- id: G
  smiles: "CC(=O)OC1C[NH+]2CCC1CC2"
Glycan
Glycans use Chai-1's glycan notation (modified CCD codes with bond information):
- id: H
  chai_str: "NAG(4-1 NAG(4-1 BMA(3-1 MAN)(6-1 MAN)))"
  description: "Branched glycan"
For single sugars without bonds: chai_str: NAG
Chain IDs
Chain IDs (id field) serve as unique identifiers for each entity. They can be:
- A single string, e.g. id: A
- A list of strings for homomeric copies, e.g. id: [A, B, C]
Chain IDs are used to reference entities in restraints. When converting to models that use count-based copies (AF3 Server, Protenix), the number of IDs in the list determines the copy count.
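The mapping from an id field to a copy count can be sketched as follows. Both helper names are hypothetical and only illustrate the rule stated above; they are not the adapters' actual code.

```python
def chain_ids(entry):
    """Normalize the id field to a list of chain IDs."""
    raw = entry["id"]
    return [raw] if isinstance(raw, str) else list(raw)


def copy_count(entry):
    """Number of homomeric copies implied by the id field."""
    return len(chain_ids(entry))


print(copy_count({"id": "A"}))              # 1
print(copy_count({"id": ["A", "B", "C"]}))  # 3
```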
The chain ID naming convention follows standard spreadsheet-style ordering:
A, B, ..., Z, AA, AB, AC, ..., AZ, BA, BB, ...
This is generated by the int_to_letters() function (1-indexed): int_to_letters(1) → A, int_to_letters(27) → AA, int_to_letters(28) → AB.
Note: The open-source AlphaFold3 documentation uses a "reverse spreadsheet style" ordering (AA, BA, CA, ...). UniAF3 standardizes on the conventional spreadsheet ordering for internal consistency across all adapters.
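Spreadsheet-style ordering is a bijective base-26 numbering. The function below is an independent sketch of that scheme, not the library's int_to_letters() itself; it is named int_to_letters_sketch to make that explicit, and it reproduces the three values quoted above.

```python
def int_to_letters_sketch(n):
    """Convert a 1-indexed integer to spreadsheet-style letters (1 -> A, 27 -> AA)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    letters = []
    while n > 0:
        # Subtracting 1 first makes the numbering bijective: there is no
        # "zero" digit, so Z (26) is followed by AA (27).
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


print([int_to_letters_sketch(i) for i in (1, 26, 27, 28)])  # ['A', 'Z', 'AA', 'AB']
```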
Restraints
Covalent Bonds
Specify covalent bonds between atoms from different entities:
covalent_bonds:
  - atom1:
      chain_id: A  # Entity ID
      residue_idx: 5  # 1-based residue index (0 for ligands)
      atom_name: CG  # Atom name (e.g., CA, N, SG)
      residue_name: P  # Optional, for validation
    atom2:
      chain_id: E  # Entity ID
      residue_idx: 1  # 1-based position within ligand
      atom_name: C04  # Atom name in the ligand
      residue_name: null  # Not required for ligands
    description: "Optional description"
Notes:
- atom_name is required for both atoms.
- residue_name is used by Chai-1 for validation and restraint formatting.
- For ligands, residue_idx is typically 1 for single-CCD or SMILES ligands.
- Ligand atom names follow RDKit naming conventions.
Contact Restraints
Distance restraints between two atoms/residues:
contact_restraints:
  - token1:
      chain_id: A
      residue_idx: 10  # 1-based, or 0 if atom_name is used for ligands
      atom_name: null  # Optional for polymers, required for ligands
      residue_name: K  # Optional, for validation
    token2:
      chain_id: C
      residue_idx: 5
      atom_name: null
      residue_name: null
    max_distance: 8.0  # Maximum distance in Å (must be 4-20 Å)
    min_distance: 0.0  # Minimum distance in Å (Protenix only)
    boltz_enable_force: true  # Boltz-specific: enforce with potential
Notes:
- max_distance must be between 4.0 and 20.0 Å (Boltz requirement, applied universally).
- min_distance is only used by Protenix.
- AF3 and AF3 Server do not support contact restraints.
Pocket Restraints
Specify a binding pocket where a binder chain interacts with specific contact residues:
pocket_restraints:
  - binder_chain: E  # ID of the chain binding to the pocket
    contact_tokens:  # List of residues forming the pocket
      - chain_id: A
        residue_idx: 10
        atom_name: null  # For polymers; use atom_name for ligands
        residue_name: K
      - chain_id: A
        residue_idx: 15
        atom_name: null
        residue_name: G
    max_distance: 6.0  # Maximum distance in Å (4-20 Å)
    min_distance: 0.0  # Protenix only
    boltz_enable_force: false  # Boltz-specific: enforce with potential
Notes:
- Contact tokens must NOT be on the same chain as binder_chain.
- Protenix supports only a single pocket constraint per job.
- AF3 and AF3 Server do not support pocket restraints.
Inference Parameters
The aux field contains optional inference parameters:
aux:
  num_trunk_recycles: 3  # Default: 3
  num_diffn_timesteps: 200  # Default: 200
  num_diffn_samples: 5  # Default: 5
  num_trunk_samples: 1  # Default: 1
  name: "job_name"  # Optional, used in AF3 Server
  boltz_affinity_binder_chain: D  # Boltz-specific: affinity binder chain ID
Seeds
Seeds are stored in aux.seeds as a list of integer random seeds:
aux:
  seeds:
    - 42
    - 123
- AF3 uses all seeds directly.
- Chai-1 uses only the first seed; additional seeds are applied via num_trunk_samples.
- Boltz and Protenix do not store seeds in their config format; the default [42] is used on import.
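The per-model seed handling described above can be summarized in a small sketch. The function names and the exact return shapes are assumptions made for illustration; the real adapters may represent this differently, and the interaction between Chai-1 seeds and num_trunk_samples is not modeled here.

```python
DEFAULT_SEEDS = [42]
# Formats whose config files do not store seeds (seeds are CLI arguments there).
FORMATS_WITHOUT_SEEDS = {"boltz", "protenix"}


def seeds_on_import(source_format, native_seeds=None):
    """Seeds to place in aux.seeds when importing a native config (sketch)."""
    if source_format in FORMATS_WITHOUT_SEEDS:
        return list(DEFAULT_SEEDS)
    return list(native_seeds or [])


def seeds_on_export(target_format, seeds):
    """Seeds that end up in the target config file (sketch)."""
    if target_format == "chai":
        return list(seeds[:1])  # Chai-1 uses only the first seed
    if target_format in FORMATS_WITHOUT_SEEDS:
        return []               # nothing is written to the config file
    return list(seeds)          # e.g. AF3 uses all seeds directly


print(seeds_on_export("chai", [42, 123]))        # [42]
print(seeds_on_export("alphafold3", [42, 123]))  # [42, 123]
print(seeds_on_import("boltz"))                  # [42]
```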
Validation Rules
The UniAF3 schema enforces these validation rules:
- At least one sequence must be provided.
- Modification positions must be within the sequence length.
- Ligands must specify exactly one of ccd or smiles.
- Covalent bond atoms must have non-null atom_name.
- Contact restraints require max_distance between 4.0 and 20.0 Å, and max_distance > min_distance.
- Pocket restraint contact tokens must not be on the same chain as binder_chain.
- Restraint atoms must reference valid chain IDs, and residue indices must be within the sequence length.
- Residue names in restraints (when provided) are validated against the sequence.
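Several of these rules can be illustrated on a config that has already been loaded into a plain dictionary. This is a sketch of the rules as stated, not the schema implementation: validate_config and _distance_errors are hypothetical names, only a subset of the rules is covered, and applying the 4.0 to 20.0 Å range to pocket restraints follows the comment in the pocket restraint example.

```python
def _distance_errors(label, restraint):
    """Check the max_distance range and the max > min ordering."""
    errors = []
    max_d = restraint["max_distance"]
    min_d = restraint.get("min_distance") or 0.0
    if not 4.0 <= max_d <= 20.0:
        errors.append(f"{label}: max_distance must be between 4.0 and 20.0")
    if max_d <= min_d:
        errors.append(f"{label}: max_distance must exceed min_distance")
    return errors


def validate_config(cfg):
    """Return a list of rule violations for a UniAF3-style dict (sketch)."""
    errors = []
    sequences = cfg.get("sequences") or []
    if not sequences:
        errors.append("at least one sequence is required")
    for entry in sequences:
        if "polymer_type" in entry or "chai_str" in entry:
            continue  # polymers and glycans are not checked in this sketch
        if bool(entry.get("ccd")) == bool(entry.get("smiles")):
            errors.append(f"ligand {entry.get('id')}: specify exactly one of ccd or smiles")
    for bond in cfg.get("covalent_bonds") or []:
        for key in ("atom1", "atom2"):
            if bond[key].get("atom_name") is None:
                errors.append(f"covalent bond {key}: atom_name is required")
    for restraint in cfg.get("contact_restraints") or []:
        errors += _distance_errors("contact restraint", restraint)
    for restraint in cfg.get("pocket_restraints") or []:
        errors += _distance_errors("pocket restraint", restraint)
        for token in restraint.get("contact_tokens") or []:
            if token["chain_id"] == restraint["binder_chain"]:
                errors.append("pocket restraint: contact token is on binder_chain")
    return errors
```

Returning a list of messages rather than raising on the first failure makes it possible to report every problem in a config at once.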
Complete Example
sequences:
  - polymer_type: protein
    id: [A, B]
    sequence: MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLS
    msa_dir: dummy_msa/
    modifications:
      - ccd: HY3
        position: 1
    description: Hemoglobin subunit
  - polymer_type: dna
    id: C
    sequence: GATTACA
  - id: D
    ccd:
      - ATP
  - id: E
    smiles: "CC(=O)OC1C[NH+]2CCC1CC2"
  - id: F
    chai_str: NAG
    description: Example glycan
covalent_bonds:
  - atom1:
      chain_id: B
      residue_idx: 2
      atom_name: CA
      residue_name: V
    atom2:
      chain_id: D
      residue_idx: 1
      atom_name: C04
      residue_name: null
contact_restraints:
  - token1:
      chain_id: A
      residue_idx: 5
      atom_name: CG
      residue_name: P
    token2:
      chain_id: B
      residue_idx: 5
      atom_name: null
      residue_name: P
    max_distance: 8.0
    boltz_enable_force: true
pocket_restraints:
  - binder_chain: D
    max_distance: 6.0
    contact_tokens:
      - chain_id: A
        residue_idx: 10
        atom_name: null
        residue_name: N
      - chain_id: B
        residue_idx: 3
        atom_name: null
        residue_name: L
aux:
  seeds:
    - 42
    - 123
  num_trunk_recycles: 3
  num_diffn_timesteps: 200
  num_diffn_samples: 5
  num_trunk_samples: 1
  boltz_affinity_binder_chain: D
Model-specific Documentation
For detailed documentation on each model's native input format, see: