A command-line interface and Python library for generating AlphaFold3 input files.
Project description
af3cli
A command-line interface and Python library for generating AlphaFold3 input files.
Installation
We recommend using uv to manage your installation.
uv sync --locked
This automatically creates a virtual environment .venv in the project folder and installs all dependencies. If you do not need the optional dependencies for reading SDF (RDKit) or FASTA files (Biopython), the installation can be prevented with --no-group features.
Basic Usage
The generation of AlphaFold3 input files can be done either with the standalone CLI tool or for more advanced tasks by using the library in Python scripts. Python Fire is used to implement the CLI application.
For a detailed overview of all available JSON fields check the AlphaFold3 input documentation.
[!WARNING] In most cases, checks are only carried out to ensure a correct structure of the input file, but not whether the inputs themselves are valid.
You can display the help and overview of the available CLI commands with the following statement.
af3cli -- --help
All commands and sub-commands are separated with - to enable chaining, allowing e.g. several sequences, ligands, bonds etc. to be added.
af3cli toplevel sub [...] - sub [...] \
- toplevel sub [...]
You can use the debug command to display the final file without writing it.
af3cli debug --show - [...]
Config Parameters
The config command is used to manage basic settings, such as the file name of the JSON file to be written, the name of the job or the respective version.
af3cli config -f "filename.json" -j "jobname" -v 2
The library provides an InputBuilder that allows new jobs to be created very comfortably step by step.
from af3cli import InputBuilder
builder = InputBuilder()
builder.set_name("jobname")
builder.set_version(2)
# builder.set_dialect("alphafold3") # default
input_file = builder.build()
input_file.write("filename.json")
You can also initialize the InputBuilder with an existing InputFile object in order to add further sequences or ligands or to change settings.
Random Seeds
It is required that at least one random seed is specified. The default value is therefore 1. Otherwise, you can either specify a number of values to generate a list of random seeds or pass a list of integers yourself.
af3cli seeds -n 10 - ...
# generates 10 random numbers
af3cli seeds -v "1,2,3"
# "(1,2,3,...)" or "[1,2,3,...]" are also valid
Python:
builder.set_seeds([1, 2, 3])
Sequences
Adding sequences works basically the same for all three available types, but not all JSON fields are supported for each type. The corresponding subcommands therefore differ in some cases.
af3cli [...] \
- protein add "MVKLAGST" \ # positional argument
- protein add --sequence "AAQAA" \
- dna add --sequence "AATTTTCC" \
- rna add --sequence "UUUGGCCGG"
A check is performed to ensure that the sequence characters match the respective type in the CLI application or in the Python library itself, when the Sequence object is converted into a dictionary. When using the sequence base class, the corresponding SequenceType must be specified. We therefore provide derived classes for the respective sequence types to simplify use.
from af3cli import ProteinSequence # DNASequence, RNASequence
# DNASequence / RNASequence
protein_seq = ProteinSequence(
"MVKLAGST...",
#[...]
)
builder.add_sequence(protein_seq)
For DNA sequences, it is possible to generate the reverse complementary strand. The associated data, such as manually specified IDs or modifications, are not included. The CLI tool will generate appropriate warnings.
af3cli [...] \
- dna add --sequence "AATTTTCC" --complement
Python:
from af3cli import DNASequence
dna_seq = DNASequence(
"AATTTTCC...",
#[...]
)
rc_dna_seq = dna_seq.reverse_complement()
builder.add_sequence(dna_seq)
builder.add_sequence(rc_dna_seq)
If modifications or manually defined IDs are required, the complementary sequence must be created separately.
FASTA Files
As it is often not very practical to add many or particularly long sequences via the CLI, it is possible to read the respective sequence from a FASTA file. To use this feature, Biopython must be installed as an optional dependency.
af3cli protein add --sequence <filename> --fasta
Each sequence command expects exactly one single sequence. Otherwise it is not possible to add additional fields, such as modifications or templates. However, it is still possible to read several sequences from a FASTA file if the additional features are not required. For even more advanced tasks, the Python API must otherwise be used.
af3cli [...] - fasta [--filename] <filename>
The respective sequence type is automatically detected, which is not possible in rare cases. If this is the case, the sequence is ignored and a warning is issued. It is, therefore, advisable to add all sequences whose type cannot be clearly identified separately via the sequence commands.
There are also two ways of doing this when using Python. The fasta2seq function can be used to obtain a generator that automatically creates Sequence objects and the read_fasta function is used to create a generator that returns the plain FASTA IDs and sequences from the FASTA file as a string.
from af3cli.sequence import fasta2seq, read_fasta
for seq in fasta2seq(filename):
...
# do something with the Sequence object
for fasta_id, seq_str in read_fasta(filename):
...
# create your own Sequence objects
Modifications
By applying the modification subcommand, any number of modifications can be added to the sequences with the respective CCD identifier and position. The different fields in the JSON file are automatically inserted correctly based on the sequence type.
# as positional arguments
af3cli [...] protein [...] - modification "SEP" 5
# or with explicit argument names
af3cli [...] dna [...] - modification --mod "6OG" --pos 1
When using the Python API, you have to explicitly define what kind of modifications you would like to add, since the resulting JSON fields are different for protein (ResidueModification) or nucleotide sequences (NucleotideModification). Please note that checks are performed to verify the modification types when the Sequence object is converted to a dictionary.
from af3cli import ProteinSequence, ResidueModification
# NucleotideModification
rmod = ResidueModification("SEP", 5)
protein_seq = ProteinSequence(
"<SEQUENCE>",
# [...]
modifications=[rmod]
)
# it is possible to add more modifications later
protein_seq.modifications.append(rmod)
Structural Templates
For protein sequences, it is possible to specify multiple structural templates in mmCIF format as a string or path. Since it is completely impractical to use strings via the CLI tool, the file must be submitted as plain text and is then read in its entirety as a string.
# read the file as string with the '--read' flag
af3cli [...] protein [...] - template [--mmcif] <filename> --read
# keep relative/absolute path
af3cli [...] protein [...] - template [--mmcif] <filename>
# specify query and template indices as list of integers
# "1,2,3,..." | "(1,2,3,...)" | "[1,2,3,...]" are valid
af3cli [...] protein [...] \
- template [--mmcif] <filename> -q "..." -t "..."
As it makes no difference to Python whether the string contains a path to a file or the file content, all you need to do is specify the template type. The file must then be read manually beforehand if a string is desired in the JSON file.
from af3cli import Template, TemplateType, ProteinSequence
# TemplateType.FILE for relative/absolute path
t = Template(
TemplateType.STRING,
"mmCIF content",
qidx=[], tidx=[]
)
protein_seq = ProteinSequence(
"<SEQUENCE>",
# [...]
templates=[t]
)
# it is possible to add more templates later
protein_seq.templates.append(t)
Multiple Sequence Alignment
Please refer to the AlphaFold3 input documentation on how to specify the MSA section for protein and RNA sequences.
The A3M-formatted content can be specified either as a path or as a string (mutally exclusive).
af3cli [...] protein [...] msa --paired ... --unpaired ...
af3cli [...] protein [...] msa --pairedpath ... --unpairedpath ...
In the case of the Python API, you must specify whether the respective string is a path.
from af3cli import MSA
msa = MSA(
paired="...", unpaired="...",
paired_is_path=True, unpaired_is_path=True,
)
protein_seq = ProteinSequence(
"<SEQUENCE>",
# [...]
msa=msa
)
# alternative
protein_seq._msa = msa
Ligands and Ions
The ligands are treated in a generally similar way to the sequences and can be defined either as SMILES or with a corresponding CCD identifier. SDF files can also be read and converted to SMILES via an optional RDKit dependency. If there are multiple entries in the SDF, they are added as individual ligands. Ions are simply treated as ligands in AlphaFold3.
af3cli [...] \
- ligand add --smiles "CCC" \
# providing a list of CCD codes is also supported
- ligand add --ccd "MG" \
- ligand add --sdf ligands.sdf
In Python, either the parent class Ligand together with the respective LigandType or alternatively the corresponding child classes CCDLigand or SMILigand can be used to add new ligands.
from af3cli import Ligand, LigandType, SMILigand
from af3cli.ligand import sdf2smiles
ligand = Ligand(
LigandType.SMILES,
"CCC",
#[...]
)
# using SMILigand
# ligand = SMILigand("CCC")
builder.add_ligand(ligand)
# ...
for smi in sdf2smiles("ligands.sdf"):
builder.add_ligand(
Ligand(LigandType.SMILES, smi)
)
Custom CCD
Please refer to the AlphaFold3 input documentation on how to generate valid CCD mmCIF files.
The entire file content is stored as a string in the JSON file and is only stored in a variable here. A plain text file must, therefore, simply be specified for the CLI.
af3cli [...] ccd [--filename] <filename>
In Python, you then have to read the file yourself.
builder.set_user_ccd(filecontent)
Bonds
The bonded atom pairs are defined in the JSON file as a list of lists, each of which contains the Entity ID, the Residue ID and the atom name. To make it as easy as possible to add new bonds, a string format is used, which is then translated into the correct format.
# E: Entity ID; R: Residue ID N: atom name
af3cli [...] bond [--add] "E:R:N-E:R:N"
# example
af3cli [...] bond [--add] "A:1:C-B:1:O"
Although the sequences should be numbered in the order in which they were added, it is advisable to manually assign a sequence ID to the respective entities for the bonds (see below).
Python:
from af3cli import Bond
bond = Bond.from_string("A:1:C-B:1:O")
builder.add_bonded_atom_pair(bond)
You can also use the Atom class to initialize new atoms and create a Bond object from any two atoms.
from af3cli import Bond, Atom
atom_1 = Atom("A", 1, "C")
atom_2 = Atom("B", 1, "O")
bond = Bond(atom_1,atom_2)
builder.add_bonded_atom_pair(bond)
Sequence ID Handling
The IDs for sequences, ligands, and ions are normally assigned automatically and should only be specified manually if it is really necessary, as ID clashes may occur. An IDRegister object keeps track of the sequences used and, if necessary, skips IDs that have already been registered.
One case where it is necessary to specify the IDs manually is for bonds between different entries, as the chain ID must be specified for the bonded atom pairs (see above).
# "A,B,..." | "(A,B,...)" | "[A,B,...]" are valid
af3cli [...] protein add [...] -i "A,B"
If you only want to calculate homomultimers without specifying an explicit ID, you can also specify a number.
af3cli [...] protein add [...] -n 2
# works for all sequence types and ligands/ions
af3cli [...] \
- protein add [...] -n 5 \
- ligand add [...] -n 5
You can also specify IDs or a number in connection with an SDF file, whereby it should be noted that the number of manually specified IDs must correspond to the number of ligands in the SDF file. If a number is specified, all entries in the SDF are then multiplied by this number.
In Python, the number or explicit IDs can be specified when initializing Ligand or Sequence objects. If both parameters are specified, their count must match. The registration or automatic assignment of IDs only takes place in connection with an InputFile object and is carried out when the file is converted into a dictionary (e.g. when the file is written).
ligand = SMILigand(
"CCC",
seq_id=["A", "B"],
num=2
)
Merging Files
Occasionally, it can be helpful to create a base file of your system and prepare subsequent AlphaFold3 jobs by merging existing files with new entries. The merge command is chainable, allowing to combine several files. However, this should be done with caution if certain IDs, bonds, or seeds are important.
af3cli [...] merge [--filename] <filename>
# add new sequences
af3cli [...] merge [--filename] <filename> \
- protein add "MVKLAGST..." \
- ligand add --ccd "MG"
# keep IDs
af3cli [...] merge [--filename] <filename> --noreset
# override/merge special entries
af3cli [...] merge [--filename] <filename> \
# override user-specified CCD data
--userccd \
# merge bonded atoms data
--bonds \
# merge seeds (removes duplicates)
--seeds
Python:
from af3cli import InputFile
input_file = InputFile()
other_input_file = InputFile.read("filename")
input_file.merge(other_input_file)
# with additional parameters
input_file.merge(
other_input_file,
reset=True,
seeds=True,
bonded_atoms=False,
userccd=False
)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file af3cli-0.3.0.tar.gz.
File metadata
- Download URL: af3cli-0.3.0.tar.gz
- Upload date:
- Size: 72.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.24
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5d23f59d0235be23a1a8056ab1f8957b0131e0b122d57142ff2abaaaf33bfada
|
|
| MD5 |
ac7b2396adf4ac720a3577ad24fbfddf
|
|
| BLAKE2b-256 |
7ec88ff2b1f81001ba61212db72e0ba20fadda325efdcfd38c1110138acfac72
|
File details
Details for the file af3cli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: af3cli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 34.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.24
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a6b31b28868bf23807558946152bb9ab6bb68d2345116794aeced71206e85c43
|
|
| MD5 |
6801767089e176f54306d8303a5a671e
|
|
| BLAKE2b-256 |
c5d8b72a736358e7fcde1b702571eaca70bed513fe15f57064d11e1bcfdd6cf3
|