af3cli

A command-line interface and Python library for generating AlphaFold3 input files.

Installation

We recommend using uv to manage your installation.

uv sync --locked

This automatically creates a virtual environment .venv in the project folder and installs all dependencies. If you do not need the optional dependencies for reading SDF files (RDKit) or FASTA files (Biopython), you can skip them with --no-group features.

Basic Usage

AlphaFold3 input files can be generated either with the standalone CLI tool or, for more advanced tasks, with the library in Python scripts. The CLI application is implemented with Python Fire.

For a detailed overview of all available JSON fields check the AlphaFold3 input documentation.

Warning: In most cases, checks only ensure that the input file has a correct structure, not that the inputs themselves are valid.

You can display the help and an overview of the available CLI commands with the following command.

af3cli -- --help

All commands and sub-commands are separated with - to enable chaining, allowing e.g. several sequences, ligands, bonds etc. to be added.

af3cli toplevel sub [...] - sub [...] \
    - toplevel sub [...]
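The chaining can be pictured as splitting the argument list at every standalone - token, which is the default separator in Python Fire. The following is a minimal sketch of that idea, not the actual implementation in af3cli or Fire; split_chain is a hypothetical helper introduced only for illustration.

```python
def split_chain(argv, separator="-"):
    """Split a flat argument list into groups at each standalone separator.

    Hypothetical helper: illustrates the chaining idea only.
    """
    groups, current = [], []
    for token in argv:
        if token == separator:
            # a standalone separator closes the current group
            if current:
                groups.append(current)
            current = []
        else:
            current.append(token)
    if current:
        groups.append(current)
    return groups


args = ["protein", "add", "MVKLAGST", "-", "ligand", "add", "--ccd", "MG"]
groups = split_chain(args)
```

Each resulting group corresponds to one command with its own arguments.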

You can use the debug command to display the final file without writing it.

af3cli debug --show - [...]

Config Parameters

The config command is used to manage basic settings, such as the file name of the JSON file to be written, the name of the job or the respective version.

af3cli config -f "filename.json" -j "jobname" -v 2

The library provides an InputBuilder that lets you create new jobs conveniently, step by step.

from af3cli import InputBuilder

builder = InputBuilder()
builder.set_name("jobname")
builder.set_version(2)
# builder.set_dialect("alphafold3") # default

input_file = builder.build()
input_file.write("filename.json")
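For orientation, these settings correspond to top-level fields of the AlphaFold3 input JSON. The sketch below builds such a dictionary by hand using only the standard library. It does not use af3cli, the sequence entry is a placeholder, and the field names are taken from the AlphaFold3 input documentation.

```python
import json

# Hand-written example of the top-level structure (not generated by af3cli).
job = {
    "name": "jobname",
    "modelSeeds": [1],
    "sequences": [
        # placeholder entry so that the file has some content
        {"protein": {"id": "A", "sequence": "MVKLAGST"}},
    ],
    "dialect": "alphafold3",
    "version": 2,
}

text = json.dumps(job, indent=2)
```

Comparing such a hand-written file with the output of the debug command can be a quick sanity check.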

You can also initialize the InputBuilder with an existing InputFile object in order to add further sequences or ligands or to change settings.

Random Seeds

At least one random seed must be specified, so the default is a single seed with the value 1. Alternatively, you can either specify how many random seeds should be generated or pass a list of integers yourself.

af3cli seeds -n 10 - ...
# generates 10 random numbers

af3cli seeds -v "1,2,3"
# "(1,2,3,...)" or "[1,2,3,...]" are also valid

Python:

builder.set_seeds([1, 2, 3])
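The accepted list notations can be handled with a few lines of plain Python. This is a sketch under stated assumptions rather than the code used by af3cli: parse_int_list and random_seeds are hypothetical names, and drawing distinct seeds from the unsigned 32-bit range is an assumption.

```python
import random


def parse_int_list(text):
    """Parse '1,2,3', '(1,2,3)' or '[1,2,3]' into a list of integers."""
    inner = text.strip()
    if (inner.startswith("(") and inner.endswith(")")) or (
        inner.startswith("[") and inner.endswith("]")
    ):
        inner = inner[1:-1]
    return [int(part) for part in inner.split(",") if part.strip()]


def random_seeds(count, rng=None):
    """Draw `count` distinct seeds (assumption: unsigned 32-bit range)."""
    rng = rng or random.Random()
    return rng.sample(range(2**32), count)


seeds = parse_int_list("[1, 2, 3]")
```

The resulting list could then be passed to builder.set_seeds as shown above.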

Sequences

Adding sequences works basically the same for all three available types, but not all JSON fields are supported for each type. The corresponding subcommands therefore differ in some cases.

# the first 'protein add' passes the sequence as a positional argument
af3cli [...] \
    - protein add "MVKLAGST" \
    - protein add --sequence "AAQAA" \
    - dna add --sequence "AATTTTCC" \
    - rna add --sequence "UUUGGCCGG"

The sequence characters are checked against the respective type, both in the CLI application and in the Python library, when the Sequence object is converted into a dictionary.

from af3cli import Sequence, SequenceType

# SequenceType.DNA / SequenceType.RNA
protein_seq = Sequence(
    SequenceType.PROTEIN,
    "MVKLAGST...",
    #[...]
)

builder.add_sequence(protein_seq)
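A character check of this kind can be sketched as a set difference. The alphabets below are an assumption (20 standard amino acids, four bases each for DNA and RNA), and the set accepted by af3cli may differ. invalid_characters is a hypothetical helper.

```python
# Assumed alphabets; af3cli may accept additional characters.
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ACGT"),
    "rna": set("ACGU"),
}


def invalid_characters(seq_type, sequence):
    """Return the sorted characters of `sequence` not allowed for `seq_type`."""
    return sorted(set(sequence.upper()) - ALPHABETS[seq_type])


problems = invalid_characters("dna", "AATTUU")
```

An empty result means that every character belongs to the assumed alphabet.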

FASTA Files

As it is often not very practical to add many or particularly long sequences via the CLI, it is possible to read the respective sequence from a FASTA file. To use this feature, Biopython must be installed as an optional dependency.

af3cli protein add --sequence <filename> --fasta

Each sequence command expects exactly one sequence. Otherwise, it would not be possible to add additional fields, such as modifications or templates. However, you can still read several sequences from a FASTA file if these additional features are not required. For more advanced tasks, use the Python API.

af3cli [...] - fasta [--filename] <filename>

The sequence type is detected automatically, although in rare cases this is not possible. If detection fails, the sequence is ignored and a warning is issued. It is therefore advisable to add all sequences whose type cannot be clearly identified separately via the sequence commands.
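Why detection can fail is easy to see with a naive alphabet test: short sequences made up of A, C and G are valid under all three alphabets. The sketch below is not the detection logic used by af3cli; guess_types is a hypothetical helper and the alphabets are an assumption.

```python
def guess_types(sequence):
    """Return every sequence type whose assumed alphabet covers `sequence`."""
    alphabets = {
        "protein": set("ACDEFGHIKLMNPQRSTVWY"),
        "dna": set("ACGT"),
        "rna": set("ACGU"),
    }
    letters = set(sequence.upper())
    # keep every type whose alphabet contains all letters of the sequence
    return [name for name, allowed in alphabets.items() if letters <= allowed]


candidates = guess_types("ACGACG")
```

A sequence with more than one candidate type cannot be classified from its characters alone, which is when adding it explicitly with protein, dna or rna is the safer choice.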

There are also two ways of doing this in Python. The fasta2seq function returns a generator that automatically creates Sequence objects, while the read_fasta function returns a generator that yields the plain FASTA IDs and sequences from the file as strings.

from af3cli.sequence import fasta2seq, read_fasta

for seq in fasta2seq(filename):
    ...
    # do something with the Sequence object

for fasta_id, seq_str in read_fasta(filename):
    ...
    # create your own Sequence objects
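If Biopython is not installed, simple FASTA files can also be read with the standard library. The following is a minimal sketch and not part of af3cli: iter_fasta is a hypothetical helper that yields (ID, sequence) pairs, takes the first whitespace-separated token of the header as the ID, and ignores FASTA dialect details.

```python
def iter_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


example = [">seq1 example", "MVKL", "AGST", ">seq2", "AATTTTCC"]
records = list(iter_fasta(example))
```

An open file object can be passed in place of the list, since it is also an iterable of lines.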

Modifications

By applying the modification subcommand, any number of modifications can be added to the sequences with the respective CCD identifier and position. The different fields in the JSON file are automatically inserted correctly based on the sequence type.

# as positional arguments
af3cli [...] protein [...] - modification "SEP" 5
# or with explicit argument names
af3cli [...] dna [...] - modification --mod "6OG" --pos 1

When using the Python API, you have to explicitly define what kind of modifications you would like to add, since the resulting JSON fields are different for protein (ResidueModification) or nucleotide sequences (NucleotideModification). Please note that checks are performed to verify the modification types when the Sequence object is converted to a dictionary.

from af3cli import (Sequence, SequenceType,
                    ResidueModification,
                    NucleotideModification)

rmod = ResidueModification("SEP", 5)

protein_seq = Sequence(
    SequenceType.PROTEIN,
    "<SEQUENCE>",
    # [...]
    modifications=[rmod]
)

# it is possible to add more modifications later
protein_seq.modifications.append(rmod)
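The difference between the two modification classes mirrors the JSON fields in the AlphaFold3 input format. The sketch below builds the raw entries by hand to show that difference. modification_entry is a hypothetical helper and not part of af3cli, and the field names are taken from the AlphaFold3 input documentation.

```python
def modification_entry(seq_type, ccd, position):
    """Build the raw JSON entry for one modification (illustration only)."""
    if seq_type == "protein":
        return {"ptmType": ccd, "ptmPosition": position}
    if seq_type in ("dna", "rna"):
        return {"modificationType": ccd, "basePosition": position}
    raise ValueError(f"unknown sequence type: {seq_type!r}")


protein_mod = modification_entry("protein", "SEP", 5)
dna_mod = modification_entry("dna", "6OG", 1)
```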

Structural Templates

For protein sequences, you can specify multiple structural templates in mmCIF format, either as a string or as a path. Since passing the mmCIF content as a string on the command line is impractical, the CLI tool takes a plain-text file and, if requested, reads its entire content as a string.

# read the file as string with the '--read' flag
af3cli [...] protein [...] - template [--mmcif] <filename> --read
# keep relative/absolute path
af3cli [...] protein [...] - template [--mmcif] <filename>

# specify query and template indices as list of integers
# "1,2,3,..." | "(1,2,3,...)" | "[1,2,3,...]" are valid
af3cli [...] protein [...] \
    - template [--mmcif] <filename> -q "..." -t "..."

As it makes no difference to Python whether the string contains a path to a file or the file content, all you need to do is specify the template type. The file must then be read manually beforehand if a string is desired in the JSON file.

from af3cli import Template, TemplateType

# TemplateType.FILE for relative/absolute path
t = Template(
    TemplateType.STRING,
    "mmCIF content",
    qidx=[], tidx=[]
)

protein_seq = Sequence(
    SequenceType.PROTEIN,
    "<SEQUENCE>",
    # [...]
    templates=[t]
)

# it is possible to add more templates later
protein_seq.templates.append(t)
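In the JSON file, the two template types end up under different keys. The sketch below shows the raw entry for each case. template_entry is a hypothetical helper and not part of af3cli; the field names and the requirement that both index lists have the same length are taken from the AlphaFold3 input documentation.

```python
def template_entry(value, is_path, query_indices, template_indices):
    """Build the raw JSON entry for one structural template (illustration only)."""
    if len(query_indices) != len(template_indices):
        raise ValueError("query and template indices must have the same length")
    # a path and inline mmCIF content use different keys
    key = "mmcifPath" if is_path else "mmcif"
    return {
        key: value,
        "queryIndices": list(query_indices),
        "templateIndices": list(template_indices),
    }


entry = template_entry("template.cif", True, [0, 1, 2], [0, 1, 2])
```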

Multiple Sequence Alignment

Please refer to the AlphaFold3 input documentation on how to specify the MSA section for protein and RNA sequences.

The A3M-formatted content can be specified either as a path or as a string (mutually exclusive).

af3cli [...] protein [...] msa --paired ... --unpaired ...
af3cli [...] protein [...] msa --pairedpath ... --unpairedpath ...

In the case of the Python API, you must specify whether the respective string is a path.

from af3cli import MSA

msa = MSA(
    paired="...", unpaired="...",
    paired_is_path=True, unpaired_is_path=True,
)

protein_seq = Sequence(
    SequenceType.PROTEIN,
    "<SEQUENCE>",
    #[...]
    msa=msa
)

# alternative
protein_seq.msa = msa
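The path flags decide which JSON key each value ends up under. The following sketch illustrates that mapping for protein sequences. msa_fields is a hypothetical helper and not part of af3cli, and the key names are taken from the AlphaFold3 input documentation.

```python
def msa_fields(paired=None, unpaired=None,
               paired_is_path=False, unpaired_is_path=False):
    """Map MSA values to their JSON keys (illustration only)."""
    fields = {}
    if paired is not None:
        fields["pairedMsaPath" if paired_is_path else "pairedMsa"] = paired
    if unpaired is not None:
        fields["unpairedMsaPath" if unpaired_is_path else "unpairedMsa"] = unpaired
    return fields


fields = msa_fields(paired="paired.a3m", unpaired="unpaired.a3m",
                    paired_is_path=True, unpaired_is_path=True)
```

Because each value is written under exactly one key, the path and string variants cannot both be set for the same alignment.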

Ligands and Ions

Ligands are handled much like sequences and can be defined either as SMILES or with a corresponding CCD identifier. SDF files can also be read and converted to SMILES via an optional RDKit dependency. If an SDF file contains multiple entries, they are added as individual ligands. Ions are simply treated as ligands in AlphaFold3.

# --ccd also supports a list of CCD codes
af3cli [...] \
    - ligand add --smiles "CCC" \
    - ligand add --ccd "MG" \
    - ligand add --sdf ligands.sdf

Python:

from af3cli import Ligand, LigandType
from af3cli.ligand import sdf2smiles

ligand = Ligand(
    LigandType.SMILES,
    "CCC",
    #[...]
)

builder.add_ligand(ligand)

# ...
for smi in sdf2smiles("ligands.sdf"):
    builder.add_ligand(
        Ligand(LigandType.SMILES, smi)
    )
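For reference, SMILES and CCD ligands use different keys in the AlphaFold3 input JSON. The sketch below builds both kinds of entry by hand. ligand_entry is a hypothetical helper and not part of af3cli, and the field names are taken from the AlphaFold3 input documentation.

```python
def ligand_entry(ids, smiles=None, ccd_codes=None):
    """Build the raw JSON entry for one ligand (illustration only)."""
    if (smiles is None) == (ccd_codes is None):
        raise ValueError("specify exactly one of smiles or ccd_codes")
    body = {"id": ids}
    if smiles is not None:
        body["smiles"] = smiles
    else:
        body["ccdCodes"] = list(ccd_codes)
    return {"ligand": body}


propane = ligand_entry("L", smiles="CCC")
magnesium = ligand_entry(["M", "N"], ccd_codes=["MG"])
```

The second call uses a list of IDs, which requests one copy of the ion per ID.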

Custom CCD

Please refer to the AlphaFold3 input documentation on how to generate valid CCD mmCIF files.

The entire file content is stored as a string in the JSON file; af3cli simply keeps it in a variable until then. For the CLI, you therefore only need to specify a plain-text file.

af3cli [...] ccd [--filename] <filename>

In Python, you then have to read the file yourself.

builder.set_user_ccd(filecontent)

Bonds

The bonded atom pairs are defined in the JSON file as a list of lists, each of which contains the Entity ID, the Residue ID and the atom name. To make it as easy as possible to add new bonds, a string format is used, which is then translated into the correct format.

# E: Entity ID; R: Residue ID; N: atom name
af3cli [...] bond [--add] "E:R:N-E:R:N"

# example
af3cli [...] bond [--add] "A:1:C-B:1:O"

Although the sequences should be numbered in the order in which they were added, it is advisable to manually assign a sequence ID to the respective entities for the bonds (see below).

Python:

from af3cli import Bond

bond = Bond.from_string("A:1:C-B:1:O")

builder.add_bonded_atom_pair(bond)

You can also use the Atom class to initialize new atoms and create a Bond object from any two atoms.

from af3cli import Bond, Atom

atom_1 = Atom("A", 1, "C")
atom_2 = Atom("B", 1, "O")
bond = Bond(atom_1, atom_2)

builder.add_bonded_atom_pair(bond)
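The translation from the string format to the list-of-lists layout can be sketched in a few lines. This is not the parser used by af3cli: parse_atom and parse_bond are hypothetical helpers, and the sketch assumes that atom names contain neither ":" nor "-".

```python
def parse_atom(text):
    """Turn 'E:R:N' into [entity_id, residue_id, atom_name]."""
    entity, residue, name = text.split(":")
    return [entity, int(residue), name]


def parse_bond(text):
    """Turn 'E:R:N-E:R:N' into a pair of atoms (assumes no '-' in atom names)."""
    first, second = text.split("-")
    return [parse_atom(first), parse_atom(second)]


# one bond yields one entry in the list of bonded atom pairs
bonded_atom_pairs = [parse_bond("A:1:C-B:1:O")]
```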

Sequence ID Handling

The IDs for sequences, ligands, and ions are normally assigned automatically and should only be specified manually if really necessary, as ID clashes may occur. An IDRegister object keeps track of the IDs in use and, if necessary, skips IDs that have already been registered.

One case where it is necessary to specify the IDs manually is for bonds between different entries, as the chain ID must be specified for the bonded atom pairs (see above).

# "A,B,..." | "(A,B,...)" | "[A,B,...]" are valid
af3cli [...] protein add [...] -i "A,B"

If you only want to calculate homomultimers without specifying an explicit ID, you can also specify a number.

af3cli [...] protein add [...] -n 2

# works for all sequence types and ligands/ions
af3cli [...] \
    - protein add [...] -n 5 \
    - ligand add [...] -n 5

You can also specify IDs or a number together with an SDF file. In that case, the number of manually specified IDs must match the number of ligands in the SDF file. If a number is specified, every entry in the SDF file is multiplied by this number.

In Python, the number or explicit IDs can be specified when initializing Ligand or Sequence objects. If both parameters are specified, their count must match. The registration or automatic assignment of IDs only takes place in connection with an InputFile object and is carried out when the file is converted into a dictionary (e.g. when the file is written).

ligand = Ligand(
    LigandType.SMILES,
    "CCC",
    seq_id=["A", "B"],
    num=2
)
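The idea of skipping IDs that are already registered can be illustrated with a small stand-alone class. This is a sketch and not the IDRegister class from af3cli: IdPool and id_candidates are hypothetical names, and the order of the generated IDs (A to Z, then AA, AB, ...) is an assumption.

```python
from itertools import count, product
from string import ascii_uppercase


def id_candidates():
    """Yield 'A'..'Z', then 'AA', 'AB', ... (ordering is an assumption)."""
    for length in count(1):
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(letters)


class IdPool:
    """Hand out unused IDs and remember the ones reserved manually."""

    def __init__(self):
        self._used = set()
        self._candidates = id_candidates()

    def reserve(self, seq_id):
        if seq_id in self._used:
            raise ValueError(f"ID {seq_id!r} is already in use")
        self._used.add(seq_id)

    def next_id(self):
        # skip every candidate that was reserved manually
        for candidate in self._candidates:
            if candidate not in self._used:
                self._used.add(candidate)
                return candidate


pool = IdPool()
pool.reserve("B")
ids = [pool.next_id() for _ in range(3)]
```

Reserving the manual IDs first and assigning the automatic ones afterwards avoids clashes.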

Merging Files

Occasionally, it can be helpful to create a base file of your system and prepare subsequent AlphaFold3 jobs by merging existing files with new entries. The merge command is chainable, so several files can be combined. However, this should be done with caution if certain IDs, bonds, or seeds are important.

af3cli [...] merge [--filename] <filename>

# add new sequences
af3cli [...] merge [--filename] <filename> \
    - protein add "MVKLAGST..." \
    - ligand add --ccd "MG"

# keep IDs
af3cli [...] merge [--filename] <filename> --noreset

# override/merge special entries:
#   --userccd  override user-specified CCD data
#   --bonds    merge bonded atoms data
#   --seeds    merge seeds (removes duplicates)
af3cli [...] merge [--filename] <filename> \
    --userccd \
    --bonds \
    --seeds

Python:

from af3cli import InputFile

input_file = InputFile()
other_input_file = InputFile.read("filename")

input_file.merge(other_input_file)

# with additional parameters
input_file.merge(
    other_input_file,
    reset=True,
    seeds=True,
    bonded_atoms=False,
    userccd=False
)
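Merging seeds without duplicates can be sketched with the standard library alone. merge_seeds is a hypothetical helper and not part of af3cli. Keeping the first-seen order is an assumption, as the ordering used by af3cli is not documented here.

```python
def merge_seeds(base, other):
    """Combine two seed lists, dropping duplicates but keeping first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys([*base, *other]))


merged = merge_seeds([1, 2, 3], [3, 4, 1])
```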
