A high-level programming language for generative biology
Proto Language
Welcome! This repository contains the open-source implementation of proto-language, a Python package for designing biological sequences (DNA, RNA, and proteins) through constraint-based optimization. A design is specified as a set of constraints, and the framework runs a propose–score–refine loop to search for sequences that satisfy them, drawing on a large suite of computational biology and biological AI tools to score candidates.
proto-language is built on top of the proto-tools execution layer, so each computationally intensive tool (structure predictors, protein language models, inverse folding, sequence and structure aligners, gene annotation, and more) runs in its own automatically managed, isolated environment. Programs can run locally or as hosted optimization runs through the proto-client Python SDK.
Proto-language is open source under an MIT license. Contributions are welcome!
Setup
Step 1: Install the package
The package requires Python 3.10 or later and pip:
```shell
pip install git+https://github.com/evo-design/proto-language.git
```
System tools that standalone tool environments require in order to build (git, curl, gcc, make, cmake) are automatically provisioned on first use through proto-tools' shared foundation environment, so no manual setup is necessary.
[!NOTE] A direct PyPI install (pip install proto-language) is planned.
[!NOTE] Contributors should instead use the editable installation described in CONTRIBUTING.md.
Step 2: Configure storage (optional)
All persistent data (model weights, tool environments, micromamba) is stored under PROTO_HOME, which defaults to ~/.proto/ and is inherited from proto-tools.
To customize the storage location (recommended for laboratory and HPC environments):
```shell
# Add to your shell profile:
export PROTO_HOME=/path/to/your/proto_home
```
To override only the model-weights location, set PROTO_MODEL_CACHE (for example, export PROTO_MODEL_CACHE=/path/to/shared/weights). See notes/filesystem.md for all options.
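The lookup order for these two variables can be sketched in Python. This is an illustration only, not the resolution code used by proto-tools: the helper names resolve_proto_home and resolve_model_cache and the "models" subdirectory are assumptions made for the example. Only the PROTO_HOME and PROTO_MODEL_CACHE variable names and the ~/.proto/ default come from this README.

```python
import os
from pathlib import Path


def resolve_proto_home(env=None):
    """Return the storage root: PROTO_HOME if set, otherwise ~/.proto."""
    env = os.environ if env is None else env
    return Path(env.get("PROTO_HOME") or "~/.proto").expanduser()


def resolve_model_cache(env=None):
    """Return the model-weights directory.

    PROTO_MODEL_CACHE wins when it is set; otherwise fall back to a
    subdirectory of the storage root ("models" is a hypothetical name).
    """
    env = os.environ if env is None else env
    override = env.get("PROTO_MODEL_CACHE")
    if override:
        return Path(override).expanduser()
    return resolve_proto_home(env) / "models"


if __name__ == "__main__":
    print("storage root:", resolve_proto_home())
    print("model cache: ", resolve_model_cache())
```

Running the script before and after exporting PROTO_HOME is a quick way to confirm that a shell profile change has taken effect in the current session.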
Step 3: Gated model access (optional)
Some generators and constraints load gated models (for example ESM3, AlphaGenome, and AlphaFold3) that require accepting a license and authenticating with HuggingFace. Set HF_TOKEN in the environment after accepting each model's terms. See proto-tools/README.md for the full procedure and the list of gated models.
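A small pre-flight check can catch a missing token before a long run starts. The sketch below is an assumption for illustration: gated_models_ready is a hypothetical helper, not part of proto-language or proto-tools, and it only tests whether HF_TOKEN is present. It does not verify that the token is valid or that each model's license has been accepted.

```python
import os


def gated_models_ready(env=None):
    """Return True when a non-empty HF_TOKEN is present in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if not gated_models_ready():
        # Hypothetical message; gated models such as ESM3 may fail to load.
        print("HF_TOKEN is not set; gated models may fail to load.")
```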
[!TIP] Setup is complete. See the Quickstart to run a program from end to end.
Quickstart
Working programs are provided under examples/:
- examples/scripts/: runnable Python programs, ranging from a minimal end-to-end example (toy.py) to broader workloads.
- examples/jsons/: declarative JSON program definitions (the optimization_stages schema). These illustrate program structure and are not loaded by a Python consumer.
Architecture
The framework is built around seven primitives in proto_language/core/ — three data containers, three pluggable interfaces, and one orchestrator:
- Sequence: a typed string (DNA, RNA, or protein) together with optional logits, a folded structure, and namespaced metadata. The atomic unit of design.
- Segment: a single design region. It holds the proposal Sequences for that region and the surviving result Sequences after scoring.
- Construct: an ordered list of Segments that concatenate into a full biological construct (for example, a promoter plus a coding region; a multi-chain protein; or a designed gene).
- Constraint (registered via @constraint): scores a Sequence against a target property, returning a score and namespaced metadata, and may optionally provide gradients.
- Generator (registered via @generator): proposes new Sequences for a Segment.
- Optimizer (registered via @optimizer): a search strategy that drives the propose–score–refine loop.
- Program: the top-level orchestrator. It owns the Construct and composes one or more Optimizer stages.
All three pluggable interfaces share a BaseConfig Pydantic configuration pattern and declare parameters with ConfigField.
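To make the relationship between the data containers and a registered interface concrete, here is a minimal sketch in plain Python. None of it is the proto_language API: the dataclass fields, the assemble method, the CONSTRAINTS registry, the name-taking form of the constraint decorator, and the gc_content scorer are all hypothetical stand-ins. Logits, folded structures, gradients, and the BaseConfig/ConfigField configuration layer are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins; the real classes live in proto_language/core/.
ALPHABETS = {
    "dna": set("ACGT"),
    "rna": set("ACGU"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass
class Sequence:
    """A typed string plus namespaced metadata."""
    residues: str
    kind: str
    metadata: Dict[str, dict] = field(default_factory=dict)

    def __post_init__(self):
        # Reject characters outside the alphabet for this sequence type.
        unknown = set(self.residues) - ALPHABETS[self.kind]
        if unknown:
            raise ValueError(f"invalid {self.kind} characters: {sorted(unknown)}")


@dataclass
class Segment:
    """One design region: proposals awaiting scoring and surviving results."""
    proposals: List[Sequence] = field(default_factory=list)
    results: List[Sequence] = field(default_factory=list)


@dataclass
class Construct:
    """An ordered list of segments that concatenate into one construct."""
    segments: List[Segment]

    def assemble(self) -> str:
        # Join the first surviving result of each segment, in order.
        return "".join(seg.results[0].residues for seg in self.segments)


# Registry filled by the decorator below (hypothetical design).
CONSTRAINTS: Dict[str, Callable[[Sequence], float]] = {}


def constraint(name: str):
    """Decorator that records a scoring function under `name`."""
    def register(fn):
        CONSTRAINTS[name] = fn
        return fn
    return register


@constraint("gc_content")
def gc_content(seq: Sequence) -> float:
    """Score a sequence by GC fraction and record it under its own namespace."""
    gc = sum(base in "GC" for base in seq.residues) / max(len(seq.residues), 1)
    seq.metadata["gc_content"] = {"fraction": gc}
    return gc


promoter = Segment(results=[Sequence("GGCC", "dna")])
coding = Segment(results=[Sequence("ATGAAA", "dna")])
construct = Construct([promoter, coding])
print(construct.assemble())
print(CONSTRAINTS["gc_content"](promoter.results[0]))
```

Keeping each constraint's output under its own metadata key is one simple way to read "namespaced metadata": several constraints can annotate the same Sequence without overwriting one another.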
The optimization loop
Program.run() iterates through its optimizer stages. Each stage performs the following steps:
- The Optimizer requests proposal Sequences from its Generator for one or more Segments.
- Each Constraint evaluates the proposals and records its score and metadata on the proposal Sequences.
- The Optimizer aggregates the constraint scores and selects survivors. These become the Segment's result Sequences and feed into the next iteration, or the next stage.
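The three steps above can be sketched as a toy loop over plain strings. This is a sketch under stated assumptions, not the framework's optimizer: mutate, gc_closeness, and run_stage are hypothetical stand-ins for a Generator, a Constraint, and an Optimizer stage, and the choices of point mutation, mean aggregation, and top-k selection with parents retained are picked for brevity rather than taken from proto-language.

```python
import random


def mutate(parent, rng, alphabet="ACGT"):
    """Generator stand-in: copy `parent` with one randomly chosen position resampled."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(alphabet) + parent[pos + 1:]


def gc_closeness(seq, target=0.5):
    """Constraint stand-in: 1.0 when the GC fraction equals `target`."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    return 1.0 - abs(gc - target)


def run_stage(seeds, constraints, n_iterations=20, n_proposals=8,
              n_survivors=2, seed=0):
    """Optimizer stand-in: a simple propose-score-select loop."""
    rng = random.Random(seed)
    survivors = list(seeds)
    for _ in range(n_iterations):
        # 1. Propose: mutate randomly chosen survivors; parents stay as candidates.
        proposals = survivors + [
            mutate(rng.choice(survivors), rng) for _ in range(n_proposals)
        ]
        # 2. Score: every constraint evaluates every proposal (mean aggregation).
        scored = [
            (sum(c(p) for c in constraints) / len(constraints), p)
            for p in proposals
        ]
        # 3. Refine: keep the best-scoring proposals for the next iteration.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [p for _, p in scored[:n_survivors]]
    return survivors


results = run_stage(["AAAAAAAAAA"], [gc_closeness])
print(results[0], gc_closeness(results[0]))
```

Because the parents remain in the candidate pool, the best score never decreases from one iteration to the next. A second stage could be chained by passing the returned survivors as the seeds of another run_stage call with different constraints or settings.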
When the program finishes, Program.export(path=...) writes a directory containing tables for sequences, constraints, constructs, and optimization steps, a FASTA file, and an assets/ sidecar directory.
Development & Contributing
See CONTRIBUTING.md for developer setup, code style, testing, and agent conventions.