FactorForge — open-source constraint-based CDS design engine by Eijex.
FactorForge
Open-source constraint-based CDS design engine for Nicotiana benthamiana expression workflows.
FactorForge reverse-translates protein sequences into N. benthamiana-compatible coding sequences (CDS): it maximizes the codon adaptation index (CAI), keeps GC content within a target range, eliminates PolyA signals, and produces MoClo/Golden Gate-ready constructs.
Quick Start
pip install factorforge-cds
factorforge optimize my_protein.fasta -o output.fasta
Or with Python:
from factorforge.engines.v2.pipeline import OptimizationPipeline
pipeline = OptimizationPipeline(profile="balanced")
result = pipeline.run("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEG...")
print(result.sequence) # optimized CDS
print(result.metadata) # CAI, GC%, scan results, domestication edits
Access Options
| Method | Description | Link |
|---|---|---|
| Web App | No installation, demo & light use | factorforge-cds.vercel.app |
| CLI / Python | Local use, batch processing, data privacy | pip install factorforge-cds |
| Notebooks | Training & experimentation on Colab / Kaggle | See notebooks/ |
How It Works
FactorForge runs a deterministic constraint-based pipeline in four stages:
Protein sequence (FASTA or plain text)
│
▼
1. Reverse Translation
Selects synonymous codons to maximize CAI
against the N. benthamiana codon usage table
│
▼
2. Rule Scan
Detects PolyA signals, homopolymers,
CpG/TpA dinucleotide hotspots,
repeat sequences, rare codon runs,
forbidden restriction sites
│
▼
3. Domestication
Removes Golden Gate / MoClo-incompatible
BsaI / BsmBI recognition sites via silent edits
Optional custom restriction sites can be removed
by synonymous substitution when feasible
CpG/TpA reduction uses a CAI-budgeted balanced mode
│
▼
4. Output
Optimized CDS — FASTA or GenBank
with full metrics and scan report
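Stages 2 and 3 both rely on locating Type IIS recognition sites in the candidate CDS. The following is a minimal sketch of that lookup, not FactorForge's own scanner: the function name `find_type_iis_sites` and its return shape are hypothetical. It uses the published recognition sequences for BsaI (GGTCTC) and BsmBI (CGTCTC) and checks both strands, since a site on the reverse strand is cut just the same.

```python
# Illustrative sketch only; names and return shape are assumptions,
# not part of the FactorForge API.
RECOGNITION_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(cds: str) -> list[tuple[str, int, str]]:
    """Return (enzyme, 0-based start, strand) for every site in the CDS."""
    cds = cds.upper()
    hits = []
    for enzyme, site in RECOGNITION_SITES.items():
        # A reverse-strand site appears as the reverse complement
        # when reading the forward strand.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = cds.find(motif)
            while start != -1:
                hits.append((enzyme, start, strand))
                start = cds.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[1])
```

For example, `find_type_iis_sites("ATGGGTCTCAAAGAGACGTAA")` reports a forward-strand BsaI site at position 3 and a reverse-strand BsmBI site at position 12. A domestication step would then try synonymous codon swaps at the codons overlapping those positions.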
Optimization Profiles
| Profile | Description |
|---|---|
| balanced | CAI + GC balance (default) |
| high_cai | Maximum codon adaptation |
| gc_target | Target GC 42.5% for N. benthamiana |
| viral_delivery | Adjusted for viral vector delivery |
Performance
Benchmarked against the N. benthamiana codon usage table (v2 engine, 3,876 sequences):
| Metric | Value | Target |
|---|---|---|
| CAI (mean) | 0.80 | ≥ 0.75 |
| GC% (mean) | 42.54% | 40–55% |
| GC% (range) | 40.36–53.81% | 40–55% |
| AA identity | 100% | 100% |
| Validator pass rate | 100% | 100% |
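To make the two headline metrics concrete, here is a small sketch of how CAI and GC% are conventionally computed. CAI is the geometric mean of per-codon relative adaptiveness weights, where each weight is a codon's frequency divided by that of the most frequent synonymous codon. The weights below are invented for illustration and are not N. benthamiana values; the function names are likewise hypothetical and may differ from how FactorForge computes its reported figures.

```python
import math

# Hypothetical relative-adaptiveness weights for a handful of codons.
# These numbers are made up for illustration; they are NOT taken from
# the N. benthamiana codon usage table.
TOY_WEIGHTS = {
    "GCT": 1.0, "GCA": 0.8,  # Ala
    "AAG": 1.0, "AAA": 0.5,  # Lys
}


def gc_percent(cds: str) -> float:
    """Percentage of G and C bases in the sequence."""
    cds = cds.upper()
    if not cds:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in cds) / len(cds)


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of the weights of all scored codons.

    Codons missing from `weights` are skipped; all weights must be > 0.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[codon] for codon in codons if codon in weights]
    if not scored:
        raise ValueError("no codons with a known weight")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))
```

With these toy weights, a sequence built only from top-ranked codons (`"GCTAAG"`) scores a CAI of 1.0, and its GC content is 50%.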
Supported Hosts
| Host | Status |
|---|---|
| Nicotiana benthamiana | ✅ Supported |
| Wolffia globosa | 🔶 Codon table available, coming soon |
| Other plant hosts | 📋 Planned |
Installation
Requirements: Python 3.10+
pip install factorforge-cds
Experimental ML research modules are available separately:
pip install "factorforge-cds[ml]"
These modules (ESM2 + BART decoder) are not part of the stable v3.1.0 default optimizer. The default v3.1.0 engine is the constraint-based DP feasibility engine.
For development:
git clone https://github.com/eijex/factorforge-cds.git
cd factorforge-cds
pip install -e ".[dev]"
Docker (local web app)
Run the full web interface locally — no data leaves your machine:
docker pull ghcr.io/eijex/factorforge-cds:latest
docker run -p 8080:8080 ghcr.io/eijex/factorforge-cds:latest
Then open http://localhost:8080.
Or build from source:
git clone https://github.com/eijex/factorforge-cds.git
cd factorforge-cds
docker build -t factorforge-cds .
docker run -p 8080:8080 factorforge-cds
Updating
PyPI (pip install):
pip install --upgrade factorforge-cds
Docker:
docker pull ghcr.io/eijex/factorforge-cds:latest
Git clone / local development:
git pull origin main
pip install -e ".[dev]"
To check your installed version:
pip show factorforge-cds
# or
factorforge --version
Release notes for each version are in CHANGELOG.md.
CLI Reference
# Basic optimization (DP feasibility engine, default)
factorforge optimize input.fasta -o output.fasta
# Rule-based engine with profile
factorforge optimize input.fasta -e v2 -p balanced -o output.fasta
# With MoClo construct template, GenBank output
factorforge optimize input.fasta -e v2 -p balanced \
--template standard_expression -o output.gb --format genbank
# Custom GC target range
factorforge optimize input.fasta --gc-min 40 --gc-max 50 -o output.fasta
# List available engines
factorforge list-engines
Key options:
| Option | Default | Description |
|---|---|---|
| --engine, -e | dp | Engine: dp (feasibility) or v2 (rule-based) |
| --profile, -p | balanced | Optimization profile |
| --objective | feasibility_best | DP objective |
| --gc-min / --gc-max | 40 / 55 | GC% target range |
| --format | fasta or genbank | Output format |
| --scan-mode | full | Rule scan: full or fast |
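For batch processing, the CLI can be driven from a short Python script. The sketch below only assembles the argument list for one `factorforge optimize` call, using flags documented above (`-e`, `--gc-min`, `--gc-max`, `-o`). The helper name `build_optimize_command` and the `.optimized.fasta` output naming are assumptions made for this example, not FactorForge conventions.

```python
from pathlib import Path


def build_optimize_command(fasta_path, out_dir, gc_min=40, gc_max=55, engine=None):
    """Build the argv list for one `factorforge optimize` invocation.

    Hypothetical helper: only flags shown in the CLI reference are used.
    """
    fasta_path = Path(fasta_path)
    # Output naming scheme is an assumption made for this sketch.
    out_path = Path(out_dir) / f"{fasta_path.stem}.optimized.fasta"
    cmd = ["factorforge", "optimize", str(fasta_path)]
    if engine is not None:
        cmd += ["-e", engine]  # omit to use the CLI's default engine
    cmd += ["--gc-min", str(gc_min), "--gc-max", str(gc_max), "-o", str(out_path)]
    return cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` inside a loop over `Path("inputs").glob("*.fasta")`. Passing a list rather than a shell string avoids quoting problems with unusual file names.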
Output
Each optimized sequence includes:
- Optimized CDS — synonymous codon replacements only, AA identity 100%
- CAI score — codon adaptation index for N. benthamiana
- GC content — global and first-region
- Scan report — PolyA signals detected/fixed, CpG/TpA hotspots, homopolymers, rare codon runs, restriction sites
- Domestication report — BsaI/BsmBI and optional custom restriction sites removed, edit count
- Construct ID — reproducible hash for tracking
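One common way to get a reproducible construct ID is to hash the optimized sequence together with the settings that produced it. The sketch below shows that idea with SHA-256 over a canonical JSON payload. It is an assumption about how such an ID could be derived, not a description of FactorForge's actual scheme, and `construct_id` is a hypothetical name.

```python
import hashlib
import json


def construct_id(cds: str, settings: dict, length: int = 12) -> str:
    """Derive a short, deterministic ID from a CDS and its settings.

    Hypothetical scheme for illustration only.
    """
    # sort_keys and fixed separators make the JSON byte-for-byte stable,
    # so the same inputs always hash to the same ID.
    payload = json.dumps(
        {"cds": cds.upper(), "settings": settings},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]
```

Because the payload is canonicalized, the key order of `settings` does not affect the ID, while any change to the sequence or to a setting produces a different one.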
⚠️ Validation Status
FactorForge predictions are in-silico only and have not been experimentally validated in wet-lab conditions.
We are actively seeking researchers to test these predictions. If you use FactorForge in your experiments, we'd love to hear from you:
- Did the optimized sequence express well?
- How did CAI / GC% correlate with actual expression levels?
- Any unexpected results?
Share your results → GitHub Issues or email: eijex.lab@gmail.com
Validated results will be credited in VALIDATION.md and future releases.
🛠️ Developed With
This project was built using the following tools and platforms:
| Tool | Role |
|---|---|
| Claude / Claude Code (Anthropic) | Architecture design, domain analysis, code review |
| Codex (OpenAI) | Code generation and implementation |
| Google Colab | ML training experiments |
| Kaggle | ML training experiments |
| ESM2 (Meta) | Protein language model (encoder) |
| PyTorch | ML framework |
| Conda / Miniconda | Environment management |
| Vercel | Web deployment |
| GitHub | Version control and open-source hosting |
| HuggingFace | Model ecosystem |
| BioPython | Biological sequence processing |
Citing
If you use FactorForge in your research, please cite:
FactorForge v3.1.0 (2026). Open-source constraint-based CDS design engine.
Eijex. https://github.com/eijex/factorforge-cds
A citable publication is in preparation. Until then, please cite the GitHub repository.
License & Disclaimer
FactorForge source code is licensed under the Apache License 2.0.
Disclaimer: FactorForge is provided for research purposes only. Predictions are computational and have not been experimentally validated. The authors make no warranties regarding expression outcomes in wet-lab settings. Use at your own discretion.
Get in Touch
- GitHub Issues — bug reports, feature requests, wet-lab results: github.com/eijex/factorforge-cds/issues
- Email — collaborations, feedback, questions: eijex.lab@gmail.com
- Web — factorforge-cds.vercel.app