FactorForge — open-source constraint-based CDS design engine by Eijex.
FactorForge
Open-source constraint-based CDS design engine for Nicotiana benthamiana expression workflows.
FactorForge optimizes protein sequences into N. benthamiana-compatible CDS by maximizing CAI, controlling GC content, eliminating PolyA signals, and producing MoClo/Golden Gate-ready constructs.
Quick Start
pip install factorforge-cds
factorforge optimize my_protein.fasta -o output.fasta
Or with Python:
from factorforge.engines.v2.pipeline import OptimizationPipeline
pipeline = OptimizationPipeline(profile="balanced")
result = pipeline.run("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEG...")
print(result.sequence) # optimized CDS
print(result.metadata) # CAI, GC%, scan results, domestication edits
Access Options
| Method | Description | Link |
|---|---|---|
| Web App | No installation, demo & light use | factorforge.vercel.app |
| CLI / Python | Local use, batch processing, data privacy | pip install factorforge-cds |
| Notebooks | Training & experimentation on Colab / Kaggle | See notebooks/ |
How It Works
FactorForge runs a deterministic constraint-based pipeline in four stages:
Protein sequence (FASTA or plain text)
│
▼
1. Reverse Translation
Selects synonymous codons to maximize CAI
against the N. benthamiana codon usage table
│
▼
2. Rule Scan
Detects PolyA signals, homopolymers,
repeat sequences, forbidden restriction sites
│
▼
3. Domestication
Removes Golden Gate / MoClo-incompatible
BsaI / BsmBI recognition sites via silent edits
│
▼
4. Output
Optimized CDS — FASTA or GenBank
with full metrics and scan report
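The first three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FactorForge's implementation: the codon table `TOY_CODONS` is a hypothetical toy (not real N. benthamiana data), the homopolymer threshold and the single `AATAAA` motif are placeholders, and every function name here is invented for the example. Only the BsaI/BsmBI recognition sequences are standard facts.

```python
import re

# Hypothetical toy table (NOT real N. benthamiana data): synonymous codons
# per residue, ordered from most to least preferred.
TOY_CODONS = {
    "M": ["ATG"],
    "G": ["GGT", "GGA"],
    "L": ["CTC", "CTT"],
    "K": ["AAG", "AAA"],
}
CODON_TO_AA = {c: aa for aa, codons in TOY_CODONS.items() for c in codons}

# BsaI / BsmBI recognition sequences, forward and reverse-complement.
TYPE_IIS_SITES = {
    "BsaI": ("GGTCTC", "GAGACC"),
    "BsmBI": ("CGTCTC", "GAGACG"),
}


def split_codons(cds):
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def reverse_translate(protein):
    """Stage 1 (simplified): take the top-ranked codon for each residue."""
    return "".join(TOY_CODONS[aa][0] for aa in protein)


def translate(cds):
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def scan(cds, max_run=5):
    """Stage 2 (simplified): return (label, position) hits."""
    hits = []
    for name, motifs in TYPE_IIS_SITES.items():
        for motif in motifs:
            hits += [(name, m.start()) for m in re.finditer(f"(?={motif})", cds)]
    # AATAAA is the canonical polyadenylation signal; a real scanner checks more motifs.
    hits += [("polyA_signal", m.start()) for m in re.finditer("(?=AATAAA)", cds)]
    # Homopolymer runs longer than max_run bases (the threshold is an assumption).
    hits += [("homopolymer", m.start())
             for m in re.finditer(r"([ACGT])\1{%d,}" % max_run, cds)]
    return hits


def site_count(codons):
    return sum(1 for label, _ in scan("".join(codons)) if label in TYPE_IIS_SITES)


def domesticate(cds):
    """Stage 3 (simplified): remove Type IIS sites with silent codon swaps."""
    codons = split_codons(cds)
    while True:
        sites = [pos for label, pos in scan("".join(codons)) if label in TYPE_IIS_SITES]
        if not sites:
            return "".join(codons)
        pos, before = sites[0], len(sites)
        # Try every synonymous alternative for each codon overlapping the 6-bp site.
        candidates = [
            codons[:i] + [alt] + codons[i + 1:]
            for i in range(pos // 3, (pos + 5) // 3 + 1)
            for alt in TOY_CODONS[CODON_TO_AA[codons[i]]]
            if alt != codons[i]
        ]
        better = [c for c in candidates if site_count(c) < before]
        if not better:
            raise ValueError(f"no silent edit removes the site at position {pos}")
        codons = better[0]  # each round strictly lowers the site count


protein = "MGLK"
draft = reverse_translate(protein)   # top codons happen to spell a BsaI site
clean = domesticate(draft)
print(scan(draft), scan(clean), translate(clean) == protein)
```

The toy table is chosen so that the highest-ranked codons for Gly-Leu create a BsaI site, which shows why domestication has to run after reverse translation rather than being folded into codon choice alone.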
Optimization Profiles
| Profile | Description |
|---|---|
| balanced | CAI + GC balance (default) |
| high_cai | Maximum codon adaptation |
| gc_target | Target GC 42.5% for N. benthamiana |
| viral_delivery | Adjusted for viral vector delivery |
Performance
Benchmarked on N. benthamiana codon usage table (v2 engine, 3,876 sequences):
| Metric | Value | Target |
|---|---|---|
| CAI (mean) | 0.80 | ≥ 0.75 |
| GC% (mean) | 42.54% | 40–55% |
| GC% (range) | 40.36–53.81% | 40–55% |
| AA identity | 100% | 100% |
| Validator pass rate | 100% | 100% |
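For readers who want to check the CAI and GC% figures on their own sequences, the two metrics can be computed as follows. This is a sketch using the standard CAI definition (the geometric mean of each codon's relative adaptiveness); the frequencies in `TOY_USAGE` are hypothetical placeholders, not the N. benthamiana table that FactorForge ships, and the function names are invented for the example.

```python
import math

# Hypothetical codon frequencies for illustration only
# (NOT real N. benthamiana usage data).
TOY_USAGE = {
    "K": {"AAG": 0.6, "AAA": 0.4},
    "E": {"GAG": 0.5, "GAA": 0.5},
    "G": {"GGT": 0.4, "GGA": 0.4, "GGC": 0.2},
}


def relative_adaptiveness(usage):
    """w(codon) = frequency / frequency of the best synonymous codon."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over all codons that appear in the weight table."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


def gc_percent(seq):
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


weights = relative_adaptiveness(TOY_USAGE)
cds = "AAAGAAGGC"
print(round(cai(cds, weights), 3), round(gc_percent(cds), 2))
# The 40-55% window is the GC% target from the table above.
print(40 <= gc_percent(cds) <= 55)
```

Codons missing from the weight table are skipped, which mirrors the common practice of excluding single-codon residues (Met, Trp) and stop codons from CAI.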
Supported Hosts
| Host | Status |
|---|---|
| Nicotiana benthamiana | ✅ Supported |
| Wolffia globosa | 🔶 Codon table available, coming soon |
| Other plant hosts | 📋 Planned |
Installation
Requirements: Python 3.10+
pip install factorforge-cds
For ML features (ESM2 + BART decoder, experimental):
pip install "factorforge-cds[ml]"
For development:
git clone https://github.com/eijex/factorforge.git
cd factorforge
pip install -e ".[dev]"
CLI Reference
# Basic optimization (DP feasibility engine, default)
factorforge optimize input.fasta -o output.fasta
# Rule-based engine with profile
factorforge optimize input.fasta -e v2 -p balanced -o output.fasta
# With MoClo construct template, GenBank output
factorforge optimize input.fasta -e v2 -p balanced \
--template standard_expression -o output.gb --format genbank
# Custom GC target range
factorforge optimize input.fasta --gc-min 40 --gc-max 50 -o output.fasta
# List available engines
factorforge list-engines
Key options:
| Option | Default | Description |
|---|---|---|
| --engine, -e | dp | Engine: dp (feasibility) or v2 (rule-based) |
| --profile, -p | balanced | Optimization profile |
| --objective | feasibility_best | DP objective |
| --gc-min / --gc-max | 40 / 55 | GC% target range |
| --format | fasta or genbank | Output format |
| --scan-mode | full | Rule scan: full or fast |
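For batch processing, the CLI can be driven from a short Python script. The sketch below only uses flags documented above (`-e`, `-p`, `-o`); the helper names, the `.optimized.fasta` output naming, and the assumption that the CLI exits with a non-zero status on failure are ours, not part of FactorForge.

```python
import subprocess
from pathlib import Path


def build_command(fasta_path, output_dir, engine="v2", profile="balanced"):
    """Assemble one `factorforge optimize` call from the documented flags."""
    fasta_path = Path(fasta_path)
    # Output naming is an assumption of this sketch, not a FactorForge convention.
    out_path = Path(output_dir) / f"{fasta_path.stem}.optimized.fasta"
    return ["factorforge", "optimize", str(fasta_path),
            "-e", engine, "-p", profile, "-o", str(out_path)]


def run_batch(input_dir, output_dir):
    """Optimize every .fasta file in input_dir; return the names that failed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for fasta_path in sorted(Path(input_dir).glob("*.fasta")):
        result = subprocess.run(build_command(fasta_path, output_dir),
                                capture_output=True, text=True)
        # Assumes a non-zero exit status signals failure.
        if result.returncode != 0:
            failed.append(fasta_path.name)
    return failed
```

Calling `run_batch("proteins", "optimized")` would process each input file in turn; keeping command construction in its own function makes it easy to inspect the exact invocation before running anything.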
Output
Each optimized sequence includes:
- Optimized CDS — synonymous codon replacements only, AA identity 100%
- CAI score — codon adaptation index for N. benthamiana
- GC content — global and first-region
- Scan report — PolyA signals detected/fixed, homopolymers, restriction sites
- Domestication report — BsaI/BsmBI sites removed, edit count
- Construct ID — reproducible hash for tracking
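Two of these outputs are easy to verify independently. The sketch below checks amino-acid identity by translating the CDS with the standard genetic code, and shows one way a reproducible construct ID could be derived. The hashing scheme (SHA-256 over profile plus sequence, truncated to 12 hex characters) is purely an assumption for illustration; FactorForge's actual ID format is not documented here.

```python
import hashlib
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def aa_identity(protein, cds):
    """Percent of positions where the translated CDS matches the protein."""
    translated = translate(cds).rstrip("*")  # ignore a trailing stop codon
    if not protein or len(translated) != len(protein):
        return 0.0
    same = sum(a == b for a, b in zip(protein, translated))
    return 100.0 * same / len(protein)


def construct_id(cds, profile="balanced", length=12):
    """Hypothetical ID scheme: the same sequence and profile give the same ID."""
    payload = f"{profile}:{cds.upper()}".encode("ascii")
    return hashlib.sha256(payload).hexdigest()[:length]


cds = "ATGGGACTCAAGTGA"
print(aa_identity("MGLK", cds))
print(construct_id(cds) == construct_id(cds.lower()))
```

Hashing the normalized sequence rather than a timestamp or counter is what makes an ID of this kind reproducible across machines and runs.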
⚠️ Validation Status
FactorForge predictions are in-silico only and have not been experimentally validated in wet-lab conditions.
We are actively seeking researchers to test these predictions. If you use FactorForge in your experiments, we'd love to hear from you:
- Did the optimized sequence express well?
- How did CAI / GC% correlate with actual expression levels?
- Any unexpected results?
Share your results → GitHub Issues or email: eijex.lab@gmail.com
Validated results will be credited in VALIDATION.md and future releases.
🛠️ Developed With
This project was built using the following tools and platforms:
| Tool | Role |
|---|---|
| Claude / Claude Code (Anthropic) | Architecture design, domain analysis, code review |
| Codex (OpenAI) | Code generation and implementation |
| Google Colab | ML training (Run 1, Run 2) |
| Kaggle | ML training (alpha_run1, alpha_run2) |
| ESM2 (Meta) | Protein language model (encoder) |
| PyTorch | ML framework |
| Conda / Miniconda | Environment management |
| Vercel | Web deployment |
| GitHub | Version control and open-source hosting |
| HuggingFace | Model ecosystem |
| BioPython | Biological sequence processing |
Citing
If you use FactorForge in your research, please cite:
FactorForge v3.0.0 (2026). Open-source constraint-based CDS design engine.
Eijex. https://github.com/eijex/factorforge
A citable publication is in preparation. Until then, please cite the GitHub repository.
License & Disclaimer
FactorForge source code is licensed under the Apache License 2.0.
Disclaimer: FactorForge is provided for research purposes only. Predictions are computational and have not been experimentally validated. The authors make no warranties regarding expression outcomes in wet-lab settings. Use at your own discretion.
Get in Touch
- GitHub Issues — bug reports, feature requests, wet-lab results: github.com/eijex/factorforge/issues
- Email — collaborations, feedback, questions: eijex.lab@gmail.com
- Web — factorforge.vercel.app