
FactorForge — open-source constraint-based CDS design engine by Eijex.


FactorForge

Open-source constraint-based CDS design engine for Nicotiana benthamiana expression workflows.


FactorForge optimizes protein sequences into N. benthamiana-compatible coding sequences (CDS) by maximizing the codon adaptation index (CAI), controlling GC content, eliminating polyadenylation (polyA) signals, and producing MoClo/Golden Gate-ready constructs.


Quick Start

pip install factorforge-cds
factorforge optimize my_protein.fasta -o output.fasta

Or with Python:

from factorforge.engines.v2.pipeline import OptimizationPipeline

# Rule-based (v2) pipeline with the default "balanced" profile
pipeline = OptimizationPipeline(profile="balanced")
# Input: protein sequence as a plain string (truncated here for brevity)
result = pipeline.run("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEG...")
print(result.sequence)   # optimized CDS
print(result.metadata)   # CAI, GC%, scan results, domestication edits

Access Options

| Method | Description | Link |
|---|---|---|
| Web App | No installation; demo and light use | factorforge-cds.vercel.app |
| CLI / Python | Local use, batch processing, data privacy | pip install factorforge-cds |
| Notebooks | Training and experimentation on Colab / Kaggle | See notebooks/ |

How It Works

FactorForge runs a deterministic constraint-based pipeline in four stages:

Protein sequence (FASTA or plain text)
        │
        ▼
1. Reverse Translation
   Selects synonymous codons to maximize CAI
   against the N. benthamiana codon usage table
        │
        ▼
2. Rule Scan
   Detects PolyA signals, homopolymers,
   CpG/TpA dinucleotide hotspots,
   repeat sequences, rare codon runs,
   forbidden restriction sites
        │
        ▼
3. Domestication
   Removes Golden Gate / MoClo-incompatible
   BsaI / BsmBI recognition sites via silent edits
   Optional custom restriction sites can be removed
   by synonymous substitution when feasible
   CpG/TpA reduction uses a CAI-budgeted balanced mode
        │
        ▼
4. Output
   Optimized CDS — FASTA or GenBank
   with full metrics and scan report
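To make stage 1 concrete, here is a minimal sketch of CAI-maximizing reverse translation. It is not FactorForge's implementation: it uses the standard genetic code and a small, invented codon-count table (`TOY_COUNTS`, not real N. benthamiana data), computes each codon's relative adaptiveness w as its count divided by the highest count among its synonyms, picks the highest-w codon for each residue, and scores the result with the usual CAI formula (the geometric mean of w over the codons).

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical codon counts for illustration only (NOT the N. benthamiana table).
TOY_COUNTS = {
    "ATG": 100,
    "AAA": 40, "AAG": 60,
    "CTT": 30, "CTC": 10, "CTA": 8, "CTG": 12, "TTA": 15, "TTG": 25,
    "GGT": 35, "GGC": 15, "GGA": 40, "GGG": 10,
}

def relative_adaptiveness(counts):
    """w(codon) = count / highest count among codons for the same amino acid."""
    best = {}
    for codon, n in counts.items():
        aa = CODE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {codon: n / best[CODE[codon]] for codon, n in counts.items()}

def reverse_translate(protein, w):
    """Greedy max-CAI choice: the highest-w synonymous codon for each residue."""
    out = []
    for aa in protein:
        synonyms = [c for c in w if CODE[c] == aa]
        if not synonyms:
            raise ValueError(f"no codon counts for residue {aa!r}")
        out.append(max(synonyms, key=w.get))
    return "".join(out)

def cai(cds, w):
    """Geometric mean of w over codons.

    Met/Trp (w = 1) are kept here for simplicity; the classic definition excludes them.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

w = relative_adaptiveness(TOY_COUNTS)
cds = reverse_translate("MKLG", w)
print(cds, round(cai(cds, w), 3))
```

Picking the top codon per residue maximizes CAI but ignores GC content and sequence motifs, which the later stages in the diagram are there to address.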

Optimization Profiles

| Profile | Description |
|---|---|
| balanced | CAI + GC balance (default) |
| high_cai | Maximum codon adaptation |
| gc_target | Targets 42.5% GC for N. benthamiana |
| viral_delivery | Adjusted for viral vector delivery |
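The profiles weigh codon adaptation against GC content in different proportions. One hypothetical way to picture that trade-off is a single score that rewards log(CAI) and penalizes distance from a GC target. The `profile_score` function, the `gc_weight` parameter, and the CAI values below are illustrative assumptions rather than FactorForge's actual profile definitions; only the 42.5% target comes from the table.

```python
import math

def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def profile_score(cai, gc, gc_target=42.5, gc_weight=0.0):
    """Hypothetical objective: log(CAI) minus a penalty on distance from gc_target.

    gc_weight=0.0 ignores GC entirely (max-CAI behaviour); larger values
    pull the choice toward gc_target, given in percent.
    """
    return math.log(cai) - gc_weight * abs(gc - gc_target) / 100.0

def pick(candidates, gc_weight):
    """Return the candidate CDS with the best score under the given GC weight."""
    return max(
        candidates,
        key=lambda s: profile_score(candidates[s], gc_percent(s), gc_weight=gc_weight),
    )

# Two synonymous encodings of the peptide MKAL with made-up CAI values.
candidates = {
    "ATGAAAGCTTTA": 0.90,  # 25.0% GC
    "ATGAAGGCTTTG": 0.70,  # about 41.7% GC
}

for weight in (0.0, 2.0):
    best = pick(candidates, weight)
    print(f"gc_weight={weight}: {best} ({gc_percent(best):.1f}% GC)")
```

With `gc_weight=0.0` the higher-CAI, AT-rich encoding wins; with `gc_weight=2.0` the encoding closer to 42.5% GC wins despite its lower CAI.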

Performance

Benchmarked with the N. benthamiana codon usage table (v2 engine, 3,876 sequences):

| Metric | Value | Target |
|---|---|---|
| CAI (mean) | 0.80 | ≥ 0.75 |
| GC% (mean) | 42.54% | 40–55% |
| GC% (range) | 40.36–53.81% | 40–55% |
| AA identity | 100% | 100% |
| Validator pass rate | 100% | 100% |
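The AA identity and GC% rows can be checked independently of the optimizer. A minimal, hypothetical validator in that spirit translates the CDS back with the standard genetic code, compares it with the input protein, and tests global GC% against the 40–55% target. The function names and the pass criterion are assumptions, not FactorForge's validator.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate complete codons; a single trailing stop codon is dropped."""
    protein = "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))
    return protein[:-1] if protein.endswith("*") else protein

def validate(protein, cds, gc_min=40.0, gc_max=55.0):
    """Hypothetical check: 100% AA identity and global GC% within [gc_min, gc_max]."""
    seq = cds.upper()
    if not seq or not protein:
        raise ValueError("protein and CDS must be non-empty")
    translated = translate(seq)
    if len(translated) == len(protein):
        identity = 100.0 * sum(a == b for a, b in zip(protein, translated)) / len(protein)
    else:
        identity = 0.0  # length mismatch (e.g. missing or extra codons)
    gc = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return {
        "aa_identity": identity,
        "gc_percent": round(gc, 2),
        "passed": identity == 100.0 and gc_min <= gc <= gc_max,
    }

print(validate("MKAL", "ATGAAGGCCCTGTAA"))
```

A synonymous-only optimizer should always report 100% identity; a failure here points to a bug rather than a design trade-off.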

Supported Hosts

| Host | Status |
|---|---|
| Nicotiana benthamiana | ✅ Supported |
| Wolffia globosa | 🔶 Codon table available, coming soon |
| Other plant hosts | 📋 Planned |

Installation

Requirements: Python 3.10+

pip install factorforge-cds

Experimental ML research modules are available as an optional extra:

pip install "factorforge-cds[ml]"

These modules (an ESM2 encoder with a BART decoder) are not part of the stable v3.1.0 optimizer, whose default engine is the constraint-based DP feasibility engine.

For development:

git clone https://github.com/eijex/factorforge-cds.git
cd factorforge-cds
pip install -e ".[dev]"

Docker (local web app)

Run the full web interface locally — no data leaves your machine:

docker pull ghcr.io/eijex/factorforge-cds:latest
docker run -p 8080:8080 ghcr.io/eijex/factorforge-cds:latest

Then open http://localhost:8080.

Or build from source:

git clone https://github.com/eijex/factorforge-cds.git
cd factorforge-cds
docker build -t factorforge-cds .
docker run -p 8080:8080 factorforge-cds

Updating

PyPI (pip install):

pip install --upgrade factorforge-cds

Docker:

docker pull ghcr.io/eijex/factorforge-cds:latest

Git clone / local development:

git pull origin main
pip install -e ".[dev]"

To check your installed version:

pip show factorforge-cds
# or
factorforge --version
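From Python, the installed version can also be read with the standard library's `importlib.metadata`; the distribution name `factorforge-cds` is the one used with pip above. This is a generic sketch, not part of the FactorForge API.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="factorforge-cds"):
    """Return the installed distribution's version string, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version())
```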

Release notes for each version are in CHANGELOG.md.


CLI Reference

# Basic optimization (DP feasibility engine, default)
factorforge optimize input.fasta -o output.fasta

# Rule-based engine with profile
factorforge optimize input.fasta -e v2 -p balanced -o output.fasta

# With MoClo construct template, GenBank output
factorforge optimize input.fasta -e v2 -p balanced \
  --template standard_expression -o output.gb --format genbank

# Custom GC target range
factorforge optimize input.fasta --gc-min 40 --gc-max 50 -o output.fasta

# List available engines
factorforge list-engines
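For batch processing, the CLI can be driven from a short script. This sketch only uses flags shown in the examples above (`optimize`, `-e`, `-p`, `-o`, `--format genbank`); the `build_command` and `run_batch` helpers, the `*_optimized` output naming, and the one-protein-per-FASTA-file layout are hypothetical choices. It defaults to a dry run that only returns the commands.

```python
import subprocess
from pathlib import Path

def build_command(fasta, out_dir, engine="v2", profile="balanced", genbank=False):
    """Assemble one `factorforge optimize` call from the documented flags."""
    suffix = ".gb" if genbank else ".fasta"
    out = Path(out_dir) / f"{Path(fasta).stem}_optimized{suffix}"
    cmd = ["factorforge", "optimize", str(fasta), "-e", engine, "-p", profile, "-o", str(out)]
    if genbank:
        cmd += ["--format", "genbank"]
    return cmd

def run_batch(in_dir, out_dir, dry_run=True, **options):
    """Optimize every *.fasta file in in_dir; with dry_run=True only return the commands."""
    files = sorted(Path(in_dir).glob("*.fasta"))
    commands = [build_command(f, out_dir, **options) for f in files]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing file
    return commands

for cmd in run_batch("proteins", "optimized"):
    print(" ".join(cmd))
```

Running it with `dry_run=False` executes the commands locally, which keeps sequences on your machine in the same way as the CLI itself.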

Key options:

| Option | Default | Description |
|---|---|---|
| --engine, -e | dp | Engine: dp (feasibility) or v2 (rule-based) |
| --profile, -p | balanced | Optimization profile |
| --objective | feasibility_best | DP objective |
| --gc-min / --gc-max | 40 / 55 | GC% target range |
| --format | fasta or genbank | Output format |
| --scan-mode | full | Rule scan: full or fast |
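The rule-scan stage described under How It Works can be approximated with plain motif searches. This simplified sketch is not FactorForge's scanner: the motif set (AATAAA/ATTAAA as polyA signals, BsaI GGTCTC and BsmBI CGTCTC on both strands), the homopolymer threshold of 6, and the report layout are assumptions, and it does not model the full/fast distinction of `--scan-mode`.

```python
import re

# Assumed motif set for illustration: the canonical polyA signal AATAAA and
# its common variant ATTAAA, plus BsaI (GGTCTC) and BsmBI (CGTCTC) sites.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_all(seq, motif):
    """Start positions of every occurrence of motif, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def scan(cds, min_homopolymer=6):
    """Return polyA-signal, homopolymer and Type IIS site hits as (label, position)."""
    seq = cds.upper()
    report = {"polyA": [], "homopolymers": [], "sites": []}
    for motif in POLYA_SIGNALS:
        report["polyA"] += [(motif, i) for i in find_all(seq, motif)]
    # A base followed by at least (min_homopolymer - 1) copies of itself.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_homopolymer - 1), seq):
        report["homopolymers"].append((m.group(), m.start()))
    for enzyme, site in TYPE_IIS_SITES.items():
        # Type IIS sites are not palindromic, so search both strands.
        for motif in (site, revcomp(site)):
            report["sites"] += [(enzyme, i) for i in find_all(seq, motif)]
    return report

print(scan("ATGGGTCTCAATAAAGCC"))
```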

Output

Each optimized sequence includes:

  • Optimized CDS — synonymous codon replacements only, AA identity 100%
  • CAI score — codon adaptation index for N. benthamiana
  • GC content — global and first-region
  • Scan report — PolyA signals detected/fixed, CpG/TpA hotspots, homopolymers, rare codon runs, restriction sites
  • Domestication report — BsaI/BsmBI and optional custom restriction sites removed, edit count
  • Construct ID — reproducible hash for tracking
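The domestication report and construct ID can be illustrated with a small greedy sketch. For the first BsaI/BsmBI site found on either strand, it tries synonymous swaps in the codons overlapping the site and keeps the first swap that lowers the total site count, so every edit is silent. This is a simplification built on assumptions: it ignores CAI when choosing the replacement codon, expects an uppercase in-frame CDS, and uses a hypothetical SHA-256 prefix as the construct ID. FactorForge's actual edit strategy and hashing scheme are not documented here.

```python
import hashlib
import re
from itertools import product

# Standard genetic code (TCAG order) and synonymous-codon groups.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# BsaI and BsmBI recognition sites plus their reverse complements.
FORBIDDEN = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")

def count_sites(seq):
    """Forbidden-site occurrences on either strand, overlaps included."""
    return sum(len(re.findall(f"(?={site})", seq)) for site in FORBIDDEN)

def silent_fix(codons, start, before):
    """Try synonymous swaps in the codons overlapping the 6-nt site at `start`."""
    for idx in range(start // 3, (start + 5) // 3 + 1):
        for alt in SYNONYMS[CODE[codons[idx]]]:
            if alt == codons[idx]:
                continue
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            if count_sites("".join(trial)) < before:
                return trial
    return None

def domesticate(cds):
    """Greedily remove forbidden sites with silent edits; return (cds, edit_count)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    edits = 0
    while (before := count_sites("".join(codons))) > 0:
        seq = "".join(codons)
        start = min(seq.find(site) for site in FORBIDDEN if site in seq)
        fixed = silent_fix(codons, start, before)
        if fixed is None:
            raise ValueError(f"no single silent edit removes the site at {start}")
        codons, edits = fixed, edits + 1
    return "".join(codons), edits

def construct_id(cds):
    """Deterministic ID: the first 12 hex digits of SHA-256 over the uppercase CDS."""
    return hashlib.sha256(cds.upper().encode("ascii")).hexdigest()[:12]

cds, n_edits = domesticate("ATGGGTCTCAAA")
print(cds, n_edits, construct_id(cds))
```

Because each accepted edit strictly lowers the site count, the loop always terminates; hashing the final sequence means identical outputs always receive the same ID.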

⚠️ Validation Status

FactorForge predictions are in-silico only and have not been experimentally validated in wet-lab conditions.

We are actively seeking researchers to test these predictions. If you use FactorForge in your experiments, we'd love to hear from you:

  • Did the optimized sequence express well?
  • How did CAI / GC% correlate with actual expression levels?
  • Any unexpected results?

Share your results via GitHub Issues or by email: eijex.lab@gmail.com

Validated results will be credited in VALIDATION.md and future releases.


🛠️ Developed With

This project was built using the following tools and platforms:

| Tool | Role |
|---|---|
| Claude / Claude Code (Anthropic) | Architecture design, domain analysis, code review |
| Codex (OpenAI) | Code generation and implementation |
| Google Colab | ML training experiments |
| Kaggle | ML training experiments |
| ESM2 (Meta) | Protein language model (encoder) |
| PyTorch | ML framework |
| Conda / Miniconda | Environment management |
| Vercel | Web deployment |
| GitHub | Version control and open-source hosting |
| HuggingFace | Model ecosystem |
| BioPython | Biological sequence processing |

Citing

If you use FactorForge in your research, please cite:

FactorForge v3.1.0 (2026). Open-source constraint-based CDS design engine.
Eijex. https://github.com/eijex/factorforge-cds

A citable publication is in preparation. Until then, please cite the GitHub repository.


License & Disclaimer

FactorForge source code is licensed under the Apache License 2.0.

Disclaimer: FactorForge is provided for research purposes only. Predictions are computational and have not been experimentally validated. The authors make no warranties regarding expression outcomes in wet-lab settings. Use at your own discretion.




Download files


Source Distribution

factorforge_cds-3.1.0.tar.gz (89.7 kB)

Built Distribution

factorforge_cds-3.1.0-py3-none-any.whl (99.6 kB)

File details

Details for the file factorforge_cds-3.1.0.tar.gz.

File metadata

  • Download URL: factorforge_cds-3.1.0.tar.gz
  • Size: 89.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for factorforge_cds-3.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7273adb9bf2aaeed0ad63533b024406c246c0d2145c2a3b2fb9dc6322e15d7e |
| MD5 | 820062f33789707dafdee69e1fde907a |
| BLAKE2b-256 | 7ca679d5c3421a034a3258bbf6eda923ec3a0fec209b8dd82a1212f84119df55 |


File details

Details for the file factorforge_cds-3.1.0-py3-none-any.whl.


File hashes

Hashes for factorforge_cds-3.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a523a5eeea205ea9abb53e6e2d8824013960ec40f754949907820359d6d8380a |
| MD5 | 0af2d27282b1d978fc9d466a0ebea859 |
| BLAKE2b-256 | 61e0ae0238f322913fc31af113d30e3fcac319024d7550d907aef364463b05c3 |

