
FactorForge — open-source constraint-based CDS design engine by Eijex.


FactorForge

Open-source constraint-based CDS design engine for Nicotiana benthamiana expression workflows.


FactorForge optimizes protein sequences into N. benthamiana-compatible CDS by maximizing CAI, controlling GC content, eliminating PolyA signals, and producing MoClo/Golden Gate-ready constructs.


Quick Start

pip install factorforge-cds
factorforge optimize my_protein.fasta -o output.fasta

Or with Python:

from factorforge.engines.v2.pipeline import OptimizationPipeline

pipeline = OptimizationPipeline(profile="balanced")
result = pipeline.run("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEG...")
print(result.sequence)   # optimized CDS
print(result.metadata)   # CAI, GC%, scan results, domestication edits

Access Options

Method | Description | Link
Web App | No installation; demo and light use | factorforge.vercel.app
CLI / Python | Local use, batch processing, data privacy | pip install factorforge-cds
Notebooks | Training and experimentation on Colab / Kaggle | See notebooks/

How It Works

FactorForge runs a deterministic constraint-based pipeline in four stages:

Protein sequence (FASTA or plain text)
        │
        ▼
1. Reverse Translation
   Selects synonymous codons to maximize CAI
   against the N. benthamiana codon usage table
        │
        ▼
2. Rule Scan
   Detects PolyA signals, homopolymers,
   repeat sequences, forbidden restriction sites
        │
        ▼
3. Domestication
   Removes Golden Gate / MoClo-incompatible
   BsaI / BsmBI recognition sites via silent edits
        │
        ▼
4. Output
   Optimized CDS — FASTA or GenBank
   with full metrics and scan report
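
Stage 1 can be illustrated as a greedy per-residue choice. Because CAI is a geometric mean of independent per-codon weights, picking the synonymous codon with the highest relative adaptiveness w at every position maximizes CAI when no other constraint applies. This is a minimal sketch, not FactorForge's implementation: the helper names and TOY_COUNTS are hypothetical, and the counts are made up rather than taken from the N. benthamiana codon usage table.

```python
from math import exp, log

# Standard genetic code (NCBI table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical codon counts for illustration only; NOT N. benthamiana data.
TOY_COUNTS = {
    "ATG": 10, "AAA": 30, "AAG": 20,
    "CTT": 25, "CTG": 10, "TTG": 22,
    "TAA": 5, "TGA": 8,
}


def relative_adaptiveness(counts: dict[str, float]) -> dict[str, float]:
    """w(codon) = count(codon) / count of its most-used synonymous codon."""
    best: dict[str, float] = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        best[aa] = max(best.get(aa, 0.0), n)
    return {c: n / best[CODON_TO_AA[c]] for c, n in counts.items()
            if best[CODON_TO_AA[c]] > 0}


def reverse_translate_max_cai(protein: str, w: dict[str, float]) -> str:
    """Greedy sketch of stage 1: highest-w synonymous codon per residue."""
    top: dict[str, str] = {}
    for codon, weight in w.items():
        aa = CODON_TO_AA[codon]
        if aa not in top or weight > w[top[aa]]:
            top[aa] = codon
    return "".join(top[aa] for aa in protein.upper())


def cai(cds: str, w: dict[str, float], floor: float = 0.01) -> float:
    """Geometric mean of w over codons, skipping Met, Trp and stop codons.

    `floor` is an assumed stand-in weight for codons missing from the
    table, so that log(0) never occurs.
    """
    cds = cds.upper()
    logs = [
        log(max(w.get(cds[i:i + 3], 0.0), floor))
        for i in range(0, len(cds) - 2, 3)
        if CODON_TO_AA[cds[i:i + 3]] not in "MW*"
    ]
    return exp(sum(logs) / len(logs)) if logs else 1.0
```

With TOY_COUNTS, reverse_translate_max_cai("MKL*", relative_adaptiveness(TOY_COUNTS)) returns ATGAAACTTTGA, whose CAI is 1.0. The later scan and domestication stages, and GC-oriented profiles, can only move a sequence away from this unconstrained optimum.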

Optimization Profiles

Profile | Description
balanced | CAI + GC balance (default)
high_cai | Maximum codon adaptation
gc_target | Targets 42.5% GC for N. benthamiana
viral_delivery | Adjusted for viral vector delivery

Performance

Benchmarked with the v2 engine on 3,876 sequences, scored against the N. benthamiana codon usage table:

Metric | Value | Target
CAI (mean) | 0.80 | ≥ 0.75
GC% (mean) | 42.54% | 40–55%
GC% (range) | 40.36–53.81% | 40–55%
AA identity | 100% | 100%
Validator pass rate | 100% | 100%
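
For reference, the CAI and GC% figures above are conventionally defined as follows. This assumes FactorForge uses the standard Sharp & Li (1987) CAI definition; the README does not spell out the exact variant (for example, whether Met, Trp and stop codons are excluded from the mean).

```latex
w_c = \frac{f_c}{\max_{c' \in \operatorname{syn}(c)} f_{c'}}, \qquad
\mathrm{CAI} = \Bigl(\prod_{i=1}^{L} w_{c_i}\Bigr)^{1/L}
             = \exp\!\Bigl(\frac{1}{L}\sum_{i=1}^{L} \ln w_{c_i}\Bigr), \qquad
\mathrm{GC\%} = 100 \cdot \frac{n_G + n_C}{n_A + n_C + n_G + n_T}
```

Here f_c is the frequency of codon c in the reference codon usage table, syn(c) is the set of codons synonymous with c, c_1 … c_L are the scored codons of the CDS, and n_X counts base X. For example, two scored codons with w = 1.0 and w = 0.64 give CAI = √0.64 = 0.8.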

Supported Hosts

Host | Status
Nicotiana benthamiana | ✅ Supported
Wolffia globosa | 🔶 Codon table available, coming soon
Other plant hosts | 📋 Planned

Installation

Requirements: Python 3.10+

pip install factorforge-cds

For ML features (ESM2 + BART decoder, experimental):

pip install "factorforge-cds[ml]"

For development:

git clone https://github.com/eijex/factorforge.git
cd factorforge
pip install -e ".[dev]"

CLI Reference

# Basic optimization (DP feasibility engine, default)
factorforge optimize input.fasta -o output.fasta

# Rule-based engine with profile
factorforge optimize input.fasta -e v2 -p balanced -o output.fasta

# With MoClo construct template, GenBank output
factorforge optimize input.fasta -e v2 -p balanced \
  --template standard_expression -o output.gb --format genbank

# Custom GC target range
factorforge optimize input.fasta --gc-min 40 --gc-max 50 -o output.fasta

# List available engines
factorforge list-engines

Key options:

Option | Default | Description
--engine, -e | dp | Engine: dp (feasibility) or v2 (rule-based)
--profile, -p | balanced | Optimization profile
--objective | feasibility_best | DP objective
--gc-min / --gc-max | 40 / 55 | GC% target range
--format | fasta | Output format: fasta or genbank
--scan-mode | full | Rule scan: full or fast
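
The --gc-min / --gc-max window is a bound on global GC content. A minimal sketch of such a check, assuming inclusive bounds and GC% counted over A/C/G/T only; the function names are hypothetical and not part of FactorForge's API:

```python
def gc_percent(seq: str) -> float:
    """GC content in percent, counted over A/C/G/T only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_in_window(seq: str, gc_min: float = 40.0, gc_max: float = 55.0) -> bool:
    """True if global GC% lies in [gc_min, gc_max] (inclusive, assumed).

    The defaults mirror the documented --gc-min / --gc-max defaults.
    """
    return gc_min <= gc_percent(seq) <= gc_max
```

Under these assumptions, the `--gc-min 40 --gc-max 50` example in the CLI reference would correspond to gc_in_window(cds, 40, 50).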

Output

Each optimized sequence includes:

  • Optimized CDS — synonymous codon replacements only, AA identity 100%
  • CAI score — codon adaptation index for N. benthamiana
  • GC content — global and first-region
  • Scan report — PolyA signals detected/fixed, homopolymers, restriction sites
  • Domestication report — BsaI/BsmBI sites removed, edit count
  • Construct ID — reproducible hash for tracking
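
The scan and domestication reports can be illustrated with a small, self-contained sketch. It assumes a rule set (AATAAA and ATTAAA as PolyA signals, homopolymer runs of 6 or more bases, and the BsaI GGTCTC / BsmBI CGTCTC recognition sites on both strands) and a naive first-synonym edit strategy. FactorForge's actual motifs, thresholds and edit choices are not documented here, and every name below is hypothetical.

```python
import re

# Standard genetic code (NCBI table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

# Assumed rule set, for illustration only.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of motif in seq, including overlaps."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def scan(cds: str, min_homopolymer: int = 6) -> list[tuple[str, str, int]]:
    """Return (kind, matched motif, position) hits sorted by position."""
    cds = cds.upper()
    hits = [("polyA", m, p) for m in POLYA_SIGNALS for p in find_all(cds, m)]
    run = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    hits += [("homopolymer", m.group(), m.start()) for m in run.finditer(cds)]
    for enzyme, site in TYPE_IIS_SITES.items():
        for s in (site, revcomp(site)):  # check both strands
            hits += [(enzyme, s, p) for p in find_all(cds, s)]
    return sorted(hits, key=lambda h: h[2])


def count_type_iis(cds: str) -> int:
    return sum(len(find_all(cds, s)) for site in TYPE_IIS_SITES.values()
               for s in (site, revcomp(site)))


def domesticate(cds: str, max_rounds: int = 50) -> tuple[str, int]:
    """Remove BsaI/BsmBI sites with silent single-codon swaps.

    Assumes an in-frame CDS whose length is a multiple of 3.
    Returns (edited CDS, number of edits).
    """
    cds = cds.upper()
    edits = 0
    for _ in range(max_rounds):
        sites = [(s, p) for site in TYPE_IIS_SITES.values()
                 for s in (site, revcomp(site)) for p in find_all(cds, s)]
        if not sites:
            break
        site, pos = sites[0]
        # Try each codon overlapping the recognition site.
        for ci in range(pos // 3, (pos + len(site) - 1) // 3 + 1):
            codon = cds[3 * ci:3 * ci + 3]
            swaps = [alt for alt in SYNONYMS[CODON_TO_AA[codon]] if alt != codon]
            candidates = [cds[:3 * ci] + alt + cds[3 * ci + 3:] for alt in swaps]
            better = [c for c in candidates if count_type_iis(c) < count_type_iis(cds)]
            if better:
                cds, edits = better[0], edits + 1
                break
        else:
            raise ValueError(f"no single silent edit removes {site} at {pos}")
    return cds, edits
```

For example, domesticate("ATGGGTCTCTAA") swaps GGT for the synonymous GGC and returns ("ATGGGCCTCTAA", 1), leaving the protein (MGL*) unchanged. A fuller version would rank synonyms by codon weight and re-run the whole scan after each edit so that removing a site never introduces a PolyA signal or homopolymer; this sketch only checks that the Type IIS site count goes down.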

⚠️ Validation Status

FactorForge predictions are in-silico only and have not been experimentally validated in wet-lab conditions.

We are actively seeking researchers to test these predictions. If you use FactorForge in your experiments, we'd love to hear from you:

  • Did the optimized sequence express well?
  • How did CAI / GC% correlate with actual expression levels?
  • Any unexpected results?

Share your results via GitHub Issues or by email: eijex.lab@gmail.com

Validated results will be credited in VALIDATION.md and future releases.


🛠️ Developed With

This project was built using the following tools and platforms:

Tool | Role
Claude / Claude Code (Anthropic) | Architecture design, domain analysis, code review
Codex (OpenAI) | Code generation and implementation
Google Colab | ML training (Run 1, Run 2)
Kaggle | ML training (alpha_run1, alpha_run2)
ESM2 (Meta) | Protein language model (encoder)
PyTorch | ML framework
Conda / Miniconda | Environment management
Vercel | Web deployment
GitHub | Version control and open-source hosting
HuggingFace | Model ecosystem
BioPython | Biological sequence processing

Citing

If you use FactorForge in your research, please cite:

FactorForge v3.0.0 (2026). Open-source constraint-based CDS design engine.
Eijex. https://github.com/eijex/factorforge

A citable publication is in preparation. Until then, please cite the GitHub repository.


License & Disclaimer

FactorForge source code is licensed under the Apache License 2.0.

Disclaimer: FactorForge is provided for research purposes only. Predictions are computational and have not been experimentally validated. The authors make no warranties regarding expression outcomes in wet-lab settings. Use at your own discretion.


