A platform to optimize AND run PyTorch models: license-gated compiler (enhanced planner, persistent cache, multi-accelerator routing) plus a model registry and inference server, on top of open-core g2n.

g2n-Enterprise

A platform to optimize and run PyTorch models. It has two halves that share one code path and one license:

  • Optimize: g2n.compile(model) (or torch.compile(model, backend="g2n")) applies hybrid fusion, a custom Triton LayerNorm(+GELU) kernel, an in-place-aware buffer planner, a persistent cross-run compile cache, and multi-accelerator routing.
  • Run: a built-in model registry + inference server (g2n.serve()). Register a model once, and the node loads it, optimizes it on the way in, and serves predictions over HTTP, with optional dynamic batching.

Around that sits the full machinery to sell it as a service: a signed license system, a zero-dependency license server, a license-management dashboard, and an ancient-Greek-styled WordPress front end (theme + plugin) that talks to the license API.

Deployment model. The customer installs g2n inside their environment. Everything (compile + serve) executes there; the license server only mints and validates entitlements.

import g2n_enterprise as g2n
g2n.activate("G2N-8H4K-L92X-QF7M")          # online once, then cached + offline

# OPTIMIZE
model = g2n.compile(model)                   # or torch.compile(model, backend="g2n")

# RUN
g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
                   max_batch=16, max_latency_ms=8)
g2n.serve(port=8900)                          # POST /v1/models/resnet/predict

What's sold (tiers)

Capability Community Pro Enterprise
Hybrid fusion + JIT pointwise codegen
Enhanced buffer planner (in-place aware)
Persistent cross-run compile cache
Model registry + inference server (run models)
Dynamic request batching (autotuned)
Multi-accelerator auto-routing (GPU / NPU / CPU)
Validated model-zoo configs + priority support

Code never changes between tiers — gated features light up when the license grants them and silently fall back to the open-core path otherwise.
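The fallback behaviour described above can be pictured with a small decorator. This is only a sketch of the idea, not the package's gating code: GRANTED_FEATURES, gated, open_core_plan and plan are hypothetical names, and the real client derives its feature set from the verified license token.

```python
from functools import wraps

# Hypothetical: the set of features the active license grants.
# The real package would read this from the verified token.
GRANTED_FEATURES = {"persistent_cache"}


def gated(feature, fallback):
    """Call the decorated function only if `feature` is granted, else `fallback`."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if feature in GRANTED_FEATURES:
                return fn(*args, **kwargs)
            return fallback(*args, **kwargs)  # silent open-core path
        return wrapper
    return decorate


def open_core_plan(graph):
    return f"open-core plan for {graph}"


@gated("planner_pro", fallback=open_core_plan)
def plan(graph):
    return f"in-place-aware plan for {graph}"
```

The caller always invokes plan(...); which body runs depends only on the granted feature set, which is what keeps user code identical across tiers.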


How the three pieces connect

   ┌─────────────────────────┐        ┌──────────────────────────────┐
   │  WordPress storefront    │        │   License server             │
   │  (frontend repo)         │        │   (backend/license_server)   │
   │  • [g2n_pricing] ────────┼──GET /v1/catalog──▶ tiers + price ────┤  one source
   │  • [g2n_buy] ────────────┼──POST /v1/checkout─▶ Paddle ──webhook─▶│  of truth for
   │  • [g2n_dashboard] ──────┼──POST /v1/portal ──▶ Paddle portal     │  entitlements
   │  • [g2n_status] ─────────┼──GET /v1/health,/version               │
   │  • [g2n_node_status] ──┐ │        └──────────┬───────────────────┘
   └────────────────────────┼─┘                   │ mints signed token
                            │                      ▼
                            │        ┌──────────────────────────────┐
   browser reads the user's│        │  pip install g2n              │
   OWN node directly ──────┘        │  + g2n-enterprise (backend)   │
                            ┌────────│  g2n.activate(KEY) ─verifies──┘ offline,
                            │        │  g2n.compile(model)   OPTIMIZE   Ed25519
                            ▼        │  g2n.serve()          RUN
   ┌─────────────────────────┐      └──────────────┬───────────────┘
   │ customer inference node  │◀── activate/validate │
   │ /v1/healthz /readyz      │── runs in the customer's environment
   │ /v1/models/<id>/predict  │
   └─────────────────────────┘
  • WordPress ↔ license server: the plugin proxies the server's /v1/catalog (price/blurb/seats), /v1/checkout, /v1/portal, /v1/health and /v1/version server-side; the storefront price always reflects the server (single source of truth). The admin token never reaches the browser.
  • license server ↔ client: g2n.activate(KEY) exchanges a short key for an Ed25519-signed token (once, online), then verifies it offline. The same feature flags (g2n_enterprise/licensing/features.py) gate the compiler and the serving platform, and feed the WordPress pricing: one definition, everywhere.
  • WordPress ↔ customer node: [g2n_node_status] reads the customer's own inference node directly from the browser (the node's CORS allows it); the vendor never touches customer inference.

Monorepo layout

g2n-enterprise/
  g2n_enterprise/            # the closed-source client package (pip-installable)
    licensing/              # Ed25519 keys, signed tokens, activate(), gating
    accel/registry.py       # AcceleratorBackend ABC + auto_select (CUDA/CPU/NPU)
    cache/persistent.py     # cross-run Triton + artifact cache (Windows warmup fix)
    planner_pro.py          # in-place-aware planner (gated)
    model_zoo.py            # validated compile configs + parity harness (gated)
    serve/                  # -- the "run models" half --
      registry.py           #   ModelRegistry: register/list/version models (JSON)
      runtime.py            #   ModelRuntime + DynamicBatcher + latency stats
      server.py             #   stdlib inference HTTP node (/v1/models/.../predict)
      _demo.py              #   torch-free demo models (python: sources)
    api.py                  # compile() + serve()/register_model()/load_model()
    cli.py                  # doctor/activate/status + models/register/predict/serve
  license_server/           # ZERO-dependency server (stdlib http.server + sqlite3)
    app.py                  #   /v1/activate /v1/validate /v1/catalog + Paddle + admin
    mint.py                 #   CLI: keygen / issue / list / trial
    paddle_gateway.py       #   Paddle Billing (checkout, webhook verify, portal)
    dashboard/index.html    #   license-management dashboard
  packaging/                # builds the open-core `g2n` wheel (Apache-2.0)
  examples/                 # quickstart, torch.compile backend, serve_quickstart
  tests/                    # licensing / registry / cache / serving (no GPU needed)

Optimize: g2n.compile

g2n.compile(model) routes to the best available accelerator and runs the open-core g2n pipeline (custom-kernel FX pass + Inductor) under license-gated config. As a torch.compile backend:

import torch
import g2n_enterprise            # registers backend="g2n" on import

compiled = torch.compile(model, backend="g2n")

Pro lights up memory-fusion + the persistent cache; Enterprise adds max-autotune and multi-accelerator routing. Without the entitlement (or without torch/Triton/CUDA) every path degrades to stock — never worse than eager.

Run: g2n.serve (Pro+)

The serving half turns a node into a model server.

import g2n_enterprise as g2n

# register once (persisted under ~/.g2n/models)
g2n.register_model("bert", "torchscript:/models/bert.pt",
                   max_batch=32, max_latency_ms=10)

# bring one model up locally
rt = g2n.load_model("bert")
rt.predict(batch)                 # g2n-optimized on load

# or serve every registered model over HTTP
g2n.serve(port=8900, token="node-admin-token")

Source URIs: torchscript:/path.pt, state_dict:/w.pt@pkg.mod:build_fn, callable:pkg.mod:factory, and python:pkg.mod:fn (torch-free — used by the demo models so the node runs anywhere).
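To make the four source URI shapes concrete, here is a minimal parser sketch. It is an illustration of the formats listed above, not the registry's actual parser: Source, parse_source and _split_target are hypothetical names, and the error handling is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    scheme: str
    path: Optional[str] = None      # file on disk, if any
    module: Optional[str] = None    # importable module, if any
    attr: Optional[str] = None      # function/factory name inside the module


def _split_target(target):
    """Split 'pkg.mod:name' into ('pkg.mod', 'name')."""
    module, sep, attr = target.rpartition(":")
    if not sep or not module or not attr:
        raise ValueError(f"expected 'pkg.mod:name', got {target!r}")
    return module, attr


def parse_source(uri):
    scheme, sep, rest = uri.partition(":")
    if not sep or not rest:
        raise ValueError(f"not a source URI: {uri!r}")
    if scheme == "torchscript":
        return Source(scheme, path=rest)
    if scheme == "state_dict":
        # weights file, then '@', then the function that builds the module
        path, at, target = rest.partition("@")
        if not at:
            raise ValueError("state_dict sources need '@pkg.mod:build_fn'")
        module, attr = _split_target(target)
        return Source(scheme, path=path, module=module, attr=attr)
    if scheme in ("callable", "python"):
        module, attr = _split_target(rest)
        return Source(scheme, module=module, attr=attr)
    raise ValueError(f"unknown scheme: {scheme!r}")
```

For example, parse_source("state_dict:/w.pt@pkg.mod:build_fn") yields a Source with path "/w.pt", module "pkg.mod" and attr "build_fn".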

Faster, lower-VRAM inference

Each served model gets real inference optimizations (they engage on CUDA and are a no-op on CPU):

  • inference_mode is always on (free latency + memory).
  • precision="auto" enables fp16/bf16 autocast, which halves activation VRAM and uses tensor cores; precision="int8" applies dynamic quantization, which halves weight memory again (CPU path, opt-in).
  • cuda_graph=True captures and replays the forward pass, which removes the kernel-launch overhead that makes "compiled tie eager" on small GPUs.
  • channels_last for conv nets.
  • A ResidencyManager keeps K models hot on the GPU and pages the rest from CPU (G2N_SERVE_RESIDENT_MODELS=2).
  • Admission control (G2N_SERVE_MAX_CONCURRENCY, G2N_SERVE_VRAM_FLOOR_MB) makes a saturated 6 GB node return 503 + Retry-After instead of OOM-crashing.
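The admission-control idea (refuse early with 503 + Retry-After rather than run out of memory) can be sketched as below. This is an assumption-laden illustration, not the node's implementation: AdmissionController, try_admit and the free_vram_mb callable are hypothetical, and how the real node measures free VRAM is not stated here.

```python
import threading


class AdmissionController:
    """Reject work up front instead of letting the node run out of memory."""

    def __init__(self, max_concurrency, vram_floor_mb, free_vram_mb, retry_after_s=1):
        self._slots = threading.BoundedSemaphore(max_concurrency)
        self._vram_floor_mb = vram_floor_mb
        self._free_vram_mb = free_vram_mb    # callable returning free VRAM in MB
        self._retry_after_s = retry_after_s

    def try_admit(self):
        """Return (status, headers); status 200 means the caller holds a slot."""
        busy = {"Retry-After": str(self._retry_after_s)}
        if self._free_vram_mb() < self._vram_floor_mb:
            return 503, busy                 # below the VRAM floor
        if not self._slots.acquire(blocking=False):
            return 503, busy                 # all concurrency slots taken
        return 200, {}

    def release(self):
        """Give the slot back once the request has finished."""
        self._slots.release()
```

A handler would call try_admit() before running the model and release() in a finally block after a 200, so slots are returned even when prediction raises.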

g2n.register_model("resnet", "torchscript:/m/resnet50.pt",
                   precision="auto", cuda_graph=True, channels_last=True, max_batch=16)
res = g2n.benchmark("resnet", sample, rounds=200)   # eager vs optimized, measured on YOUR box

g2n.benchmark / g2n-enterprise bench report latency percentiles (p50/p95/p99), throughput and peak VRAM for eager vs optimized, measured on your hardware and never hardcoded. See docs/SERVING.md §4b.
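For readers who want to sanity-check reported percentiles, the following stdlib-only sketch shows one common way to compute p50/p95/p99 from timed calls. It is not how g2n.benchmark is implemented; latency_percentiles and summarize are hypothetical helpers, and the "inclusive" interpolation method is a choice made for this example.

```python
import statistics
import time


def summarize(samples_ms):
    """Return p50/p95/p99 of a list of latencies (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; cut point i sits at index i - 1.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def latency_percentiles(fn, rounds=200):
    """Time `fn` for `rounds` calls and summarize the latencies in milliseconds."""
    samples_ms = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples_ms)
```

Running the same helper over an eager callable and an optimized one on the same inputs gives a like-for-like comparison on your own hardware.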

Inference HTTP contract (stdlib server):

Method  Path                     Auth        Purpose
GET     /v1/healthz                          liveness + uptime + ready-model count
GET     /v1/models                           list models + per-model latency stats
GET     /v1/models/<id>                      one model's info
GET     /v1/metrics                          aggregate counters (JSON)
POST    /v1/models/<id>/predict  optional    {inputs} -> {outputs, latency_ms}
POST    /v1/models               node token  register + load a model entry
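A client for the predict route can be written with the standard library alone. The sketch below assumes the body is JSON of the form {"inputs": ...} as the table suggests; the Content-Type header, the helper names build_predict_request and predict, and the absence of an auth header are assumptions, not documented behaviour.

```python
import json
import urllib.request


def build_predict_request(base_url, model_id, inputs):
    """Build (but do not send) a POST /v1/models/<id>/predict request."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/models/{model_id}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, model_id, inputs, timeout=10.0):
    """Send the request and return the decoded JSON reply."""
    request = build_predict_request(base_url, model_id, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a node started via g2n.serve(port=8900), predict("http://localhost:8900", "resnet", sample) would be expected to return a dict containing outputs and latency_ms.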

CLI mirror: g2n-enterprise register NAME SOURCE, ... models, ... predict NAME JSON, ... serve --port 8900.

Dynamic batching (Enterprise). When the license grants auto_batch, the runtime coalesces concurrent requests into one batched call within max_latency_ms, preserving per-caller order and length. Below Enterprise, models serve one item at a time (still fully functional).
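The order and length guarantee can be illustrated without any threading. The sketch below is a simplified, synchronous stand-in for the DynamicBatcher: it ignores the max_latency_ms wait window entirely, and run_coalesced is a hypothetical name. It only shows how several callers' inputs can share batched calls and still get back exactly their own outputs.

```python
def run_coalesced(requests, batch_fn, max_batch):
    """Run several callers' inputs through `batch_fn` in shared batches.

    `requests` is a list of per-caller input lists. The result has one output
    list per caller, in the same order and with the same length as its input.
    """
    # Flatten every caller's items into one stream, remembering nothing but order.
    flat = [item for inputs in requests for item in inputs]
    outputs = []
    for start in range(0, len(flat), max_batch):
        outputs.extend(batch_fn(flat[start:start + max_batch]))
    if len(outputs) != len(flat):
        raise RuntimeError("batch_fn must return one output per input")

    # Hand each caller back a slice as long as what it sent.
    results, cursor = [], 0
    for inputs in requests:
        results.append(outputs[cursor:cursor + len(inputs)])
        cursor += len(inputs)
    return results
```

Three callers sending 2, 1 and 3 items with max_batch=4 produce two model calls (4 items, then 2) instead of three, and each caller still receives a list of its original length.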


Deploy the license server (runs immediately)

Pure Python stdlib, with cryptography (used for signing) as the only third-party dependency. No web framework, no DB server.

cd license_server
pip install cryptography
cp .env.example .env            # set a strong G2N_ADMIN_TOKEN
python3 mint.py keygen          # generate YOUR signing key (rotate the demo one!)
#   -> paste the printed public key into g2n_enterprise/licensing/_pubkey.py
./run.sh                        # serves http://0.0.0.0:8800  (+ dashboard at /)

Mint and inspect licenses:

python3 mint.py issue --tier enterprise --seats 25 --days 365 --email acme@co.com
python3 mint.py list

License-server API

Method  Path                             Auth        Purpose
POST    /v1/activate                                 {key, machine_id} -> signed token
POST    /v1/trial                                    {machine_id, email?} -> 14-day hardware-bound trial
POST    /v1/validate                                 {token} -> server-side verify
POST    /v1/checkout                                 {tier, billing, email} -> Paddle checkout URL
POST    /v1/paddle/webhook               Paddle sig  mint/cancel on subscription + transaction events
POST    /v1/portal                                   {key} -> Paddle Customer Portal URL (self-service)
POST    /v1/license/lookup                           {key} -> single masked license row (key is the credential)
GET     /v1/catalog                                  tiers + pricing (used by WordPress)
GET     /v1/health                                   uptime + status (status widget)
GET     /v1/version                                  latest client version + info_url (auto-update channel)
GET     /v1/admin/licenses               admin       list
POST    /v1/admin/licenses               admin       mint a key
POST    /v1/admin/licenses/<KEY>/revoke  admin       revoke
GET     /v1/admin/outbox                 admin       recent license-delivery emails
GET     /v1/admin/subscriptions          admin       active subscriptions

Admin auth: X-Admin-Token: <token> or Authorization: Bearer <token> (compared in constant time).
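A constant-time check over both header forms might look like the following. It is a sketch rather than the code in app.py: extract_admin_token and is_admin are hypothetical names, and headers is assumed to be a plain mapping of header names to values.

```python
import hmac


def extract_admin_token(headers):
    """Read the token from X-Admin-Token or Authorization: Bearer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    token = lowered.get("x-admin-token")
    if token:
        return token.strip()
    scheme, _, credential = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credential:
        return credential.strip()
    return None


def is_admin(headers, expected_token):
    """Return True only when the presented token matches the configured one."""
    presented = extract_admin_token(headers)
    if presented is None or not expected_token:
        return False
    # compare_digest avoids leaking, through timing, where the inputs differ.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

Refusing an empty expected_token means a server started without G2N_ADMIN_TOKEN set does not accidentally accept an empty header.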


License system: how the security actually works

  • Asymmetric (Ed25519). The server holds the private signing key; the client ships only the public key (_pubkey.py). Clients verify, never forge.
  • Short keys, signed tokens. A customer buys G2N-XXXX-XXXX-XXXX. activate() exchanges it (once, online) for a signed token encoding tier/features/expiry/seat-binding, cached under ~/.g2n/ and verified offline with a 14-day grace window.
  • Seat binding + protected trials. Activation registers a hashed machine id; the server enforces the seat cap and allows one hardware-bound trial per machine.
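One way to read the 14-day grace window is as extra time after a token's expiry during which the client keeps working. That reading is an assumption, as are the names token_state and GRACE; the sketch below covers only the time check and leaves out the Ed25519 signature verification that must happen first.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=14)  # assumed meaning: extra time after expiry


def token_state(expires_at, now=None, grace=GRACE):
    """Classify an already signature-verified token as 'valid', 'grace' or 'expired'."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now <= expires_at:
        return "valid"
    if now <= expires_at + grace:
        return "grace"      # keep features on, but prompt for re-activation
    return "expired"        # fall back to the Community path
```

Because the check uses only the expiry carried inside the signed token and the local clock, it needs no network access.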

Honest limitation (important): any license check that runs inside code the customer controls is soft protection, because a determined customer can patch the client. This is the correct cryptographic backbone plus a professional deterrent and contractual line, not unbreakable DRM. Don't price or promise as if it were. Concretely: G2N_MACHINE_ID can be overridden via env, so a determined user can present as a "new machine" to dodge seat binding and the one-trial-per-machine rule. Treat seat/trial enforcement as a deterrent, not a hard wall.

Built-in abuse controls: the public endpoints (/v1/license/lookup, /v1/checkout, /v1/trial, /v1/activate) are per-IP rate-limited in-process (tune with G2N_RL_*); for multi-worker deployments put your reverse proxy's limiter in front too.
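An in-process per-IP limiter can be as small as a sliding window of timestamps per address. The sketch below is one plausible shape, not the server's limiter: PerIpRateLimiter and its parameters are hypothetical, and how the G2N_RL_* variables map onto them is not specified here.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `max_requests` per `window_s` seconds for each IP."""

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self._max = max_requests
        self._window = window_s
        self._clock = clock
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

The state lives in one process's memory, which is why each worker in a multi-worker deployment would count separately and a shared limiter at the reverse proxy is still needed. A threaded server would also want a lock around allow().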

Operational security: put the server behind HTTPS, restrict the wide-open CORS (*) in app.py to your origins, change G2N_ADMIN_TOKEN, and replace the demo secrets/signing_key.pem (never commit/ship it). All four are mandatory pre-launch steps in PRODUCTION_CHECKLIST.md.

Versioning

The three shipping artifacts version independently — they are separate products with separate release cadences, so their numbers will not match:

Artifact                          Current  What it is
g2n (PyPI, open-core)             0.5.x    the compiler wheel customers pip install
g2n-enterprise (client + server)  1.3.x    the closed client + license server
WordPress plugin + theme          1.4.x    the storefront

The client and license server share a version (they're released together); the server advertises the latest client version separately via /v1/version.


Payments — the self-serve loop (Paddle Billing)

Paddle is the Merchant of Record: it hosts the payment page and issues invoices, so there is no PCI scope on your server. The financial loop is closed:

buyer -> [g2n_buy] (WordPress) -> POST /v1/checkout -> Paddle Checkout (hosted)
      -> pays -> Paddle fires webhook -> POST /v1/paddle/webhook (signature-verified)
      -> mint_license(tier) -> email key to buyer -> license active 24/7
cancel/expire -> subscription.* / transaction.* webhook -> license canceled/past_due
  • Stdlib only. Paddle's REST API is called with urllib; webhook signatures are verified with hmac (HMAC-SHA256 over "{ts}:{body}", constant-time compare, replay-window check). No paddle SDK needed.
  • Idempotent. Events are de-duplicated by id; a subscription never mints two licenses.
  • Email delivery. Keys are sent via SMTP if configured; otherwise they queue in an email_outbox table so nothing is lost.
  • Self-service portal. [g2n_dashboard] (key in) -> /v1/portal -> the official Paddle Customer Portal (upgrade/downgrade, cancel, update card, invoices). The admin dashboard adds live Active subscriptions and Email outbox panels.
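The webhook check described in the first bullet (HMAC-SHA256 over "{ts}:{body}", constant-time compare, replay window) can be sketched with hmac and hashlib. This is not paddle_gateway.py: verify_webhook is a hypothetical helper, the "ts=<unix seconds>;h1=<hex digest>" header layout is assumed here, and the 300-second window is an arbitrary example value.

```python
import hashlib
import hmac
import time


def verify_webhook(secret, header, body, max_age_s=300, now=None):
    """Return True if `header` carries a fresh, valid signature for `body`.

    Assumes a signature header of the form "ts=<unix seconds>;h1=<hex digest>".
    `body` must be the raw request bytes, exactly as received.
    """
    fields = dict(part.split("=", 1) for part in header.split(";") if "=" in part)
    ts, presented = fields.get("ts"), fields.get("h1")
    if not ts or not presented or not (ts.isascii() and ts.isdigit()):
        return False
    current = time.time() if now is None else now
    if abs(current - int(ts)) > max_age_s:          # replay-window check
        return False
    signed = ts.encode("ascii") + b":" + body       # "{ts}:{body}"
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Verifying against the raw bytes matters: re-serializing parsed JSON can change whitespace or key order and make a genuine signature fail. De-duplication by event id would run after this check succeeds.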

Setup: create recurring Prices in Paddle, set PADDLE_* and SMTP_* in .env, and register the webhook endpoint https://your-server/v1/paddle/webhook (events: subscription.created/activated/updated/canceled, transaction.completed, transaction.payment_failed).


WordPress front end

  1. Copy the plugin + theme into wp-content/, activate both.
  2. Under Settings -> G2N, set the API base (https://your-server/v1) and admin token.
  3. The plugin auto-creates Pricing, Account, and Status pages and a /docs library. Shortcodes: [g2n_pricing], [g2n_dashboard], [g2n_status], [g2n_buy tier="pro"].

The plugin calls the API server-side via the WP HTTP API; the admin token never reaches the browser.


What is verified vs. what is scaffold

Verified runnable (in this build, CPU-only — no GPU was available):

  • Licensing crypto: sign/verify, tamper + foreign-signer rejection, expiry, offline grace, machine binding, feature gating.
  • License server over real HTTP: mint -> activate -> validate -> seat-limit enforcement -> offline verify; Paddle webhook signature verification and the full webhook-driven lifecycle (mint, idempotency, cancel, payment-failed).
  • Serving platform: model registry persistence + name-uniqueness, the dynamic batcher's order/length/coalescing guarantees, the torch-free runtime, latency stats, and the inference HTTP node end-to-end (health -> register -> predict -> metrics). Plus the entitlement gate (community refused, Pro unlocked). See tests/test_serving.py.
  • Enterprise package imports and runs without torch (degrades to Community).

Scaffold / needs your hardware or vendor SDKs:

  • The CUDA/Triton compile path and on-GPU speedups are your measurements.
  • NPUBackend is an integration contract — subclass it for OpenVINO / CoreML / QNN / ONNX Runtime.
  • The in-place planner path is correctness-sensitive; gated AND opt-in. Validate on a real model first.
  • Serving real torchscript:/state_dict: models requires torch in the node environment; the torch-free python: path is what's exercised here.
  • WordPress PHP is written to standard but not executed here (no PHP runtime).

A note on the benchmarks

Two reports exist: a friend's RTX 5070 Ti report (large wins: a 16.5% latency improvement and a 50.8% VRAM reduction via the planner) and your own RTX 4050 numbers (g2n roughly ties eager; the real win shows in the g2n + torch.compile synergy). Neither was produced or verified in this build environment. For marketing, lead with the conservative, reproducible numbers from your own hardware and clearly attribute the 5070 Ti figures as an independent third-party run.

License

Proprietary. © g2n. (The open-core g2n wheel under packaging/ is Apache-2.0.)
