Optimize and run PyTorch models: an open-core compiler (fusion, buffer planning, persistent compile cache), a quantum circuit simulator (g2n.quantum), plus a license-gated serving platform that runs your models behind an inference server.

Project description

g2n

Optimize and run PyTorch models. g2n is the open-core compiler at the center of the g2n platform: pointwise fusion, buffer-reuse planning, and a persistent cross-run compile cache so repeat builds skip recompilation. A license unlocks the enhanced planner, the persistent cache, and a full serving layer that runs your models behind an HTTP inference server.

pip install g2n

import torch
import g2n

model = MyModule().eval()                     # MyModule: your own torch.nn.Module
compiled = g2n.compile(model)                 # optimize directly
# or register as a torch.compile backend:
compiled = torch.compile(model, backend="g2n")
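
The persistent cross-run compile cache described above can be pictured as a content-addressed store on disk. The following is a minimal sketch of that idea, not g2n's actual implementation: `cache_key`, `cached_compile`, `fake_compile`, the pickle file layout, and the use of a version string in the key are all assumptions made for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_key(graph_text: str, compiler_version: str) -> str:
    """Hash everything that can change the compiled artifact."""
    h = hashlib.sha256()
    h.update(compiler_version.encode())
    h.update(b"\0")
    h.update(graph_text.encode())
    return h.hexdigest()


def cached_compile(graph_text, compile_fn, cache_dir, compiler_version="1.0.0"):
    """Return (artifact, hit). compile_fn is only called on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(graph_text, compiler_version) + ".pkl")
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    artifact = compile_fn(graph_text)
    # Write to a temporary name, then rename, so a crash never leaves a partial entry.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(pickle.dumps(artifact))
    tmp.replace(path)
    return artifact, False


calls = []


def fake_compile(graph_text):
    # Hypothetical stand-in for an expensive compile step.
    calls.append(graph_text)
    return {"kernel": graph_text.upper()}


with tempfile.TemporaryDirectory() as d:
    first, hit1 = cached_compile("add -> relu", fake_compile, d)   # miss: compiles
    second, hit2 = cached_compile("add -> relu", fake_compile, d)  # hit: loaded from disk
```

Including the compiler version in the key is one way to keep stale artifacts from being reused after an upgrade; a real cache would likely also key on device and dtype.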

Two halves, one license

Tiers: Community (free), Pro, Enterprise

  • Optimize: fusion, JIT pointwise codegen, CPU fallback
  • Enhanced buffer planner + persistent compile cache
  • Run: model registry + inference server (g2n.serve())
  • Simulate: quantum circuits up to 24 qubits (g2n.quantum)
  • Quantum: unlimited qubits + circuit-fusion optimizer
  • Quantum: batched parameter sweeps (VQE/QML loops)
  • Dynamic batching, multi-accelerator routing, model-zoo
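
Buffer-reuse planning, named in the feature list above, generally means assigning intermediate tensors to a small pool of buffers based on when each tensor is last read. The sketch below shows a simple greedy best-fit planner over liveness intervals. It is an illustration of the general technique under assumed names (`Tensor`, `plan_buffers`), not a description of g2n's planner.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int    # bytes
    start: int   # index of the op that produces it
    end: int     # index of the last op that reads it


def plan_buffers(tensors):
    """Greedy plan: reuse a freed buffer when one is large enough."""
    buffers = []        # buffer sizes, indexed by buffer id
    live = []           # (end, buffer_id) for tensors still in use
    free = []           # buffer ids whose tensor is dead
    assignment = {}
    for t in sorted(tensors, key=lambda t: t.start):
        # Release buffers whose tensor died before this op runs.
        still_live = []
        for end, buf in live:
            if end < t.start:
                free.append(buf)
            else:
                still_live.append((end, buf))
        live = still_live
        # Best fit: the smallest free buffer that is big enough.
        fits = [b for b in free if buffers[b] >= t.size]
        if fits:
            buf = min(fits, key=lambda b: buffers[b])
            free.remove(buf)
        else:
            buf = len(buffers)
            buffers.append(t.size)
        assignment[t.name] = buf
        live.append((t.end, buf))
    return assignment, buffers


# A four-op chain: each intermediate is read only by the next op.
chain = [
    Tensor("a", 1024, start=0, end=1),
    Tensor("b", 1024, start=1, end=2),
    Tensor("c", 1024, start=2, end=3),
    Tensor("d", 1024, start=3, end=4),
]
assignment, buffers = plan_buffers(chain)
naive_bytes = sum(t.size for t in chain)   # one buffer per tensor
planned_bytes = sum(buffers)               # after reuse
```

For this chain the plan alternates between two buffers, so peak intermediate memory is half of the one-buffer-per-tensor layout.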

Activate a license to enable the paid tiers. The code path is the same either way: with a valid license the gated features turn on, and without one g2n falls back to the open-core path.

g2n activate G2N-XXXX-XXXX-XXXX
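
The "same code path, gated features" behavior can be sketched as a single entry point that picks an implementation based on what the license enables. Everything here (`LICENSED`, `plan`, `open_core_plan`, `enhanced_plan`, the feature name) is hypothetical and only illustrates the fallback pattern; g2n's real license check is not shown in this document.

```python
LICENSED = set()   # hypothetical: filled in by an activation step


def open_core_plan(ops):
    # Always available.
    return {"planner": "basic", "ops": list(ops)}


def enhanced_plan(ops):
    # Only reached when the license enables it.
    return {"planner": "enhanced", "ops": list(ops)}


def plan(ops):
    """One entry point: callers never branch on the license themselves."""
    if "enhanced_planner" in LICENSED:
        return enhanced_plan(ops)
    return open_core_plan(ops)


before = plan(["add", "relu"])        # unlicensed: open-core path
LICENSED.add("enhanced_planner")      # simulate activation
after = plan(["add", "relu"])         # licensed: gated path
```

Caller code is identical before and after activation, which is the property the paragraph above describes.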

Run your models (Pro+)

The enterprise client (pip install g2n-enterprise) adds the serving platform:

import torch
import g2n_enterprise as g2n

g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
                   precision="auto", cuda_graph=True, max_batch=16)
g2n.serve(port=8900)        # POST /v1/models/resnet/predict

sample = torch.randn(1, 3, 224, 224)   # example input; shape assumed for ResNet-50
res = g2n.benchmark("resnet", sample, rounds=200)   # eager vs optimized, measured on your machine

Serving applies standard inference techniques: inference_mode, fp16/bf16/int8 precision, CUDA-graph capture/replay (which removes the kernel-launch overhead that often leaves a compiled model no faster than eager on small GPUs), and a VRAM residency manager so a small card can serve more models than fit in memory at once. Speedups are hardware-dependent: benchmark on your own GPU rather than trusting a quoted number.
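
One common way to build a residency manager like the one mentioned above is least-recently-used eviction against a byte budget. The sketch below assumes that policy; the class name, method names, and the LRU choice itself are illustrative assumptions, and the document does not say which policy g2n uses.

```python
from collections import OrderedDict


class ResidencyManager:
    """Keep models on the device up to a byte budget; evict least recently used."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.resident = OrderedDict()   # name -> size, oldest first
        self.evicted = []               # eviction log, for inspection

    def used(self):
        return sum(self.resident.values())

    def acquire(self, name, size):
        if size > self.budget:
            raise ValueError(f"{name} needs {size} bytes, budget is {self.budget}")
        if name in self.resident:
            self.resident.move_to_end(name)   # mark as most recently used
            return
        # Evict cold models until the new one fits.
        while self.used() + size > self.budget:
            victim, _ = self.resident.popitem(last=False)
            self.evicted.append(victim)       # a real manager would offload to host RAM
        self.resident[name] = size


GB = 1024 ** 3
mgr = ResidencyManager(budget_bytes=8 * GB)
mgr.acquire("resnet", 3 * GB)
mgr.acquire("bert", 4 * GB)
mgr.acquire("resnet", 3 * GB)     # touch: resnet is now the most recent
mgr.acquire("whisper", 4 * GB)    # does not fit, so the coldest model (bert) goes
```

Three models totalling 11 GB are served from an 8 GB budget, at the cost of reloading `bert` the next time it is requested.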

Quantum circuit simulator (new in 1.0)

g2n.quantum is a statevector simulator — classical hardware, torch tensor contractions, CPU or CUDA. It is not a quantum computer and doesn't pretend to be; it's for developing, testing, and teaching quantum algorithms.

import g2n.quantum as qf

c = qf.Circuit(2).h(0).cnot(0, 1)   # Bell state
c.measure_all(shots=1000)           # {'00': ~500, '11': ~500}
c.expectation("ZZ")                 # tensor(1.)
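
To show what a statevector simulator computes for the Bell-state circuit above, here is a dependency-free sketch that applies H and CNOT directly to a list of four amplitudes. It is independent of g2n.quantum: the function names and the bit-ordering convention (qubit 0 is the leftmost bit of the bitstring) are assumptions made for this example.

```python
import math


def apply_h(state, q, n):
    """Hadamard on qubit q of an n-qubit statevector (qubit 0 = leftmost bit)."""
    mask = 1 << (n - 1 - q)
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if (i & mask) == 0:
            a, b = state[i], state[i | mask]
            out[i] = s * (a + b)
            out[i | mask] = s * (a - b)
    return out


def apply_cnot(state, control, target, n):
    """Flip the target bit of every basis state whose control bit is 1."""
    cmask = 1 << (n - 1 - control)
    tmask = 1 << (n - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if i & cmask:
            out[i ^ tmask] = state[i]
    return out


def probabilities(state, n):
    return {format(i, f"0{n}b"): abs(a) ** 2 for i, a in enumerate(state)}


def expectation_zz(state):
    # Z(x)Z has eigenvalue +1 on even-parity bitstrings and -1 on odd-parity ones.
    return sum(abs(a) ** 2 * (1 - 2 * (bin(i).count("1") % 2))
               for i, a in enumerate(state))


n = 2
state = [1 + 0j, 0j, 0j, 0j]          # |00>
state = apply_h(state, 0, n)          # (|00> + |10>) / sqrt(2)
state = apply_cnot(state, 0, 1, n)    # (|00> + |11>) / sqrt(2)
probs = probabilities(state, n)
zz = expectation_zz(state)
```

The probabilities come out as one half each for '00' and '11' and zero elsewhere, which matches the roughly 500/500 split over 1000 shots and the ZZ expectation of 1 shown in the snippet above.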

The Community tier is free up to 24 qubits. Pro removes the qubit cap (the limit becomes available memory) and adds optimize(), a circuit-fusion pass measured at 1.2–1.8× on layered circuits; run benchmarks/quantum_bench.py to measure it on your own machine. Enterprise adds vectorized batched parameter sweeps for variational loops, measured at 15.7× versus a Python loop on a 64-point sweep. Full API: https://g2n.dev/docs
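
"Memory-bound" follows from the size of a dense statevector: n qubits need 2^n complex amplitudes. The arithmetic below is a rough sketch that assumes 8 bytes per amplitude (complex64); the dtype g2n.quantum actually uses is not stated here, and a simulator may need extra working memory on top of the state itself.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory for a dense statevector: 2**n amplitudes.

    8 bytes per amplitude corresponds to complex64, 16 to complex128.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


MIB = 1024 ** 2
GIB = 1024 ** 3

at_24 = statevector_bytes(24) / MIB   # 128.0 MiB
at_30 = statevector_bytes(30) / GIB   # 8.0 GiB
at_34 = statevector_bytes(34) / GIB   # 128.0 GiB
```

Each additional qubit doubles the requirement, so the practical ceiling is set by host RAM or VRAM rather than by any fixed limit.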

Custom kernels (Pro / Enterprise)

With a licensed tier, the g2n backend runs a real custom compile pass: it fuses LayerNorm (and a trailing GELU) into a Triton kernel via a torch.library custom op, then hands the rest of the graph to TorchInductor. See ARCHITECTURE.md. Correctness is covered by tests/test_layernorm.py.
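
As a reference for what a fused LayerNorm + GELU kernel has to compute, the plain-Python sketch below does both steps per element in one pass, without materializing the LayerNorm output, and checks it against the two-step version. It is not the Triton kernel: the function names are hypothetical, and the exact (erf-based) GELU and eps=1e-5 are assumptions.

```python
import math


def layernorm(x, weight, bias, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)   # biased variance, as LayerNorm uses
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


def gelu(y):
    # Exact GELU: y * Phi(y), with Phi the standard normal CDF.
    return 0.5 * y * (1.0 + math.erf(y / math.sqrt(2.0)))


def layernorm_gelu_fused(x, weight, bias, eps=1e-5):
    """Same math in one pass: no intermediate LayerNorm output is stored."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    out = []
    for v, w, b in zip(x, weight, bias):
        y = (v - mean) * inv_std * w + b   # LayerNorm for this element
        out.append(gelu(y))                # GELU applied immediately
    return out


row = [1.0, 2.0, 3.0, 4.0]
weight, bias = [1.0] * 4, [0.0] * 4
unfused = [gelu(y) for y in layernorm(row, weight, bias)]
fused = layernorm_gelu_fused(row, weight, bias)
```

On a GPU the benefit of fusing comes from skipping the write and re-read of the intermediate tensor; the numerical result should match the unfused path within floating-point tolerance.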

The fusion is inference-only. The fused kernel is forward-only, so the pass skips any differentiable (training) graph and lets stock lowering handle it — training compiles correctly, just unfused. Inference under torch.no_grad() / torch.inference_mode() (which the serving runtime always uses) gets the fused kernel. Benchmark on your own GPU before quoting a speedup.
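
For measuring a speedup yourself, a small timing harness along these lines is usually enough. This is a generic sketch, not `g2n.benchmark`: `bench`, `speedup`, and the stand-in workloads are hypothetical. When timing GPU work, pass `sync=torch.cuda.synchronize`, because CUDA kernel launches are asynchronous and an unsynchronized timer mostly measures launch time.

```python
import statistics
import time


def bench(fn, rounds=200, warmup=20, sync=None):
    """Median wall-clock seconds per call of fn()."""
    for _ in range(warmup):          # let caches, JITs, and allocators settle
        fn()
    if sync:
        sync()
    times = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn()
        if sync:
            sync()                   # wait for queued device work before stopping the clock
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


def speedup(baseline_fn, candidate_fn, **kw):
    """How many times faster candidate_fn is than baseline_fn."""
    return bench(baseline_fn, **kw) / bench(candidate_fn, **kw)


# Stand-in workloads so the sketch runs without a GPU.
def slow():
    return sum(i * i for i in range(20_000))


def fast():
    return sum(i * i for i in range(2_000))


ratio = speedup(slow, fast, rounds=50, warmup=5)
```

In practice `slow` and `fast` would be closures such as `lambda: model(sample)` and `lambda: compiled(sample)`, run under `torch.inference_mode()` to match how the serving runtime executes.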

Docs: https://g2n.dev/docs · Pricing: https://g2n.dev/pricing

Download files

Source Distribution

g2n-1.0.0.tar.gz (25.4 kB)

Uploaded Source

Built Distribution

g2n-1.0.0-py3-none-any.whl (23.5 kB)

Uploaded Python 3

File details

Details for the file g2n-1.0.0.tar.gz.

File metadata

  • Download URL: g2n-1.0.0.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for g2n-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e4c57ff9b724a68dac7e9abf3d8e1acde302a03b0b1f0450646216406e36d177
MD5 b2acdd760454072defe72320673e6562
BLAKE2b-256 01f2db987dd3b1992a9660a45eb329e2da465a0cea5f3ac4f5c7e4d327cfb33a

File details

Details for the file g2n-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: g2n-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 23.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for g2n-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aded5f61a716a3a1e7bd2aa168694a409ae097176a2d80196680288e3d88aeec
MD5 ab2884ca46443fb3a466c820a5c113bf
BLAKE2b-256 3e157af54c74d32b4e5a4ad3933e90a98be6120f8e79846fd0267a39ee78f31c
