Optimize and run PyTorch models: an open-core compiler (fusion, buffer planning, persistent compile cache) plus a license-gated serving platform that runs your models behind an inference server.
g2n
Optimize and run PyTorch models. g2n is the open-core compiler at the
center of the g2n platform: pointwise fusion, buffer-reuse planning, and a
persistent cross-run compile cache so repeat builds skip recompilation. A
license unlocks the enhanced planner, the persistent cache, and a full
serving layer that runs your models behind an HTTP inference server.
pip install g2n
import torch
import g2n
model = MyModule().eval()  # MyModule is a placeholder for your own torch.nn.Module
compiled = g2n.compile(model)  # optimize
# or register as a torch.compile backend:
compiled = torch.compile(model, backend="g2n")
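To make the idea of a persistent cross-run compile cache concrete, here is a minimal stdlib-only sketch. It is not g2n's implementation: the key scheme (a hash of the graph text plus a compiler version), the one-file-per-key layout, and the names `cache_key`, `PersistentCache`, and `demo_cache` are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(graph_text: str, compiler_version: str) -> str:
    """Hash everything that can change the compiled artifact (assumed inputs)."""
    payload = json.dumps(
        {"graph": graph_text, "version": compiler_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PersistentCache:
    """Toy on-disk cache: one file per key, so entries survive process restarts."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get_or_compile(self, key: str, compile_fn) -> tuple[str, bool]:
        """Return (artifact, was_cache_hit); compile only on a miss."""
        path = self.root / f"{key}.txt"
        if path.exists():
            return path.read_text(encoding="utf-8"), True
        artifact = compile_fn()
        path.write_text(artifact, encoding="utf-8")
        return artifact, False


def demo_cache() -> tuple[bool, bool]:
    """Request the same graph twice; only the first request compiles."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = PersistentCache(Path(tmp))
        key = cache_key("relu(add(x, y))", "0.5.8")
        _, first_hit = cache.get_or_compile(key, lambda: "compiled-kernel")
        _, second_hit = cache.get_or_compile(key, lambda: "compiled-kernel")
        return first_hit, second_hit
```

Including the compiler version in the key is one simple way to keep artifacts from an older release from being reused after an upgrade.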
Two halves, one license
| | Community (free) | Pro | Enterprise |
|---|---|---|---|
| Optimize: fusion, JIT pointwise codegen, CPU fallback | ✓ | ✓ | ✓ |
| Enhanced buffer planner + persistent compile cache | | ✓ | ✓ |
| Run: model registry + inference server (g2n.serve()) | | ✓ | ✓ |
| Dynamic batching, multi-accelerator routing, model-zoo | | | ✓ |
Activate a license to enable the paid tiers. The code path is the same either way: with a valid license the gated features turn on, and without one g2n falls back to the open-core path:
g2n activate G2N-XXXX-XXXX-XXXX
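The "same code path, gated features" behaviour can be pictured with a small sketch. This is an illustration only, not g2n's licensing code: the `Tier` enum, the feature names, and the tier each feature is assigned to are assumptions read off the tier table above.

```python
from enum import IntEnum


class Tier(IntEnum):
    COMMUNITY = 0
    PRO = 1
    ENTERPRISE = 2


# Minimum tier per feature. Names and assignments are hypothetical,
# loosely following the tier table in this README.
REQUIRED_TIER = {
    "fusion": Tier.COMMUNITY,
    "enhanced_planner": Tier.PRO,
    "persistent_cache": Tier.PRO,
    "serve": Tier.PRO,
    "dynamic_batching": Tier.ENTERPRISE,
}


def enabled_features(tier: Tier) -> list[str]:
    """List the features a given tier turns on."""
    return sorted(name for name, need in REQUIRED_TIER.items() if tier >= need)


def plan_buffers(tier: Tier) -> str:
    """One call site for every tier: use the gated planner or fall back."""
    if tier >= REQUIRED_TIER["enhanced_planner"]:
        return "enhanced"
    return "basic"  # open-core fallback
```

The point of the pattern is that callers never branch on the license themselves; the fallback lives inside the feature.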
Run your models (Pro+)
The enterprise client (pip install g2n-enterprise) adds the serving platform:
import g2n_enterprise as g2n
g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
                   precision="auto", cuda_graph=True, max_batch=16)
g2n.serve(port=8900)  # POST /v1/models/resnet/predict
# `sample` is a representative input that you supply
res = g2n.benchmark("resnet", sample, rounds=200)  # eager vs. optimized, measured on your machine
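A client for the `POST /v1/models/resnet/predict` endpoint might look like the sketch below. Only the URL path and port come from the snippet above; the JSON body (`{"inputs": ...}`), the JSON response, and the helper names `build_predict_request` and `predict` are assumptions, so check the docs for the real request schema.

```python
import json
import urllib.request


def build_predict_request(
    host: str, port: int, model: str, inputs: list
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model}/predict"
    # Assumed payload shape; the real server may expect something different.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host: str, port: int, model: str, inputs: list, timeout: float = 10.0):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_predict_request(host, port, model, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `predict("localhost", 8900, "resnet", nested_list_input)` would issue the call; `nested_list_input` stands for whatever input encoding the server accepts.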
Serving applies real inference techniques: inference_mode, fp16/bf16/int8
precision, CUDA-graph capture and replay (which removes the kernel-launch
overhead that leaves compiled code merely tied with eager on small GPUs), and a
VRAM residency manager so a small card can serve more models than fit in memory
at once. Speedups are hardware-dependent: benchmark on your own GPU rather than
trusting a quoted number.
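One way to picture a VRAM residency manager is as a size-aware cache of loaded models. The sketch below uses least-recently-used eviction, which is an assumption: the README does not say which policy g2n uses, and `ResidencyManager` and its methods are hypothetical names.

```python
from collections import OrderedDict


class ResidencyManager:
    """Keep at most `budget_mb` of models resident; evict least recently used."""

    def __init__(self, budget_mb: int) -> None:
        self.budget_mb = budget_mb
        # Model name -> size in MB, ordered from least to most recently used.
        self.resident: "OrderedDict[str, int]" = OrderedDict()

    def used_mb(self) -> int:
        return sum(self.resident.values())

    def acquire(self, name: str, size_mb: int) -> list[str]:
        """Make `name` resident and return the models evicted to fit it."""
        if size_mb > self.budget_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds the memory budget")
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while self.used_mb() + size_mb > self.budget_mb:
            victim, _ = self.resident.popitem(last=False)  # oldest entry first
            evicted.append(victim)
        self.resident[name] = size_mb
        return evicted
```

A real manager would also move weights between host and device memory on eviction and reload; the sketch only tracks the bookkeeping.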
Custom kernels (Pro / Enterprise)
With a licensed tier, the g2n backend runs a real custom compile pass: it fuses
LayerNorm (and a trailing GELU) into a Triton kernel via a torch.library
custom op, then hands the rest of the graph to TorchInductor. See
ARCHITECTURE.md. Correctness is covered by tests/test_layernorm.py.
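For readers unfamiliar with what fusing LayerNorm and GELU means, here is a pure-Python reference of the arithmetic. It is a sketch of the math only, not the Triton kernel: the learnable scale and bias of LayerNorm are omitted, GELU uses the exact erf form, and all function names are hypothetical.

```python
import math


def layer_norm(x: list[float], eps: float = 1e-5) -> list[float]:
    """Normalize to zero mean and unit (biased) variance, without affine terms."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def gelu(v: float) -> float:
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def unfused(x: list[float], eps: float = 1e-5) -> list[float]:
    """Two passes: the normalized row is materialized, then read back for GELU."""
    normed = layer_norm(x, eps)
    return [gelu(v) for v in normed]


def fused_layer_norm_gelu(x: list[float], eps: float = 1e-5) -> list[float]:
    """One pass over the output: normalize and activate each element together."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gelu((v - mean) * inv_std) for v in x]
```

Both versions compute the same values; the fused one avoids writing and re-reading the intermediate normalized row, which on a GPU saves a kernel launch and a round trip through memory.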
The fusion is inference-only. The fused kernel implements only the forward
pass, so the compile pass skips any differentiable (training) graph and leaves
it to stock lowering: training still compiles correctly, just unfused.
Inference under torch.no_grad() / torch.inference_mode() (which the serving
runtime always uses) gets the fused kernel. Benchmark on your own GPU before
quoting a speedup.
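If you want a quick measurement outside the serving platform, a timing harness along these lines is enough to compare two callables. It is a generic sketch using only the standard library; `time_callable` and `speedup` are hypothetical helpers, not part of g2n.

```python
import statistics
import time


def time_callable(fn, rounds: int = 200, warmup: int = 10) -> float:
    """Return the median wall-clock seconds per call of `fn`."""
    for _ in range(warmup):  # warm caches and trigger any lazy compilation
        fn()
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_fn, candidate_fn, rounds: int = 200) -> float:
    """Ratio of median times; values above 1.0 mean the candidate is faster."""
    baseline = time_callable(baseline_fn, rounds)
    candidate = time_callable(candidate_fn, rounds)
    if candidate == 0.0:
        raise ValueError("candidate ran too fast to time; increase the workload")
    return baseline / candidate
```

On a GPU, kernel launches are asynchronous, so each timed callable should end with `torch.cuda.synchronize()`; otherwise the harness measures launch time rather than execution time.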
Docs: https://g2n.dev/docs · Pricing: https://g2n.dev/pricing
File details
Details for the file g2n-0.5.8.tar.gz.
File metadata
- Download URL: g2n-0.5.8.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3bab7de93e19013d2d076c9c9ab0d1096cc8b319f488a00471af762f5ca3fbb |
| MD5 | b789f66a9a20924a170871590da74f70 |
| BLAKE2b-256 | 52eccafb59968dec91fa3e4db8e42608ac0c53d8d442d6008728b244b8957764 |
File details
Details for the file g2n-0.5.8-py3-none-any.whl.
File metadata
- Download URL: g2n-0.5.8-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f060f22e68e0dcacc8b7cce7bd30c017e74ada0d60e7d1807ad0834933f8f5cf |
| MD5 | f1a54c5aa00035ea86f11bfbc44968f9 |
| BLAKE2b-256 | 23c14c16f3ff689c67c063c358fbaf1fb7ebc1a8706fef5b30364096882cd30c |