Optimize and run PyTorch models: an open-core compiler (fusion, buffer planning, persistent compile cache) plus a license-gated serving platform that runs your models behind an inference server.

Project description

g2n

Optimize and run PyTorch models. g2n is the open-core compiler at the center of the g2n platform. It provides pointwise fusion, buffer-reuse planning, and a persistent cross-run compile cache, so repeat builds skip recompilation. A license unlocks the enhanced planner, the persistent cache, and a full serving layer that runs your models behind an HTTP inference server.

pip install g2n

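import torch
import g2n

# MyModule stands in for your own torch.nn.Module subclass.
model = MyModule().eval()

# Option 1: compile directly with g2n.
compiled = g2n.compile(model)

# Option 2: register g2n as a torch.compile backend instead.
compiled = torch.compile(model, backend="g2n")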
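
The idea behind a persistent cross-run compile cache can be sketched in a few lines of plain Python. This illustrates the general technique only and is not g2n's implementation: `cache_key`, `compile_with_cache`, the on-disk layout, and the stand-in `expensive_compile` function are all hypothetical names chosen for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(graph_text, compiler_version):
    """Derive a stable key from everything that affects the compiled output."""
    payload = json.dumps(
        {"graph": graph_text, "version": compiler_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compile_with_cache(graph_text, compiler_version, cache_dir, compile_fn):
    """Return (artifact, was_cache_hit); artifacts are stored as text files."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(graph_text, compiler_version) + ".txt")
    if entry.exists():
        return entry.read_text(encoding="utf-8"), True
    artifact = compile_fn(graph_text)
    # Write to a temporary name first, then rename, so a crash mid-write
    # never leaves a truncated cache entry behind.
    tmp = entry.with_suffix(".tmp")
    tmp.write_text(artifact, encoding="utf-8")
    tmp.replace(entry)
    return artifact, False


calls = []  # records how often the (hypothetical) compiler actually runs


def expensive_compile(graph_text):
    calls.append(graph_text)
    return "compiled<" + graph_text + ">"


with tempfile.TemporaryDirectory() as cache_dir:
    first = compile_with_cache("add(mul(x, 2), 1)", "0.5.3", cache_dir, expensive_compile)
    second = compile_with_cache("add(mul(x, 2), 1)", "0.5.3", cache_dir, expensive_compile)
```

The second call finds the entry on disk and skips `expensive_compile`. Including the compiler version in the key means an upgrade invalidates old entries instead of reusing them.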

Two halves, one license

Feature	Community (free)	Pro	Enterprise
Optimize: fusion, JIT pointwise codegen, CPU fallback	✓	✓	✓
Enhanced buffer planner + persistent compile cache		✓	✓
Run: model registry + inference server (`g2n.serve()`)		✓	✓
Dynamic batching, multi-accelerator routing, model zoo			✓
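
To make "buffer planning" concrete, here is a minimal sketch of buffer reuse based on tensor lifetimes. It is a generic greedy planner written for illustration, not the g2n planner (neither the default nor the enhanced one); the `Tensor` record and `plan_buffers` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    name: str
    size: int    # bytes
    start: int   # index of the op that produces the tensor
    end: int     # index of the last op that reads it


def plan_buffers(tensors):
    """Greedy plan: reuse a free buffer that is big enough, else allocate one.

    Returns (assignment, buffer_sizes), where assignment maps each tensor
    name to the index of the physical buffer it lives in.
    """
    buffer_sizes = []   # size of each physical buffer
    busy_until = []     # last op index at which each buffer is still in use
    assignment = {}
    for t in sorted(tensors, key=lambda t: t.start):
        candidates = [
            i for i, size in enumerate(buffer_sizes)
            if busy_until[i] < t.start and size >= t.size
        ]
        if candidates:
            idx = min(candidates, key=lambda i: buffer_sizes[i])  # tightest fit
        else:
            idx = len(buffer_sizes)
            buffer_sizes.append(t.size)
            busy_until.append(-1)
        busy_until[idx] = t.end
        assignment[t.name] = idx
    return assignment, buffer_sizes


tensors = [
    Tensor("a", 1024, start=0, end=1),
    Tensor("b", 1024, start=1, end=2),
    Tensor("c", 512, start=2, end=3),
    Tensor("d", 1024, start=3, end=4),
]
assignment, buffer_sizes = plan_buffers(tensors)
```

In this example four intermediates totalling 3584 bytes fit into two 1024-byte buffers, because `c` reuses the buffer of `a` and `d` reuses the buffer of `b` once their lifetimes have ended.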

Activate a license to enable the paid tiers. The code path is the same either way: with a valid license the gated features turn on, and without one g2n falls back to the open-core path:

g2n activate G2N-XXXX-XXXX-XXXX
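
The "same code path, gated features" pattern can be sketched as follows. This is an assumption about the general shape of such gating, not g2n's licensing code: the feature names, the pass names, and `build_pass_list` are hypothetical, and the tier contents are only loosely modeled on the table above.

```python
# Hypothetical feature sets per tier, loosely following the tier table.
TIER_FEATURES = {
    "community": frozenset({"fusion"}),
    "pro": frozenset({"fusion", "enhanced_planner", "persistent_cache", "serve"}),
    "enterprise": frozenset(
        {"fusion", "enhanced_planner", "persistent_cache", "serve", "dynamic_batching"}
    ),
}


def enabled_features(tier):
    """Unknown or missing tiers fall back to the community feature set."""
    return TIER_FEATURES.get(tier, TIER_FEATURES["community"])


def build_pass_list(tier):
    """One pipeline for every tier; licensed features only add or swap passes."""
    features = enabled_features(tier)
    passes = []
    if "persistent_cache" in features:
        passes.append("lookup_persistent_cache")
    passes.append("fuse_pointwise")
    if "enhanced_planner" in features:
        passes.append("enhanced_buffer_plan")
    else:
        passes.append("default_buffer_plan")
    return passes


unlicensed = build_pass_list(None)
licensed = build_pass_list("pro")
```

Because the fallback is a lookup default rather than a separate branch, an expired or missing license degrades to the community pipeline instead of raising an error.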

Run your models (Pro+)

The enterprise client (pip install g2n-enterprise) adds the serving platform:

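import g2n_enterprise as g2n

# Register a TorchScript model under the name "resnet".
g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
                   precision="auto", cuda_graph=True, max_batch=16)
g2n.serve(port=8900)        # exposes POST /v1/models/resnet/predict

# `sample` is an example input for the model, which you supply.
# Compares eager vs optimized, measured on your own machine.
res = g2n.benchmark("resnet", sample, rounds=200)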
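
A client for the predict endpoint might look like the sketch below, using only the Python standard library. The URL path comes from the comment above; everything else is an assumption, in particular the `{"inputs": ...}` request body and the JSON response, so check the g2n docs for the actual wire format. `build_predict_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request


def build_predict_request(model_name, inputs, host="127.0.0.1", port=8900):
    """Build (but do not send) a POST request for the predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}/predict"
    # Assumed payload shape; the real server may expect something different.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def predict(model_name, inputs, timeout=10.0, **kwargs):
    """Send the request and decode the reply, assuming it is JSON."""
    req = build_predict_request(model_name, inputs, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running server.
req = build_predict_request("resnet", [[0.0, 1.0, 2.0]])
```

Calling `predict("resnet", ...)` only works once a server is listening on the chosen port.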

Serving applies standard inference techniques: inference_mode, fp16/bf16/int8 precision, CUDA-graph capture/replay (which removes the kernel-launch overhead that can leave compiled code merely tied with eager on small GPUs), and a VRAM residency manager so that a small card can serve more models than fit in memory at once. Speedups are hardware-dependent: benchmark on your own GPU rather than trusting a quoted number.
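
One common way to let a small card serve more models than fit at once is least-recently-used eviction. The sketch below shows that idea in plain Python; it is not g2n's residency manager, whose policy is not described here, and `ResidencyManager` with its methods is a hypothetical name. Sizes are in MB and no real GPU memory is touched.

```python
from collections import OrderedDict


class ResidencyManager:
    """Keep at most `capacity_mb` of models resident; evict least recently used."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.resident = OrderedDict()   # name -> size_mb, least recent first
        self.evictions = []             # names evicted so far, in order

    def used_mb(self):
        return sum(self.resident.values())

    def acquire(self, name, size_mb):
        """Make `name` resident and mark it as most recently used."""
        if size_mb > self.capacity_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds total capacity")
        if name in self.resident:
            self.resident.move_to_end(name)
            return "hit"
        # Evict from the least recently used end until the new model fits.
        while self.used_mb() + size_mb > self.capacity_mb:
            victim, _ = self.resident.popitem(last=False)
            self.evictions.append(victim)
        self.resident[name] = size_mb
        return "loaded"


mgr = ResidencyManager(capacity_mb=1000)
results = [
    mgr.acquire("resnet", 400),
    mgr.acquire("bert", 500),
    mgr.acquire("resnet", 400),   # touch resnet so bert becomes the oldest
    mgr.acquire("vit", 300),      # does not fit, so bert is evicted
]
```

A real manager would also have to move weights between host and device memory and account for activation memory, which this sketch leaves out.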

Custom kernels (Pro / Enterprise)

With a licensed tier, the g2n backend runs a real custom compile pass: it fuses LayerNorm (and a trailing GELU) into a Triton kernel via a torch.library custom op, then hands the rest of the graph to TorchInductor. See ARCHITECTURE.md. Correctness is covered by tests/test_layernorm.py.

The fusion is inference-only. Because the fused kernel implements only the forward pass, the compile pass skips any differentiable (training) graph and leaves it to stock lowering, so training still compiles correctly, just without the fusion. Inference under torch.no_grad() or torch.inference_mode(), which the serving runtime always uses, gets the fused kernel. Benchmark on your own GPU before quoting a speedup.
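
For reference, the arithmetic behind a LayerNorm followed by GELU can be written out in plain Python. This is a numerical sketch of the math, not the Triton kernel. It assumes no affine weight or bias, `eps=1e-5` (the `torch.nn.LayerNorm` default), and the exact erf form of GELU; whether the g2n kernel uses the erf or the tanh variant is not stated here. The function names are hypothetical.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one row to zero mean and unit variance (biased variance)."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def unfused(row, eps=1e-5):
    """Two steps: the normalized row is materialized, then read again."""
    normalized = layer_norm(row, eps)
    return [gelu(y) for y in normalized]


def fused(row, eps=1e-5):
    """One step after the statistics: normalize and activate per element."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = []
    for x in row:
        y = (x - mean) * inv_std   # never stored as a separate tensor
        out.append(0.5 * y * (1.0 + math.erf(y / math.sqrt(2.0))))
    return out


row = [1.0, 2.0, 3.0]
a = unfused(row)
b = fused(row)
```

Both versions produce the same values. The fused form avoids writing the normalized intermediate out and reading it back, which is where a fused GPU kernel saves memory traffic.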

Docs: https://g2n.dev/docs · Pricing: https://g2n.dev/pricing

Release history

Version	Release date
1.1.0	Jul 4, 2026
1.0.0	Jul 4, 2026
0.5.9	Jun 30, 2026
0.5.8	Jun 30, 2026
0.5.7	Jun 28, 2026
0.5.6	Jun 28, 2026
0.5.5	Jun 28, 2026
0.5.3 (this version)	Jun 26, 2026
0.5.2	Jun 25, 2026
0.5.1	Jun 23, 2026
0.5.0	Jun 22, 2026
0.4.3	Jun 22, 2026
0.4.2	Jun 22, 2026
0.4.1	Jun 22, 2026

Download files

Source Distribution

g2n-0.5.3.tar.gz (19.4 kB)

Uploaded Jun 26, 2026 Source

Built Distribution

g2n-0.5.3-py3-none-any.whl (17.9 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file g2n-0.5.3.tar.gz.

File metadata

Download URL: g2n-0.5.3.tar.gz
Upload date: Jun 26, 2026
Size: 19.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for g2n-0.5.3.tar.gz
Algorithm	Hash digest
SHA256	`c7399bc383c895843dd7577de84fb46f512389206d4f63f1139a68db460903ed`
MD5	`055193b9fe856871c6c05a65afae21de`
BLAKE2b-256	`dd22c566e34a886f5425ca4cfe842e9a5d319fb869f1df107869e21e1d43d7a5`

File details

Details for the file g2n-0.5.3-py3-none-any.whl.

File metadata

Download URL: g2n-0.5.3-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for g2n-0.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`69475f909a8be5726f5581f51e74534577a1efc50eb5999b7a5746f69c73cb1b`
MD5	`b9a100b212c00bf111d12dae28bd724f`
BLAKE2b-256	`0b04e22fb3df606c59ba88e245ec040e0d55d2ef69755fdb73e1facb78d1a213`
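
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib`. The digests in the sketch are copied from the tables on this page; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the file hash tables above.
EXPECTED_SHA256 = {
    "g2n-0.5.3.tar.gz":
        "c7399bc383c895843dd7577de84fb46f512389206d4f63f1139a68db460903ed",
    "g2n-0.5.3-py3-none-any.whl":
        "69475f909a8be5726f5581f51e74534577a1efc50eb5999b7a5746f69c73cb1b",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's digest matches the published one for its name."""
    expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

For example, `verify_download("g2n-0.5.3.tar.gz")` returns `True` only if the local file is byte-for-byte identical to the published one. pip can perform the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).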
