
InferBit

v0.4.1 — Run any open LLM on CPU. One command.

pip install inferbit[cli]
inferbit quantize mistralai/Mistral-7B-Instruct-v0.3 -o model.ibf
inferbit chat model.ibf

InferBit converts HuggingFace models to a compact 4-bit PQv2 codebook format (.ibf) and runs them on CPU everywhere, with optional Apple Metal GPU acceleration on Apple Silicon and a drive mode that streams weights from disk, bringing an 8B model to roughly 1.4 GB of peak RAM. No GPU required, no Docker, no complex setup.

Install

# Library only
pip install inferbit

# Library + CLI
pip install inferbit[cli]

# Everything (library + CLI + server)
pip install inferbit[all]

Requires Python 3.9+. Prebuilt wheels for macOS (ARM64, x86_64), Linux (x86_64, ARM64), and Windows (x64, ARM64) — six platforms, all CPU-only by default. Apple Metal GPU is a build-from-source option (see Platform support below).

Quickstart

Command line

# Convert any HuggingFace model to PQv2 4-bit
inferbit quantize meta-llama/Llama-3.2-1B -o llama.ibf

# Convert a local safetensors file
inferbit quantize ./model.safetensors -o model.ibf

# Convert from Ollama (if installed)
inferbit quantize ollama://llama3:8b -o llama3.ibf

# Auto-calibrate: try INT2/INT4/INT8 and keep the first that hits the gate
inferbit quantize meta-llama/Llama-3.2-1B -o llama.ibf \
    --auto-calibrate --max-perplexity 12.0 --min-tokens-per-sec 30

# Quality-gated eval against a JSONL token dataset
inferbit eval-gates model.ibf --dataset tokens.jsonl \
    --max-perplexity 12.0 --min-tokens-per-sec 30

# Interactive chat
inferbit chat model.ibf

# Benchmark
inferbit bench model.ibf --tokens 128 --runs 3

# Model info
inferbit info model.ibf

# Serve OpenAI-compatible API (requires: pip install inferbit[server])
inferbit serve model.ibf --port 8000

Python API

from inferbit import InferbitModel

# Load from HuggingFace (downloads, converts, and loads automatically)
model = InferbitModel.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    bits=4,
)

# Generate text
output = model.generate("Explain gravity in one sentence:")
print(output)
# "Gravity is the force that attracts objects with mass towards each other."

# Stream tokens
for token in model.stream("Write a haiku about mountains:"):
    print(token, end="", flush=True)

# Or load a pre-converted model
model = InferbitModel.load("model.ibf")

Convert separately

from inferbit import convert

# Convert safetensors to IBF
convert("model.safetensors", "model.ibf", bits=4, sensitive_bits=8)

# Convert a HuggingFace directory (with config.json + sharded safetensors)
convert("./model_dir/", "model.ibf", bits=4)

# Convert with progress callback
convert("model.safetensors", "model.ibf", progress=lambda pct, stage: print(f"{pct:.0%} {stage}"))

Token-level API

from inferbit import InferbitModel

model = InferbitModel.load("model.ibf")

# Work with raw token IDs
token_ids = model.generate_tokens([1, 2, 3, 4, 5], max_tokens=20, temperature=0.7)

# Get raw logits
logits = model.forward([1, 2, 3])

# KV cache control
model.kv_clear()
model.kv_truncate(512)
print(model.kv_length)

Model info

model = InferbitModel.load("model.ibf")
print(model.architecture)   # "llama"
print(model.num_layers)      # 32
print(model.hidden_size)     # 4096
print(model.vocab_size)      # 32768
print(model.max_context)     # 32768
print(model.bits)            # 4
print(model.total_memory_mb) # 3971.0

Quality-gated quantization

from inferbit import search_quantization_profile, EvalGates

# Automatically find the most aggressive quantization that meets quality targets
result = search_quantization_profile(
    "model.safetensors",
    output_dir="./models",
    gates=EvalGates(max_perplexity=10.0, min_tokens_per_sec=5.0),
)
print(f"Selected: {result.selected.name} ({result.selected.bits}-bit)")
print(f"Speed: {result.eval_result.tokens_per_sec:.1f} tok/s")

Supported Sources

Source                         Example
HuggingFace Hub                inferbit quantize mistralai/Mistral-7B-Instruct-v0.3
Local safetensors              inferbit quantize model.safetensors
Sharded safetensors directory  inferbit quantize ./model_dir/
Local GGUF                     inferbit quantize model.gguf
Ollama models                  inferbit quantize ollama://llama3:8b

Supported Models

Any LLaMA-family architecture with public weights:

  • LLaMA 2, LLaMA 3, LLaMA 3.2
  • Mistral, Mixtral
  • TinyLlama
  • Code Llama
  • And any model with the same architecture (GQA/MQA/MHA, RMSNorm, SiLU, RoPE)

Benchmarks

Apple M4, full v0.4.1 cross-engine matrix. Perplexity is measured on the same tokenized 2048-token wikitext window for both engines (llama.cpp's tokenization is fed to both bench_ppl_run and llama-perplexity, so quality is compared over the identical token sequence). Prefill via bench_compare --prompt-tokens 64 and llama-bench -p 64; decode via --gen-tokens 128 and -n 128. Peak RAM from getrusage (InferBit) and /usr/bin/time -l (llama.cpp).
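For orientation, the rough shape of the commands behind each row is sketched below; the GGUF filename is a placeholder and the authoritative invocations live in the methodology doc linked after the tables.

# InferBit side (documented CLI)
inferbit bench model.ibf --tokens 128 --runs 3
# llama.cpp side, matching prompt/decode lengths
llama-bench -m model-q4_k_m.gguf -p 64 -n 128
# peak RAM on macOS
/usr/bin/time -l llama-bench -m model-q4_k_m.gguf -p 64 -n 128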

TinyLlama 1.1B-Chat

Engine / mode              File     Prefill   Decode     Peak RAM  PPL
InferBit PQv2 — Metal      528 MiB  437 t/s   55.5 t/s   1205 MB   13.06
InferBit PQv2 — CPU        528 MiB  27 t/s    24.9 t/s   627 MB    13.06
InferBit PQv2 — drive      528 MiB  287 t/s   9.4 t/s    297 MB    13.06
llama.cpp Q4_K_M — Metal   638 MiB  1347 t/s  121.3 t/s  704 MB    13.89
llama.cpp Q4_K_M — CPU     638 MiB  130 t/s   74.2 t/s   1293 MB   13.89

Llama-3.2-1B Instruct

Engine / mode              File     Prefill   Decode     Peak RAM  PPL
InferBit PQv2 — Metal      718 MiB  435 t/s   48.1 t/s   1258 MB   11.29
InferBit PQv2 — CPU        718 MiB  28 t/s    22.7 t/s   847 MB    11.37
InferBit PQv2 — drive      718 MiB  257 t/s   9.3 t/s    602 MB    11.29
llama.cpp Q4_K_M — Metal   770 MiB  1359 t/s  104.3 t/s  888 MB    12.33
llama.cpp Q4_K_M — CPU     770 MiB  132 t/s   64.3 t/s   1644 MB   12.33

Llama-3.1-8B Instruct

Engine / mode              File      Prefill   Decode    Peak RAM  PPL
InferBit PQv2 — Metal      3.75 GiB  65 t/s    8.5 t/s   3203 MB   6.34
InferBit PQv2 — CPU        3.75 GiB  4.5 t/s   4.2 t/s   4306 MB   6.36
InferBit PQv2 — drive      3.75 GiB  34.3 t/s  0.70 t/s  1359 MB   6.34
llama.cpp Q4_K_M — Metal   4.58 GiB  216 t/s   20.1 t/s  4784 MB   6.77
llama.cpp Q4_K_M — CPU     4.58 GiB  4.2 t/s   2.4 t/s   6755 MB   6.77

What the numbers say:

  • Quality — InferBit PQv2 perplexity is 6–8% lower than the same-bit-budget Q4_K_M on all three models (over the identical token stream).
  • File size — InferBit .ibf is 7–18% smaller than the equivalent Q4_K_M GGUF.
  • Speed — On M4 Metal, llama.cpp is 2–3× faster on decode and 3–6× on prefill; on pure CPU the engines are closer. Closing the Metal decode gap is active work; the bottleneck is PQv2's random-access codebook reads.
  • Memory — InferBit drive mode runs the 8B model in 1.36 GB of peak RAM at the same PPL as the in-memory path (3.20 GB), a 58% reduction at zero quality cost. Throughput drops at long contexts because weights are re-streamed at every position, so drive mode pays off when RAM is the binding constraint.

Full methodology + tooling notes in docs/34_METRICS_SNAPSHOT.md.

How it works

  1. Convert: reads safetensors/GGUF weights, quantizes the MLP weights with PQv2 (K=256 per-(chunk, subchunk) codebook + uint8 indices, 4-bit-equivalent) and attention/embeddings with INT8, and packs everything into a single mmap-friendly .ibf binary.
  2. Load: memory-maps the .ibf file for instant loading.
  3. Run: hand-tuned C kernels with multi-threaded matmul and parallel attention heads. On Apple Silicon, an optional Metal GPU backend (build from source) routes both prefill and decode through the GPU; the same .ibf works in both modes.
  4. Drive mode (IB_RESIDENCY_MODE=drive, macOS/Linux): weights stream from disk through a bounded GPU/CPU scratch ring instead of staying resident. Perplexity is bit-identical, and the 8B model runs in 1.36 GB of peak RAM (see Benchmarks and the invocation sketch after this list).
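A minimal sketch of step 4, assuming the environment variable is read at startup as described above:

# stream weights from disk instead of keeping them resident (macOS/Linux)
IB_RESIDENCY_MODE=drive inferbit chat model.ibf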

The .ibf format is 64-byte aligned, requires no parsing at load time, and stores the same K=256 codebooks the GPU kernels consume, so there is no quality difference between CPU and GPU.

Configuration

Quantization

Flag              Default  Description
--bits            4        Weight quantization bits (2, 4, or 8)
--sensitive-bits  8        Attention/embedding bits
--sparsity        0.0      Structured sparsity (0.0-0.6)
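An illustrative invocation combining the flags above (values are examples, not recommendations):

inferbit quantize meta-llama/Llama-3.2-1B -o llama.ibf --bits 4 --sensitive-bits 8 --sparsity 0.2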

Generation

Flag            Default  Description
--temperature   0.7      Sampling temperature
--top-k         40       Top-K sampling
--top-p         0.9      Nucleus sampling
--max-tokens    512      Max tokens to generate
--threads       auto     CPU threads
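For example, assuming these flags attach to the chat command (values illustrative):

inferbit chat model.ibf --temperature 0.7 --top-p 0.9 --max-tokens 256 --threads 8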

Platform support

Platform                     Wheel                   CPU SIMD           Metal GPU                   Drive mode
macOS Apple Silicon (arm64)  macosx_11_0_arm64       NEON               opt-in (build from source)  yes
macOS Intel (x86_64)         macosx_10_15_x86_64     portable C         no                          yes
Linux x86_64                 manylinux_2_17_x86_64   portable C         no                          yes
Linux ARM64 (aarch64)        manylinux_2_17_aarch64  NEON + dotprod     no                          yes
Windows x64                  win_amd64               portable C (MSVC)  no                          no
Windows ARM64                win_arm64               NEON (MSVC)        no                          no

Build with Metal GPU (Apple Silicon, recommended for best M-series throughput):

# clone the engine repo, then:
cmake -B build -DIB_ENABLE_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
# point InferBit at the freshly-built dylib:
export INFERBIT_LIB_PATH="$PWD/build/libinferbit.dylib"
python -c "import inferbit; print(inferbit.__version__)"

Drive mode is currently macOS/Linux only (uses POSIX madvise/fcntl(F_NOCACHE)); on Windows the runtime keeps weights resident.

Architecture

libinferbit (C shared library)
    |
    +-- Python: pip install inferbit
    +-- Node.js: npm install @inferbit/{core,node,cli}

Single C engine, multiple language bindings. Same .ibf model file, same numerics, any language.

License

MIT
