# bithub

The missing friendly interface for BitNet inference. Think of it as Ollama for 1-bit LLMs.
BitNet models are incredibly efficient — a 2B parameter model fits in ~800MB of RAM and runs fast on a plain CPU. But there's no easy way to download, manage, and serve them. bithub fixes that.
## What it does

```bash
bithub setup        # One-time: build the inference engine
bithub pull 2B-4T   # Download a BitNet model from HuggingFace
bithub models       # See all available models
bithub list         # See what's installed
bithub serve 2B-4T  # Start an OpenAI-compatible API server
bithub run 2B-4T    # Chat in your terminal
bithub rm 2B-4T     # Remove a model
bithub status       # Check engine and model state
```
Once the server is running, any app that speaks the OpenAI API can connect — Open WebUI, Cursor, your own scripts:
```python
import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="2B-4T",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Available Models
| Name | Parameters | Size | Description |
|---|---|---|---|
| 2B-4T | 2.4B | ~1.8 GB | Microsoft's official BitNet, trained on 4T tokens |
| 700M | 0.7B | ~500 MB | Community 700M model — great for testing |
| 3B | 3.3B | ~2.5 GB | Community 3.3B model |
| 8B | 8.0B | ~5 GB | Llama3 architecture in 1.58-bit |
| falcon3-1B | 1B | ~700 MB | Falcon3 1B instruction-tuned |
| falcon3-3B | 3B | ~2 GB | Falcon3 3B instruction-tuned |
| falcon3-7B | 7B | ~4.5 GB | Falcon3 7B instruction-tuned |
| falcon3-10B | 10B | ~6.5 GB | Falcon3 10B instruction-tuned |
## Why bithub?

| | Ollama | bithub |
|---|---|---|
| Engine | llama.cpp | bitnet.cpp |
| Model weights | 4-bit / 8-bit quantized | Native 1.58-bit (ternary) |
| RAM for 2B model | ~2-4 GB | ~800 MB |
| Speed on CPU | Good | 2-6x faster |
| Energy usage | Normal | 55-82% less |
| Model ecosystem | Thousands of models | Growing (~10 models) |
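The RAM numbers follow from the weight encoding: ternary weights cost log2(3) ≈ 1.58 bits each, so the raw weight storage for a 2.4B-parameter model is under 0.5 GB, with the rest of the ~800 MB budget going to activations, the KV cache, and the higher-precision embedding/output layers. A rough back-of-the-envelope sketch (the helper function is illustrative, not part of bithub):

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB for a given per-weight bit width."""
    return n_params * bits_per_weight / 8 / 1e9

# 2.4B parameters at different quantization levels:
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
    print(f"{label:>8}: {weight_footprint_gb(2.4e9, bits):.2f} GB")
# fp16: 4.80 GB, 8-bit: 2.40 GB, 4-bit: 1.20 GB, 1.58-bit: 0.47 GB
```

Note that GGUF files on disk are somewhat larger than the raw ternary figure, since packing overhead and non-quantized layers are included.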
## API Endpoints

When you run `bithub serve`, you get a full OpenAI-compatible API:

| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completion (streaming + non-streaming) |
| GET | `/v1/models` | List available models |
| GET | `/health` | Server health check |
This means bithub works out of the box with Open WebUI, Cursor, Continue, and any tool that supports custom OpenAI endpoints.
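Because the server speaks the standard OpenAI wire format, you can also hit it with nothing but the Python standard library. A minimal sketch (the payload fields are the standard OpenAI chat-completions schema; the URL assumes the default port shown above):

```python
import json
import urllib.request

payload = {
    "model": "2B-4T",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,  # set True for server-sent-event streaming
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires a running `bithub serve 2B-4T`:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```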
## Quick Start

```bash
# Install (downloads pre-built binaries — no compiler needed)
curl -fsSL https://raw.githubusercontent.com/sagarjhaa/bithub/main/install.sh | bash

# Pull a model and chat
bithub pull 2B-4T
bithub run 2B-4T
```

That's it. No cmake, no clang, no compiling.
## Installation

### Quick Install (recommended)

```bash
curl -fsSL https://raw.githubusercontent.com/sagarjhaa/bithub/main/install.sh | bash
```

Downloads the bithub CLI and pre-built bitnet.cpp binaries for your platform (macOS/Linux, x86_64/arm64). Requires Python 3.9+.

### pip

```bash
pip install bithub
bithub setup  # compiles bitnet.cpp (requires cmake + clang)
```
### Docker

```bash
docker run -p 8080:8080 -v ~/.bithub:/root/.bithub ghcr.io/sagarjhaa/bithub pull 2B-4T
docker run -p 8080:8080 -v ~/.bithub:/root/.bithub ghcr.io/sagarjhaa/bithub serve 2B-4T
```

Open http://localhost:8080 for the built-in web dashboard.
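For long-running deployments, the two `docker run` commands above translate into a small Compose file. A sketch only (the service name is mine, and it assumes you have already pulled the model into the mounted volume):

```yaml
services:
  bithub:
    image: ghcr.io/sagarjhaa/bithub
    command: serve 2B-4T
    ports:
      - "8080:8080"
    volumes:
      - ~/.bithub:/root/.bithub
```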
### From Source

```bash
git clone https://github.com/sagarjhaa/bithub.git
cd bithub
pip install -e ".[dev]"
bithub setup  # requires cmake + clang
```
## Features

- Interactive REPL — `bithub run` with markdown rendering, history, and `/` commands
- OpenAI-compatible API — `bithub serve` works with Open WebUI, Cursor, any OpenAI client
- Web Dashboard — chat, model management, server stats at http://localhost:8080
- Multi-model serving — `bithub serve 2B-4T falcon3-3B` with model routing
- Direct HuggingFace pull — `bithub pull hf:org/repo` for any GGUF model
- Lazy loading — `bithub serve --lazy` starts backends on first request
## Roadmap

- CLI with model registry (8 BitNet models)
- HuggingFace downloader + bitnet.cpp builder
- OpenAI-compatible API server
- Test suite (140+ tests), CI/CD, structured logging
- Docker, install script, GitHub Releases
- Interactive REPL with slash commands
- Multi-model serving with lazy loading
- Web dashboard (chat, models, server, settings)
- Performance benchmarks (`bithub bench`)
- Homebrew formula
## Contributing
This project is in early development and contributions are very welcome! See CONTRIBUTING.md for guidelines.
## License
MIT — see LICENSE for details.
## Acknowledgements

- Microsoft BitNet — the inference engine this project wraps
- Ollama — the UX inspiration
- llama.cpp — the foundation bitnet.cpp is built on