ModelPulse
End-to-end partial-weight transfer pipeline.
Device A serves model shards → Device B reconstructs the GGUF in RAM and runs inference, with no persistent GGUF ever written to disk.
Device A                                   Device B
──────────────────────────                 ─────────────────────────────────────────────────
modelpulse server ./shards                 modelpulse bridge run http://100.101.102.103:8000
      │                                         │
      ├── GET /manifest ──────────────────────► │ 1. fetch manifest
      ├── GET /shards/* ──────────────────────► │ 2. pull all shards (streaming)
      │                                         │ 3. assemble GGUF in RAM → /dev/shm
      │                                         │ 4. llama.cpp loads from /dev/shm
      │                                         │ 5. run inference, stream tokens
      ◄── POST /metrics ────────────────────────│ 6. send collected metrics
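The client side of this flow can be sketched in a few lines of stdlib Python. The `/manifest` and `/shards/*` paths come from the diagram above, but the manifest's `shards` field and the helper names here are illustrative assumptions, not ModelPulse's actual API:

```python
import io
import json
import urllib.request
from typing import Iterable, Iterator


def fetch_manifest(base: str) -> dict:
    """Step 1: fetch the shard manifest from Device A."""
    with urllib.request.urlopen(f"{base}/manifest") as resp:
        return json.load(resp)


def iter_shard_bytes(base: str, shard_names: Iterable[str],
                     chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Step 2: stream each shard from GET /shards/<name> in 1 MiB chunks."""
    for name in shard_names:
        with urllib.request.urlopen(f"{base}/shards/{name}") as resp:
            while chunk := resp.read(chunk_size):
                yield chunk


def assemble_gguf_bytes(chunks: Iterable[bytes]) -> bytes:
    """Step 3: concatenate the streamed chunks into one GGUF blob in RAM."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()
```

Keeping assembly separate from networking means the concatenation logic can be exercised without a live server.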
Screenshots
Server (Device A)
Bridge (Device B)
Inference in Progress
Metrics Sent Back
Install
Install ModelPulse as a Python package directly from GitHub:
pip install git+https://github.com/MdSufiyan005/Model-Pulse.git
Workflow
1 - Prepare shards on Device A
Use gguf_to_shards.py from the companion tools to convert your GGUF model:
python tools/gguf_to_shards.py convert model.gguf ./shards/
2 - Start the server on Device A
modelpulse server run ./shards --host 0.0.0.0 --port 8000
3.0 - Get the Tailscale IP
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up    # sign in on the page that opens
tailscale ip         # prints this device's Tailscale IP, e.g. 100.101.102.103
3.1 - Run inference on Device B (edge device)
modelpulse bridge run http://100.101.102.103:8000
Commands
Device A (Server)
modelpulse server run <shards_dir> [options]
| Option | Default | Description |
|---|---|---|
| `--port` | 8000 | Server port |
| `--host` | 0.0.0.0 | Bind address |
| `--metrics-log` | metrics.jsonl | Metrics log file |
Device B (Client)
modelpulse bridge run <host> [options]
modelpulse bridge status <host> [--all]
| Command | Description |
|---|---|
| `run` | Full pipeline: pull → infer → report |
| `status` | Display latest metrics from Device A |
Bridge run options
| Flag | Default | Description |
|---|---|---|
| `--prompt` / `-p` | (interactive) | Prompt string |
| `--max-tokens` | 256 | Tokens to generate |
| `--temp` / `-t` | 0.7 | Sampling temperature |
| `--ctx` | 2048 | Context window |
| `--no-report` | false | Skip sending metrics |
Zero-Disk Strategy
shard_data ── assemble_gguf_bytes() ──► gguf_bytes (RAM)
        │
   write_bytes()
        │
/dev/shm/sb_<pid>.gguf    ← tmpfs, never touches physical disk
        │
del gguf_bytes            ← Python bytes freed
        │
Llama(model_path=...)     ← mmap from tmpfs
        │
cleanup() → unlink()
This keeps the model file in RAM only temporarily, while still satisfying llama.cpp's requirement for a file path.
The system prioritizes /dev/shm and /run/shm by checking for existence and write access, falling back to $TMPDIR or /tmp if no RAM-backed filesystem is available.
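That fallback order can be expressed directly. This is a sketch of the idea rather than ModelPulse's actual code; the function names are hypothetical:

```python
import os
import tempfile


def pick_ram_dir() -> str:
    """Prefer a RAM-backed tmpfs; fall back to the regular temp dir.

    Mirrors the priority described above: /dev/shm, then /run/shm,
    then $TMPDIR / /tmp.
    """
    for candidate in ("/dev/shm", "/run/shm"):
        if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
            return candidate
    return tempfile.gettempdir()


def write_gguf_to_ram(gguf_bytes: bytes) -> str:
    """Write assembled bytes to a (preferably tmpfs-backed) file.

    Returns the path; the caller unlinks it once llama.cpp has
    mmap'd the model.
    """
    path = os.path.join(pick_ram_dir(), f"sb_{os.getpid()}.gguf")
    with open(path, "wb") as f:
        f.write(gguf_bytes)
    return path
```

On a machine without tmpfs the file silently lands in `/tmp`, so the "zero-disk" property holds only when a RAM-backed filesystem exists.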
Project Layout
modelpulse/
├── pyproject.toml            # Packaging config
├── README.md                 # This doc
├── Images/                   # Screenshots (server.png, bridge.png, etc.)
├── modelpulse/               # Core package
│   ├── __init__.py
│   ├── main.py               # Unified CLI: modelpulse bridge, modelpulse server
│   ├── shared/               # Shared models
│   │   ├── __init__.py
│   │   └── models.py         # ShardManifest, InferenceMetrics
│   ├── server/               # Server side
│   │   ├── __init__.py
│   │   └── server.py         # FastAPI server
│   └── client/               # Client side
│       ├── __init__.py
│       ├── cli.py            # Bridge CLI
│       ├── bridge.py         # RAM GGUF assembly + llama.cpp
│       └── shard_client.py   # Async HTTP client
└── tools/                    # Utilities
    ├── gguf_parser.py
    └── gguf_to_shards.py     # GGUF → shard converter
File details
Details for the file modelpulse-0.1.1.tar.gz.
File metadata
- Download URL: modelpulse-0.1.1.tar.gz
- Upload date:
- Size: 20.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4713d25a13470753a76c2a5110443d0c8e801f06af6389bfa22b519a5d00d51 |
| MD5 | b0268ce51f89da9f1068405f0e9cb756 |
| BLAKE2b-256 | c2239042dd07a61e025cddd80665be01db6af840c1713508f01a908d763857c1 |
File details
Details for the file modelpulse-0.1.1-py3-none-any.whl.
File metadata
- Download URL: modelpulse-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | efb8d63f08ff319b1a68ee6c88ce64bc742dcea8edd3caa26f4441f0cae59827 |
| MD5 | 4b977e9c7499f7d8bca5380d6d16d291 |
| BLAKE2b-256 | 54d2056fb5c06b41a4432873c4603edf5af2c71ff42737c617f0f76835d01bf4 |