
ModelPulse 🚀

End-to-end partial-weight transfer pipeline.

Device A serves model shards → Device B reconstructs the GGUF in RAM and runs inference, with no persistent GGUF file ever written to disk.

Device A                                       Device B
──────────────────────────────                 ─────────────────────────────────────────────────
modelpulse server run ./shards                 modelpulse bridge run http://100.101.102.103:8000
  │                                          │
  ├── GET /manifest  ──────────────────────► │  1. fetch manifest
  ├── GET /shards/*  ──────────────────────► │  2. pull all shards (streaming)
  │                                          │  3. assemble GGUF in RAM → /dev/shm
  │                                          │  4. llama.cpp loads from /dev/shm
  │                                          │  5. run inference, stream tokens
  └── POST /metrics  ◄────────────────────── │  6. send collected metrics
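
The server side of this exchange is a small HTTP service. Below is a minimal sketch of what the three endpoints could look like, assuming FastAPI (which the project layout lists for server.py); the .shard extension, manifest fields, and metrics payload shape are illustrative assumptions, not the package's actual API:

import json
from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import FileResponse

SHARD_DIR = Path("./shards")
app = FastAPI()

@app.get("/manifest")
def manifest():
    # Advertise every shard with its size so the client can plan the pull.
    return {"shards": [{"name": p.name, "size": p.stat().st_size}
                       for p in sorted(SHARD_DIR.glob("*.shard"))]}

@app.get("/shards/{name}")
def shard(name: str):
    # Send a single shard file back to the bridge.
    return FileResponse(SHARD_DIR / name)

@app.post("/metrics")
def metrics(payload: dict):
    # Append the bridge's benchmark report to a JSONL log.
    with open("metrics.jsonl", "a") as f:
        f.write(json.dumps(payload) + "\n")
    return {"ok": True}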

📸 Screenshots

Screenshots of the server (Device A), the bridge (Device B), inference in progress, and the metrics report live in the repo's Images/ directory.


📦 Install

Install ModelPulse as a Python package directly from GitHub:

pip install git+https://github.com/MdSufiyan005/Model-Pulse.git

🔄 Workflow

1 — Prepare shards on Device A

Use gguf_to_shards.py from the companion tools to convert your GGUF model:

python tools/gguf_to_shards.py convert model.gguf ./shards/
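
Conceptually, the converter splits the GGUF into ordered chunks and records them in a manifest. A rough sketch, assuming fixed-size chunking; the real tool's shard naming, chunk size, and manifest fields may differ:

import json
import sys
from pathlib import Path

CHUNK = 64 * 1024 * 1024  # 64 MiB per shard (assumed size)

def split(gguf_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = []
    with open(gguf_path, "rb") as src:
        i = 0
        while chunk := src.read(CHUNK):
            name = f"{Path(gguf_path).stem}.{i:05d}.shard"
            (out / name).write_bytes(chunk)
            entries.append({"name": name, "size": len(chunk)})
            i += 1
    # The manifest records shard order so the bridge can reassemble the GGUF.
    (out / "manifest.json").write_text(json.dumps({"shards": entries}, indent=2))

if __name__ == "__main__":
    split(sys.argv[1], sys.argv[2])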

2 — Start the server on Device A

modelpulse server run ./shards --host 0.0.0.0 --port 8000

3.0 — Get the Tailscale IP

curl -fsSL https://tailscale.com/install.sh | sh

sudo tailscale up   # sign up on the page that opens

tailscale ip        # note the address, e.g. 100.101.102.103

3.1 — Run inference on the edge device (Device B)

modelpulse bridge run http://100.101.102.103:8000
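
On Device B, steps 1 to 3 of the diagram boil down to fetching the manifest and concatenating the shards in order. A sketch of that pull-and-assemble step, reusing the assumed manifest shape from the server sketch above (httpx is used here purely for illustration):

import httpx

def pull_gguf(host: str) -> bytes:
    with httpx.Client(base_url=host, timeout=None) as client:
        manifest = client.get("/manifest").json()
        parts = []
        for shard in manifest["shards"]:
            # Stream each shard so a large model never needs one giant response buffer.
            with client.stream("GET", f"/shards/{shard['name']}") as resp:
                parts.append(b"".join(resp.iter_bytes()))
        # Concatenating in manifest order reproduces the original GGUF bytes in RAM.
        return b"".join(parts)

gguf_bytes = pull_gguf("http://100.101.102.103:8000")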

📋 Commands

Device A (Server)

modelpulse server run <shards_dir> [options]
Option          Default         Description
--port          8000            Server port
--host          0.0.0.0         Bind address
--metrics-log   metrics.jsonl   Metrics log file

Device B (Client)

modelpulse bridge run <host> [options]
modelpulse bridge status <host> [--all]
Command   Description
run       Full pipeline: pull → infer → report
status    Display latest metrics from Device A

Bridge run options

Flag            Default         Description
--prompt / -p   (interactive)   Prompt string
--max-tokens    256             Tokens to generate
--temp / -t     0.7             Sampling temperature
--ctx           2048            Context window
--no-report     false           Skip sending metrics
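
For example, a non-interactive run that generates 128 tokens at a low temperature and skips the metrics report:

modelpulse bridge run http://100.101.102.103:8000 --prompt "Hello" --max-tokens 128 --temp 0.2 --no-report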

💾 Zero-Disk Strategy

shard_data  ─── assemble_gguf_bytes() ──► gguf_bytes (RAM)
                                              │
                                       write_bytes()
                                              │
                                        /dev/shm/sb_<pid>.gguf   ← tmpfs, never touches physical disk
                                              │
                                       del gguf_bytes            ← Python bytes freed
                                              │
                                    Llama(model_path=...)        ← mmap from tmpfs
                                              │
                                       cleanup() → unlink()

This keeps the model file temporarily in RAM while still satisfying llama.cpp's file-path requirement.

The system prioritizes /dev/shm and /run/shm by checking for existence and write access, falling back to $TMPDIR or /tmp if no RAM-backed filesystem is available.
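
Put together, the zero-disk path is only a few lines. A minimal sketch, assuming llama-cpp-python is installed; the read_bytes() stand-in below replaces the shard assembly shown earlier, and the directory probing mirrors the priority order just described:

import os
from pathlib import Path

from llama_cpp import Llama

def pick_ram_dir() -> Path:
    # Prefer RAM-backed tmpfs mounts, then fall back to ordinary temp dirs.
    for d in ("/dev/shm", "/run/shm", os.environ.get("TMPDIR", ""), "/tmp"):
        if d and os.path.isdir(d) and os.access(d, os.W_OK):
            return Path(d)
    raise RuntimeError("no writable temp directory found")

gguf_bytes = Path("model.gguf").read_bytes()  # stand-in for the shard assembly
path = pick_ram_dir() / f"sb_{os.getpid()}.gguf"
path.write_bytes(gguf_bytes)  # lands in tmpfs: RAM, not physical disk
del gguf_bytes                # free the Python copy; tmpfs still holds the data

try:
    llm = Llama(model_path=str(path), n_ctx=2048)  # llama.cpp mmaps from tmpfs
    print(llm("Hello", max_tokens=32)["choices"][0]["text"])
finally:
    path.unlink(missing_ok=True)  # cleanup() → unlink()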


๐Ÿ“ Project Layout

modelpulse/
├── pyproject.toml          # Packaging config
├── README.md               # This doc
├── Images/                 # Screenshots (server.png, bridge.png, etc.)
├── modelpulse/             # Core package
│   ├── __init__.py
│   ├── main.py             # Unified CLI: modelpulse bridge, modelpulse server
│   ├── shared/             # Shared models
│   │   ├── __init__.py
│   │   └── models.py       # ShardManifest, InferenceMetrics
│   ├── server/             # Server side
│   │   ├── __init__.py
│   │   └── server.py       # FastAPI server
│   └── client/             # Client side
│       ├── __init__.py
│       ├── cli.py          # Bridge CLI
│       ├── bridge.py       # RAM GGUF assembly + llama.cpp
│       └── shard_client.py # Async HTTP client
└── tools/                  # Utilities
    ├── gguf_parser.py
    └── gguf_to_shards.py   # GGUF → shard converter
