
ModelPulse 🚀

End-to-end partial-weight transfer pipeline.

Device A serves model shards → Device B reconstructs the GGUF in RAM and runs inference, with no persistent GGUF file ever written to disk.

Device A                                       Device B
──────────────────────────────                 ─────────────────────────────────────────────────
modelpulse server run ./shards                 modelpulse bridge run http://100.101.102.103:8000
  │                                          │
  ├── GET /manifest  ──────────────────────► │  1. fetch manifest
  ├── GET /shards/*  ──────────────────────► │  2. pull all shards (streaming)
  │                                          │  3. assemble GGUF in RAM → /dev/shm
  │                                          │  4. llama.cpp loads from /dev/shm
  │                                          │  5. run inference, stream tokens
  └── POST /metrics  ◄────────────────────── │  6. send collected metrics
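
The server side of this exchange is a small HTTP service. Below is a minimal sketch of what the three endpoints could look like, assuming FastAPI (which the project layout lists for server.py); the .shard extension, manifest fields, and metrics payload shape are illustrative assumptions, not the package's actual API:

import json
from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import FileResponse

SHARD_DIR = Path("./shards")
app = FastAPI()

@app.get("/manifest")
def manifest():
    # Advertise every shard with its size so the client can plan the pull.
    return {"shards": [{"name": p.name, "size": p.stat().st_size}
                       for p in sorted(SHARD_DIR.glob("*.shard"))]}

@app.get("/shards/{name}")
def shard(name: str):
    # Send a single shard file back to the bridge.
    return FileResponse(SHARD_DIR / name)

@app.post("/metrics")
def metrics(payload: dict):
    # Append the bridge's benchmark report to a JSONL log.
    with open("metrics.jsonl", "a") as f:
        f.write(json.dumps(payload) + "\n")
    return {"ok": True}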

📸 Screenshots

Screenshots of the server (Device A), the bridge (Device B), inference in progress, and the metrics report live in the repo's Images/ directory.


📦 Install

Install ModelPulse as a Python package directly from GitHub:

pip install git+https://github.com/MdSufiyan005/Model-Pulse.git

🔄 Workflow

1 — Prepare shards on Device A

Use gguf_to_shards.py from the companion tools to convert your GGUF model:

python tools/gguf_to_shards.py convert model.gguf ./shards/
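
Conceptually, the converter splits the GGUF into ordered chunks and records them in a manifest. A rough sketch, assuming fixed-size chunking; the real tool's shard naming, chunk size, and manifest fields may differ:

import json
import sys
from pathlib import Path

CHUNK = 64 * 1024 * 1024  # 64 MiB per shard (assumed size)

def split(gguf_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = []
    with open(gguf_path, "rb") as src:
        i = 0
        while chunk := src.read(CHUNK):
            name = f"{Path(gguf_path).stem}.{i:05d}.shard"
            (out / name).write_bytes(chunk)
            entries.append({"name": name, "size": len(chunk)})
            i += 1
    # The manifest records shard order so the bridge can reassemble the GGUF.
    (out / "manifest.json").write_text(json.dumps({"shards": entries}, indent=2))

if __name__ == "__main__":
    split(sys.argv[1], sys.argv[2])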

2 — Start the server on Device A

modelpulse server run ./shards --host 0.0.0.0 --port 8000

3.0 — Get the Tailscale IP

curl -fsSL https://tailscale.com/install.sh | sh

sudo tailscale up   # sign up on the page that opens

tailscale ip        # note the address, e.g. 100.101.102.103

3.1 — Run inference on the edge device (Device B)

modelpulse bridge run http://100.101.102.103:8000
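
On Device B, steps 1 to 3 of the diagram boil down to fetching the manifest and concatenating the shards in order. A sketch of that pull-and-assemble step, reusing the assumed manifest shape from the server sketch above (httpx is used here purely for illustration):

import httpx

def pull_gguf(host: str) -> bytes:
    with httpx.Client(base_url=host, timeout=None) as client:
        manifest = client.get("/manifest").json()
        parts = []
        for shard in manifest["shards"]:
            # Stream each shard so a large model never needs one giant response buffer.
            with client.stream("GET", f"/shards/{shard['name']}") as resp:
                parts.append(b"".join(resp.iter_bytes()))
        # Concatenating in manifest order reproduces the original GGUF bytes in RAM.
        return b"".join(parts)

gguf_bytes = pull_gguf("http://100.101.102.103:8000")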

📋 Commands

Device A (Server)

modelpulse server run <shards_dir> [options]
Option          Default         Description
--port          8000            Server port
--host          0.0.0.0         Bind address
--metrics-log   metrics.jsonl   Metrics log file

Device B (Client)

modelpulse bridge run <host> [options]
modelpulse bridge status <host> [--all]
Command   Description
run       Full pipeline: pull → infer → report
status    Display latest metrics from Device A

Bridge run options

Flag            Default         Description
--prompt / -p   (interactive)   Prompt string
--max-tokens    256             Tokens to generate
--temp / -t     0.7             Sampling temperature
--ctx           2048            Context window
--no-report     false           Skip sending metrics
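
For example, a non-interactive run that generates 128 tokens at a low temperature and skips the metrics report:

modelpulse bridge run http://100.101.102.103:8000 --prompt "Hello" --max-tokens 128 --temp 0.2 --no-report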

💾 Zero-Disk Strategy

shard_data  ─── assemble_gguf_bytes() ──► gguf_bytes (RAM)
                                              │
                                       write_bytes()
                                              │
                                        /dev/shm/sb_<pid>.gguf   ← tmpfs, never touches physical disk
                                              │
                                       del gguf_bytes            ← Python bytes freed
                                              │
                                    Llama(model_path=...)        ← mmap from tmpfs
                                              │
                                       cleanup() → unlink()

This keeps the model file temporarily in RAM while still satisfying llama.cpp's file-path requirement.

The system prioritizes /dev/shm and /run/shm by checking for existence and write access, falling back to $TMPDIR or /tmp if no RAM-backed filesystem is available.
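
Put together, the zero-disk path is only a few lines. A minimal sketch, assuming llama-cpp-python is installed; the read_bytes() stand-in below replaces the shard assembly shown earlier, and the directory probing mirrors the priority order just described:

import os
from pathlib import Path

from llama_cpp import Llama

def pick_ram_dir() -> Path:
    # Prefer RAM-backed tmpfs mounts, then fall back to ordinary temp dirs.
    for d in ("/dev/shm", "/run/shm", os.environ.get("TMPDIR", ""), "/tmp"):
        if d and os.path.isdir(d) and os.access(d, os.W_OK):
            return Path(d)
    raise RuntimeError("no writable temp directory found")

gguf_bytes = Path("model.gguf").read_bytes()  # stand-in for the shard assembly
path = pick_ram_dir() / f"sb_{os.getpid()}.gguf"
path.write_bytes(gguf_bytes)  # lands in tmpfs: RAM, not physical disk
del gguf_bytes                # free the Python copy; tmpfs still holds the data

try:
    llm = Llama(model_path=str(path), n_ctx=2048)  # llama.cpp mmaps from tmpfs
    print(llm("Hello", max_tokens=32)["choices"][0]["text"])
finally:
    path.unlink(missing_ok=True)  # cleanup() → unlink()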


๐Ÿ“ Project Layout

modelpulse/
├── pyproject.toml          # Packaging config
├── README.md               # This doc
├── Images/                 # Screenshots (server.png, bridge.png, etc.)
├── modelpulse/             # Core package
│   ├── __init__.py
│   ├── main.py             # Unified CLI: modelpulse bridge, modelpulse server
│   ├── shared/             # Shared models
│   │   ├── __init__.py
│   │   └── models.py       # ShardManifest, InferenceMetrics
│   ├── server/             # Server side
│   │   ├── __init__.py
│   │   └── server.py       # FastAPI server
│   └── client/             # Client side
│       ├── __init__.py
│       ├── cli.py          # Bridge CLI
│       ├── bridge.py       # RAM GGUF assembly + llama.cpp
│       └── shard_client.py # Async HTTP client
└── tools/                  # Utilities
    ├── gguf_parser.py
    └── gguf_to_shards.py   # GGUF → shard converter
