Distributed worker for HuggingFace Space scheduling

Project description

HFS v2 - Distributed Scheduling System for HuggingFace Space Workers

A Redis-based distributed worker scheduling system for managing a pool of HuggingFace Space resources.

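Features

  • Distributed scheduling - multiple Workers run concurrently with automatic load balancing
  • State management - atomic operations (Lua scripts) keep state consistent
  • Health checks - automatic detection of crashes, timeouts, and orphaned resources
  • Account management - multi-account pool with automatic selection, cooldown, and scoring
  • Space rotation - automatic creation, binding, rotation, and reuse
  • Admin CLI - command-line management tool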
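
To illustrate the "atomic operations (Lua scripts)" feature, here is a minimal sketch of a compare-and-set state transition. The key layout (a hash with a `state` field), the script, and the `transition` helper are assumptions for illustration, not the package's actual code. The only library call relied on is redis-py's `eval(script, numkeys, *keys_and_args)`.

```python
# Hypothetical Lua compare-and-set: change a node's state only if it is
# currently in the expected state. Redis runs the whole script atomically,
# so two Workers cannot both win the same transition.
CAS_STATE_LUA = """
local current = redis.call('HGET', KEYS[1], 'state')
if current == ARGV[1] then
    redis.call('HSET', KEYS[1], 'state', ARGV[2])
    return 1
end
return 0
"""


def transition(client, node_key, expected, new):
    """Return True if the state moved from `expected` to `new`.

    `client` is any object exposing redis-py's eval() signature,
    for example a redis.Redis instance.
    """
    return client.eval(CAS_STATE_LUA, 1, node_key, expected, new) == 1
```

With a real connection this would be called as `transition(redis_client, "node:node-1", "idle", "running")`, where the key name and state names are again hypothetical.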

Quick start

1. Install dependencies

pip install redis huggingface-hub click tabulate

2. Initialize the system

cd v2
python admin/init.py

3. Check status

python admin/cli.py --redis-url="redis://..." list-nodes
python admin/cli.py --redis-url="redis://..." list-accounts

4. Run a Worker

python -m hfs --redis-url="redis://..." --space-id=my-space --project-id=demo --node-id=node-1
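
If Workers are started from a supervisor script rather than by hand, the command above can be assembled programmatically. The sketch below only builds the argument list using the four flags shown in this README; the `worker_command` helper is hypothetical and is not part of the package.

```python
import sys


def worker_command(redis_url, space_id, project_id, node_id):
    """Build the argv list for one Worker process (same flags as above)."""
    return [
        sys.executable, "-m", "hfs",
        f"--redis-url={redis_url}",
        f"--space-id={space_id}",
        f"--project-id={project_id}",
        f"--node-id={node_id}",
    ]


cmd = worker_command("redis://...", "my-space", "demo", "node-1")
print(" ".join(cmd[1:]))
```

Passing `cmd` to `subprocess.Popen` would start the Worker as an independent process. Giving every Worker its own `--node-id` is an assumption based on the example value `node-1`.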

Architecture

    ┌─────────────┐
    │    Redis    │  ← state store
    └──────┬──────┘
           │
     ┌─────┴─────┐
     │           │
┌────▼───┐   ┌───▼────┐
│ Worker │   │ Worker │  ← independent processes
└────┬───┘   └───┬────┘
     │           │
┌────▼───────────▼────┐
│      Scheduler      │  ← scheduler
└─────────────────────┘
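
Because each Worker is an independent process that shares state through Redis, crash detection is typically done with heartbeats. The sketch below shows the idea with a plain dict standing in for Redis. The function names and the 30-second timeout are assumptions, not values taken from the package.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical threshold


def record_heartbeat(heartbeats, node_id, now=None):
    """Store the time of the latest heartbeat for a node."""
    heartbeats[node_id] = time.time() if now is None else now


def find_dead_nodes(heartbeats, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return the nodes whose last heartbeat is older than `timeout`."""
    now = time.time() if now is None else now
    return sorted(n for n, ts in heartbeats.items() if now - ts > timeout)


heartbeats = {}
record_heartbeat(heartbeats, "node-1", now=100.0)
record_heartbeat(heartbeats, "node-2", now=125.0)
print(find_dead_nodes(heartbeats, now=140.0))  # ['node-1']
```

In a Redis-backed version the dict would be replaced by keys or a hash that every Worker updates in its main loop.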

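Core modules

  • state.py - state machine and atomic operations (Lua scripts)
  • health.py - health checks (crash detection, consistency verification)
  • policy.py - policy configuration (scenarios, naming)
  • worker.py - Worker main loop (heartbeat, process management)
  • scheduler.py - scheduler (allocation, rotation, creation)
  • account.py - account management (selection, cooldown, scoring)
  • hf.py - HuggingFace API wrapper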
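
As an illustration of the "selection, cooldown, scoring" responsibilities listed for account.py, the sketch below picks the highest-scoring account that is not cooling down. The `Account` fields, the penalty, and the cooldown length are assumptions. The real module may use different rules.

```python
from dataclasses import dataclass


@dataclass
class Account:
    username: str
    score: float = 0.0
    cooldown_until: float = 0.0  # timestamp; 0 means "not cooling down"


def pick_account(accounts, now):
    """Return the best-scoring account that is out of cooldown, or None."""
    available = [a for a in accounts if a.cooldown_until <= now]
    if not available:
        return None
    return max(available, key=lambda a: a.score)


def penalize(account, now, cooldown_seconds=600.0, penalty=1.0):
    """Lower the score and put the account on cooldown after a failure."""
    account.score -= penalty
    account.cooldown_until = now + cooldown_seconds


pool = [Account("user1", score=2.0), Account("user2", score=1.0)]
first = pick_account(pool, now=0.0)   # user1 has the highest score
penalize(first, now=0.0)              # user1 fails and cools down
second = pick_account(pool, now=10.0)
print(first.username, second.username)
```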

Tests

# Run all tests
pytest tests/ -v

# Run a specific module
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

Documentation

Configuration

Redis

export HFS_REDIS_URL="redis://:password@host:port/db"
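
The URL follows the usual `redis://:password@host:port/db` layout. The sketch below splits such a URL into its parts with the standard library, which can help when checking a value before exporting it. The `redis_settings` helper and the sample URL are illustrative only. With redis-py installed, `redis.Redis.from_url(url)` accepts the same format directly.

```python
from urllib.parse import urlparse


def redis_settings(url):
    """Split a redis:// URL into host, port, password, and db number."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,           # 6379 is the Redis default port
        "password": parts.password,
        "db": int(parts.path.lstrip("/") or 0),
    }


print(redis_settings("redis://:secret@localhost:6380/2"))
```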

HuggingFace accounts

Configure the account list in admin/init.py:

ACCOUNTS = [
    {'username': 'user1', 'token': 'hf_xxx'},
    {'username': 'user2', 'token': 'hf_yyy'}
]
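
Before running the initializer it can be useful to sanity-check the list. The sketch below is a hypothetical validator, not something the package is known to provide. It flags missing or duplicate usernames and tokens that lack the `hf_` prefix used by HuggingFace user access tokens.

```python
def validate_accounts(accounts):
    """Return a list of human-readable problems found in an ACCOUNTS list."""
    errors = []
    seen = set()
    for i, acc in enumerate(accounts):
        username = acc.get("username")
        token = acc.get("token", "")
        if not username:
            errors.append(f"entry {i}: missing username")
        elif username in seen:
            errors.append(f"entry {i}: duplicate username {username!r}")
        else:
            seen.add(username)
        if not token.startswith("hf_"):
            errors.append(f"entry {i}: token does not start with 'hf_'")
    return errors


ACCOUNTS = [
    {"username": "user1", "token": "hf_xxx"},
    {"username": "user2", "token": "hf_yyy"},
]
print(validate_accounts(ACCOUNTS))  # []
```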

Development

Test-driven development

  1. Write the tests first (tests/test_*.py)
  2. Then implement the feature (hfs/*.py)
  3. Run the tests to verify
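
A small example of this workflow is sketched below: the two test functions are written first and describe the expected behavior, then the function is implemented to make them pass. The `space_name` function and its `<project>-<zero-padded index>` rule are hypothetical and only stand in for the kind of naming logic policy.py might contain.

```python
# Step 1: tests, as they might appear in tests/test_policy.py
def test_space_name_is_zero_padded():
    assert space_name("demo", 7) == "demo-007"


def test_space_name_rejects_negative_index():
    try:
        space_name("demo", -1)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


# Step 2: implementation, as it might appear in hfs/policy.py
def space_name(project_id: str, index: int) -> str:
    """Hypothetical naming rule: '<project>-<zero-padded index>'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{project_id}-{index:03d}"
```

Step 3 is then `pytest tests/ -v`, which collects and runs both test functions.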

Code structure

v2/
├── hfs/                # Worker package
│   ├── state.py        # State machine
│   ├── health.py       # Health checks
│   ├── policy.py       # Policy configuration
│   ├── worker.py       # Worker main loop
│   ├── scheduler.py    # Scheduler
│   ├── account.py      # Account management
│   └── hf.py           # HF API
├── admin/              # Admin tools
│   ├── cli.py          # Command-line tool
│   └── init.py         # Quick initialization
├── tests/              # Tests
└── docs/               # Documentation

License

MIT License
