
Distributed worker for HuggingFace Space scheduling


HFS v2 - HuggingFace Space Worker Distributed Scheduling System

A Redis-based distributed Worker scheduling system for managing a pool of HuggingFace Space resources.

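Features

  • Distributed scheduling - multiple Workers run concurrently with automatic load balancing
  • State management - atomic operations (Lua scripts) to guarantee consistency
  • Health checks - automatic detection of crashes, timeouts, and orphaned resources
  • Account management - multi-account pool with automatic selection, cooldown, and scoring
  • Space rotation - automatic creation, binding, rotation, and reuse
  • Admin CLI - command-line management tool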
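
As an illustration of the account-management feature, here is a minimal sketch of picking an account from a pool while respecting cooldowns and scores. The `Account` fields, the scoring scale, and the penalty value are assumptions made for this example and are not taken from `account.py`.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    username: str
    score: float = 0.0           # higher is better (hypothetical scale)
    cooldown_until: float = 0.0  # Unix timestamp; 0 means no cooldown


def pick_account(accounts: list[Account], now: Optional[float] = None) -> Optional[Account]:
    """Return the highest-scoring account that is not cooling down, or None."""
    now = time.time() if now is None else now
    available = [a for a in accounts if a.cooldown_until <= now]
    return max(available, key=lambda a: a.score, default=None)


def put_on_cooldown(account: Account, seconds: float, now: Optional[float] = None) -> None:
    """Exclude an account from selection for `seconds` and lower its score."""
    now = time.time() if now is None else now
    account.cooldown_until = now + seconds
    account.score -= 1.0  # hypothetical penalty
```

Passing `now` explicitly keeps the functions deterministic in tests; in a running Worker it would default to the current time.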

Quick Start

1. Install dependencies

pip install redis huggingface-hub click tabulate

2. Initialize the system

cd v2
python admin/init.py

3. Check status

python admin/cli.py --redis-url="redis://..." list-nodes
python admin/cli.py --redis-url="redis://..." list-accounts

4. Run a Worker

python -m hfs --redis-url="redis://..." --space-id=my-space --project-id=demo --node-id=node-1

Architecture

┌─────────────┐
│    Redis    │  ← state store
└──────┬──────┘
       │
   ┌───┴────┐
   │        │
┌──▼───┐ ┌──▼───┐
│Worker│ │Worker│  ← independent processes
└──┬───┘ └──┬───┘
   │        │
┌──▼────────▼──┐
│  Scheduler   │  ← scheduler
└──────────────┘
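
Because Workers are independent processes that share state only through Redis, a crashed Worker is usually detected by a heartbeat that has gone stale. The sketch below shows that idea on a plain dictionary of last-heartbeat timestamps. The 30-second timeout and the function name are assumptions, and the real system would read these timestamps from Redis.

```python
import time
from typing import Optional

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical value


def find_crashed(heartbeats: dict[str, float], now: Optional[float] = None,
                 timeout: float = HEARTBEAT_TIMEOUT) -> list[str]:
    """Return the node IDs whose last heartbeat is older than `timeout`."""
    now = time.time() if now is None else now
    return sorted(node for node, last in heartbeats.items() if now - last > timeout)
```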

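Core modules

  • state.py - state machine + atomic operations (Lua scripts)
  • health.py - health checks (crash detection, consistency validation)
  • policy.py - policy configuration (scenarios, naming)
  • worker.py - Worker main loop (heartbeat, process management)
  • scheduler.py - scheduler (allocation, rotation, creation)
  • account.py - account management (selection, cooldown, scoring)
  • hf.py - HuggingFace API wrapper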
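
To show what an atomic state transition through a Lua script can look like, here is a minimal compare-and-set sketch. The key name, the `state` field, and the state values are assumptions, not the actual schema in `state.py`. `transition` only relies on redis-py's `eval(script, numkeys, *keys_and_args)`; `FakeRedis` is a hypothetical in-memory stand-in that mimics the script so the example runs without a server.

```python
# Lua script run atomically by Redis: change the state only if it
# currently equals the expected value (compare-and-set).
CAS_SCRIPT = """
local current = redis.call('HGET', KEYS[1], 'state')
if current == ARGV[1] then
    redis.call('HSET', KEYS[1], 'state', ARGV[2])
    return 1
end
return 0
"""


def transition(client, key: str, expected: str, new: str) -> bool:
    """Atomically move `key` from `expected` to `new`; False if the state differed."""
    return client.eval(CAS_SCRIPT, 1, key, expected, new) == 1


class FakeRedis:
    """In-memory stand-in that mimics the script above (for illustration only)."""

    def __init__(self):
        self.hashes = {}

    def eval(self, script, numkeys, key, expected, new):
        if self.hashes.get(key, {}).get("state") == expected:
            self.hashes[key]["state"] = new
            return 1
        return 0
```

Since Redis executes a script as a single unit, two Workers racing on the same key cannot both succeed: the second call sees the updated state and returns `False`.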

Testing

# Run all tests
pytest tests/ -v

# Run specific modules
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

Documentation

Configuration

Redis

export HFS_REDIS_URL="redis://:password@host:port/db"
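
A minimal sketch of how a process might read `HFS_REDIS_URL` and split it into connection settings using only the standard library. The function name and the default URL are assumptions. With redis-py installed, `redis.Redis.from_url` accepts the same URL directly.

```python
import os
from urllib.parse import urlparse


def redis_settings(default: str = "redis://localhost:6379/0") -> dict:
    """Read HFS_REDIS_URL and split it into host, port, db, and password."""
    url = urlparse(os.environ.get("HFS_REDIS_URL", default))
    if url.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {url.scheme!r}")
    return {
        "host": url.hostname or "localhost",
        "port": url.port or 6379,
        "db": int(url.path.lstrip("/") or 0),
        "password": url.password,
    }
```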

HuggingFace accounts

Configure the account list in admin/init.py:

ACCOUNTS = [
    {'username': 'user1', 'token': 'hf_xxx'},
    {'username': 'user2', 'token': 'hf_yyy'}
]
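
Before running the initializer, it can help to sanity-check the list. The helper below is a hypothetical sketch and not part of `admin/init.py`. It only checks for missing fields and duplicate usernames, and it does not verify tokens against HuggingFace.

```python
def validate_accounts(accounts: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the entries look usable."""
    problems = []
    seen = set()
    for i, acc in enumerate(accounts):
        username = acc.get("username")
        token = acc.get("token")
        if not username:
            problems.append(f"entry {i}: missing username")
        elif username in seen:
            problems.append(f"entry {i}: duplicate username {username!r}")
        else:
            seen.add(username)
        if not token:
            problems.append(f"entry {i}: missing token")
    return problems
```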

Development

Test-driven development

  1. Write the tests first (tests/test_*.py)
  2. Then implement the feature (hfs/*.py)
  3. Run the tests to verify
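
The three steps above can be sketched on a toy naming rule. Both `space_name` and its `<project>-<index>` format are hypothetical and only illustrate the workflow; they are not the actual policy in `hfs/policy.py`.

```python
# Step 1: the test, written first (would live in tests/test_policy.py).
def test_space_name_includes_project_and_index():
    assert space_name("demo", 3) == "demo-3"


# Step 2: the smallest implementation that makes the test pass
# (would live in hfs/policy.py).
def space_name(project_id: str, index: int) -> str:
    return f"{project_id}-{index}"
```

Step 3 is then running pytest on the test file and confirming that the test passes.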

Code structure

v2/
├── hfs/                # Worker package
│   ├── state.py        # State machine
│   ├── health.py       # Health checks
│   ├── policy.py       # Policy configuration
│   ├── worker.py       # Worker main loop
│   ├── scheduler.py    # Scheduler
│   ├── account.py      # Account management
│   └── hf.py           # HF API
├── admin/              # Admin tools
│   ├── cli.py          # Command-line tool
│   └── init.py         # Quick initialization
├── tests/              # Tests
└── docs/               # Documentation

License

MIT License
