
HuggingFace Space Worker distributed scheduling system


HFS v2 - HuggingFace Space Worker distributed scheduling system

A Redis-based distributed Worker scheduling system for managing a pool of HuggingFace Space resources.

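Features

  • Distributed scheduling - multiple Workers run concurrently with automatic load balancing
  • State management - atomic operations (Lua scripts) keep state consistent
  • Health checks - automatic detection of crashes, timeouts, and orphaned resources
  • Account management - multi-account pool with automatic selection, cooldown, and scoring
  • Space rotation - automatic creation, binding, rotation, and reuse
  • Scene configuration - several built-in scenes, plus support for custom ones
  • Admin CLI - command-line management tool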
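
The account-management feature (pool, automatic selection, cooldown, scoring) is not specified in more detail here. As a rough illustration only, the sketch below picks the highest-scoring account that is not cooling down and still has Space capacity. The `Account` fields, the scoring convention, and `pick_account` are hypothetical and are not taken from the mp-hfs code; only `max_spaces` mirrors the `--max-spaces` option used in the quick start.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Account:
    # Hypothetical fields; the real mp-hfs account record may differ.
    name: str
    max_spaces: int               # mirrors the --max-spaces CLI option
    used_spaces: int = 0
    score: float = 0.0            # higher is better (assumed convention)
    cooldown_until: float = 0.0   # Unix timestamp; 0 means no cooldown


def pick_account(accounts: Sequence[Account], now: float) -> Optional[Account]:
    """Return the best usable account, or None if none is available."""
    usable = [
        a for a in accounts
        if a.cooldown_until <= now and a.used_spaces < a.max_spaces
    ]
    # Prefer the highest score; break ties by the most free capacity.
    return max(
        usable,
        key=lambda a: (a.score, a.max_spaces - a.used_spaces),
        default=None,
    )


pool = [
    Account("acct-a", max_spaces=6, used_spaces=6, score=0.9),   # full
    Account("acct-b", max_spaces=6, used_spaces=2, score=0.8,
            cooldown_until=2_000.0),                              # cooling down
    Account("acct-c", max_spaces=6, used_spaces=1, score=0.5),   # usable
]
chosen = pick_account(pool, now=1_000.0)
print(chosen.name if chosen else "no account available")
```

Passing `now` explicitly keeps the selection deterministic and easy to test; a real scheduler would more likely read the clock and the account records from Redis.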

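Quick start

1. Install

pip install mp-hfs

2. Configure the environment (optional)

The CLI ships with a default Redis URL and can be used as-is. To use your own Redis instance: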

export HFS_REDIS_URL="redis://:password@host:port/db"
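
Before exporting a custom URL, it can help to confirm that it has the `redis://:password@host:port/db` shape shown above. The following is a minimal sketch using only the Python standard library; `describe_redis_url` is a hypothetical helper and is not part of mp-hfs, and the fallbacks to port 6379 and database 0 are assumptions based on common Redis defaults.

```python
from urllib.parse import urlparse


def describe_redis_url(url: str) -> dict:
    """Split a redis:// URL into host, port, db, and a password flag."""
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    db = parts.path.lstrip("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,    # 6379 is the standard Redis port
        "db": int(db) if db else 0,    # database 0 when the path is empty
        "has_password": parts.password is not None,
    }


print(describe_redis_url("redis://:secret@localhost:6379/2"))
```

The helper reports only whether a password is present, so the secret itself never ends up in logs.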

3. Initialize the account pool

# Add an account
hfs-admin account create "hf_xxxxx" --max-spaces=6

# List accounts
hfs-admin account list

See the usage guide.

4. Create a project

Create my-project.yaml:

project:
  id: "my-project"
  scene: "production"
  required_nodes: 3
  
  start_script:
    type: inline
    inline: "python -m my_app"

nodes:
  ids: ["node-1", "node-2", "node-3"]
  # Or omit ids to have them generated automatically: my-project-1, my-project-2, my-project-3
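
The comment in the YAML above describes the default node naming. A minimal sketch of that rule follows, assuming IDs are numbered from 1 up to `required_nodes`; `resolve_node_ids` is a hypothetical name and the real implementation in mp-hfs may differ.

```python
from typing import List, Optional


def resolve_node_ids(project_id: str, required_nodes: int,
                     ids: Optional[List[str]] = None) -> List[str]:
    """Use explicit IDs when given, else generate <project_id>-1 .. <project_id>-N."""
    if ids:
        return list(ids)
    return [f"{project_id}-{i}" for i in range(1, required_nodes + 1)]


print(resolve_node_ids("my-project", 3))
print(resolve_node_ids("my-project", 3, ["node-1", "node-2", "node-3"]))
```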

Initialize:

hfs-admin project init my-project.yaml
hfs-admin project bootstrap my-project

5. Monitor

# Show node status
hfs-admin node list --project my-project

# Show Space status
hfs-admin space list --project my-project

# Run a health check
hfs-admin health check
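
This README does not describe how the health check works internally. One common way to detect crashed or timed-out workers is to compare each node's last heartbeat timestamp against a timeout, and the sketch below shows that idea only. `find_stale_nodes`, the heartbeat dictionary, and the 30-second timeout are all assumptions made for illustration.

```python
import time
from typing import Dict, List, Optional


def find_stale_nodes(heartbeats: Dict[str, float], timeout_s: float,
                     now: Optional[float] = None) -> List[str]:
    """Return node IDs whose last heartbeat is older than timeout_s seconds."""
    if now is None:
        now = time.time()
    return sorted(
        node for node, ts in heartbeats.items() if now - ts > timeout_s
    )


# Last heartbeat per node, as Unix timestamps (made-up values).
last_seen = {"node-1": 1000.0, "node-2": 940.0, "node-3": 995.0}
print(find_stale_nodes(last_seen, timeout_s=30.0, now=1010.0))  # ['node-2']
```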

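Scene configuration

Scene        Run timeout           Deploy interval   Use case
dev_test     5 minutes             30 seconds        Development and testing
short_task   30 minutes            3 minutes         Short tasks
long_task    1 hour                5 minutes         Long tasks
production   6-10 hours (random)   10 minutes        Production (default)

See the scene configuration documentation for details.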
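
To make the units in the table concrete, here is a sketch that encodes the four built-in scenes in seconds and draws a run timeout. The values come from the table; the `Scene` structure, the `pick_run_timeout` helper, and the use of a uniform random draw for `production` are assumptions, since the README only says the timeout is random within 6-10 hours.

```python
import random
from dataclasses import dataclass
from typing import Optional

MIN, HOUR = 60, 3600


@dataclass(frozen=True)
class Scene:
    run_timeout_min_s: int   # lower bound of the run timeout, in seconds
    run_timeout_max_s: int   # upper bound; equals the lower bound when fixed
    deploy_interval_s: int   # gap between deployments, in seconds


# Values copied from the scene table; the structure itself is hypothetical.
SCENES = {
    "dev_test":   Scene(5 * MIN, 5 * MIN, 30),
    "short_task": Scene(30 * MIN, 30 * MIN, 3 * MIN),
    "long_task":  Scene(1 * HOUR, 1 * HOUR, 5 * MIN),
    "production": Scene(6 * HOUR, 10 * HOUR, 10 * MIN),
}


def pick_run_timeout(scene_name: str,
                     rng: Optional[random.Random] = None) -> int:
    """Draw a run timeout in seconds; only 'production' actually varies."""
    scene = SCENES[scene_name]
    rng = rng or random.Random()
    return rng.randint(scene.run_timeout_min_s, scene.run_timeout_max_s)


print(pick_run_timeout("dev_test"))     # 300
print(pick_run_timeout("production"))   # somewhere in 21600..36000
```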

Documentation

User documentation

Design documentation

Development documentation

Testing documentation

Architecture

┌─────────────┐
│   Redis     │  ← state store
└──────┬──────┘
       │
   ┌───┴────┐
   │        │
┌──▼───┐ ┌──▼───┐
│Worker│ │Worker│  ← independent processes
└──┬───┘ └──┬───┘
   │        │
┌──▼────────▼──┐
│  Scheduler   │  ← scheduler
└──────────────┘

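Core modules

  • state.py - state machine and atomic operations (Lua scripts)
  • health.py - health checks (crash detection, consistency verification)
  • policy.py - policy configuration (scenes, naming)
  • worker.py - Worker main loop (heartbeat, process management)
  • scheduler.py - scheduler (allocation, rotation, creation)
  • account.py - account management (selection, cooldown, scoring)
  • hf.py - HuggingFace API wrapper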
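
The module list says state.py combines a state machine with atomic operations implemented as Lua scripts. The sketch below illustrates the general compare-and-set pattern that such a script usually follows; it is not the code from state.py. The key layout (a hash with a `state` field), the state names, and `FakeStateStore` are assumptions, and the in-memory class stands in for Redis so the example runs without a server.

```python
from typing import Dict, Optional

# Lua compare-and-set on a hash field. Redis runs a script atomically, so
# no other client can change the state between the HGET and the HSET.
TRANSITION_LUA = """
local current = redis.call('HGET', KEYS[1], 'state')
if current == ARGV[1] then
  redis.call('HSET', KEYS[1], 'state', ARGV[2])
  return 1
end
return 0
"""


class FakeStateStore:
    """In-memory stand-in for Redis that mirrors TRANSITION_LUA."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def set_state(self, key: str, state: str) -> None:
        self._state[key] = state

    def get_state(self, key: str) -> Optional[str]:
        return self._state.get(key)

    def transition(self, key: str, expected: str, new: str) -> bool:
        # Same logic as the Lua script: change only if the state matches.
        if self._state.get(key) != expected:
            return False
        self._state[key] = new
        return True


store = FakeStateStore()
store.set_state("node:node-1", "idle")
print(store.transition("node:node-1", "idle", "running"))  # True
print(store.transition("node:node-1", "idle", "running"))  # False
```

Against a real server, redis-py's `register_script` could wrap `TRANSITION_LUA` into a callable that takes `keys=[...]` and `args=[...]`. The second call above fails because the state is no longer `idle`, which is what prevents two workers from claiming the same node.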

Development

Local development

git clone <repo>
cd v2
pip install -e .
pytest tests/ -v

Testing

# Run all tests
pytest tests/ -v

# Run a specific module
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

License

MIT License
