Skip to main content

Distributed worker for HuggingFace Space scheduling

Project description

HFS v2 - HuggingFace Space Worker 分布式调度系统

基于 Redis 的分布式 Worker 调度系统,用于管理 HuggingFace Space 资源池。

特性

  • 分布式调度 - 多 Worker 并发,自动负载均衡
  • 状态管理 - 原子操作(Lua 脚本),保证一致性
  • 健康检查 - 自动检测崩溃、超时、孤儿资源
  • 账号管理 - 多账号池,自动选择、cooldown、评分
  • Space 轮换 - 自动创建、绑定、轮换、复用
  • Admin CLI - 命令行管理工具

快速开始

1. 安装依赖

pip install redis huggingface-hub click tabulate

2. 初始化系统

cd v2
python admin/init.py

3. 查看状态

python admin/cli.py --redis-url="redis://..." list-nodes
python admin/cli.py --redis-url="redis://..." list-accounts

4. 运行 Worker

python -m hfs --redis-url="redis://..." --space-id=my-space --project-id=demo --node-id=node-1

架构

┌─────────────┐
│   Redis     │  ← 状态存储
└──────┬──────┘
       │
   ┌───┴────┐
   │        │
┌──▼──┐  ┌─▼───┐
│Worker│  │Worker│  ← 独立进程
└──┬──┘  └─┬───┘
   │       │
┌──▼───────▼──┐
│  Scheduler  │  ← 调度器
└─────────────┘

核心模块

  • state.py - 状态机 + 原子操作(Lua 脚本)
  • health.py - 健康检查(崩溃检测、一致性验证)
  • policy.py - 策略配置(场景、命名)
  • worker.py - Worker 主循环(心跳、进程管理)
  • scheduler.py - 调度器(分配、轮换、创建)
  • account.py - 账号管理(选择、cooldown、评分)
  • hf.py - HuggingFace API 封装

测试

# 运行所有测试
pytest tests/ -v

# 运行特定模块
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

文档

配置

Redis

export HFS_REDIS_URL="redis://:password@host:port/db"

HuggingFace 账号

admin/init.py 中配置账号列表:

ACCOUNTS = [
    {'username': 'user1', 'token': 'hf_xxx'},
    {'username': 'user2', 'token': 'hf_yyy'}
]

开发

测试驱动开发

  1. 先写测试(tests/test_*.py
  2. 再实现功能(hfs/*.py
  3. 运行测试验证

代码结构

v2/
├── hfs/                # Worker 包
│   ├── state.py        # 状态机
│   ├── health.py       # 健康检查
│   ├── policy.py       # 策略配置
│   ├── worker.py       # Worker 主循环
│   ├── scheduler.py    # 调度器
│   ├── account.py      # 账号管理
│   └── hf.py           # HF API
├── admin/              # Admin 工具
│   ├── cli.py          # 命令行工具
│   └── init.py         # 快速初始化
├── tests/              # 测试
└── docs/               # 文档

许可

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mp2_worker-0.1.0.tar.gz (34.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mp2_worker-0.1.0-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file mp2_worker-0.1.0.tar.gz.

File metadata

  • Download URL: mp2_worker-0.1.0.tar.gz
  • Upload date:
  • Size: 34.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mp2_worker-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f957d66a75e584ccc370b5590d152f84b60bef1bfe65c76f6b3183665b05cea2
MD5 15971c008584b7c7fe97532c1ed8e2ab
BLAKE2b-256 ff5175616a056eeac7e1dc8e4b5712e8450bc7c3953192bae52b6eff78787c59

See more details on using hashes here.

File details

Details for the file mp2_worker-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mp2_worker-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mp2_worker-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6705728823c67e8c79b26ab4f4d199d59a44050f30156ef047355499e53ff996
MD5 ab9d8c2643d759c1781c72ee43e3a910
BLAKE2b-256 0d1026e5fdb07f4405d45b20783e0addbb97528dec7e8c7a2e8af73e7e6a0f2a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page