Skip to main content

Distributed worker for HuggingFace Space scheduling

Project description

HFS v2 - HuggingFace Space Worker 分布式调度系统

基于 Redis 的分布式 Worker 调度系统,用于管理 HuggingFace Space 资源池。

特性

  • 分布式调度 - 多 Worker 并发,自动负载均衡
  • 状态管理 - 原子操作(Lua 脚本),保证一致性
  • 健康检查 - 自动检测崩溃、超时、孤儿资源
  • 账号管理 - 多账号池,自动选择、cooldown、评分
  • Space 轮换 - 自动创建、绑定、轮换、复用
  • Admin CLI - 命令行管理工具

快速开始

1. 安装依赖

pip install redis huggingface-hub click tabulate

2. 初始化系统

cd v2
python admin/init.py

3. 查看状态

python admin/cli.py --redis-url="redis://..." list-nodes
python admin/cli.py --redis-url="redis://..." list-accounts

4. 运行 Worker

python -m hfs --redis-url="redis://..." --space-id=my-space --project-id=demo --node-id=node-1

架构

┌─────────────┐
│   Redis     │  ← 状态存储
└──────┬──────┘
       │
   ┌───┴────┐
   │        │
┌──▼──┐  ┌─▼───┐
│Worker│  │Worker│  ← 独立进程
└──┬──┘  └─┬───┘
   │       │
┌──▼───────▼──┐
│  Scheduler  │  ← 调度器
└─────────────┘

核心模块

  • state.py - 状态机 + 原子操作(Lua 脚本)
  • health.py - 健康检查(崩溃检测、一致性验证)
  • policy.py - 策略配置(场景、命名)
  • worker.py - Worker 主循环(心跳、进程管理)
  • scheduler.py - 调度器(分配、轮换、创建)
  • account.py - 账号管理(选择、cooldown、评分)
  • hf.py - HuggingFace API 封装

测试

# 运行所有测试
pytest tests/ -v

# 运行特定模块
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

文档

配置

Redis

export HFS_REDIS_URL="redis://:password@host:port/db"

HuggingFace 账号

admin/init.py 中配置账号列表:

ACCOUNTS = [
    {'username': 'user1', 'token': 'hf_xxx'},
    {'username': 'user2', 'token': 'hf_yyy'}
]

开发

测试驱动开发

  1. 先写测试(tests/test_*.py
  2. 再实现功能(hfs/*.py
  3. 运行测试验证

代码结构

v2/
├── hfs/                # Worker 包
│   ├── state.py        # 状态机
│   ├── health.py       # 健康检查
│   ├── policy.py       # 策略配置
│   ├── worker.py       # Worker 主循环
│   ├── scheduler.py    # 调度器
│   ├── account.py      # 账号管理
│   └── hf.py           # HF API
├── admin/              # Admin 工具
│   ├── cli.py          # 命令行工具
│   └── init.py         # 快速初始化
├── tests/              # 测试
└── docs/               # 文档

许可

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hfs_worker-0.1.0.tar.gz (34.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hfs_worker-0.1.0-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file hfs_worker-0.1.0.tar.gz.

File metadata

  • Download URL: hfs_worker-0.1.0.tar.gz
  • Upload date:
  • Size: 34.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for hfs_worker-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cdfbf83e57090326cac31a64d044837996b91a512c55f80b56a0bc31d8271998
MD5 e6c3162b1e77582368c20b792b0c1dd7
BLAKE2b-256 0e43a2e76d0e63001fc00060d7d52dc8820770c6a9e0833d1c9a25bb5e5526fa

See more details on using hashes here.

File details

Details for the file hfs_worker-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hfs_worker-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for hfs_worker-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bc69a5a5992f487bf422447ea2f950342a66853b5f016074f9a4b3bc9e66b9ef
MD5 e1f87aae39511792e9208b2728454d05
BLAKE2b-256 f61f7c5acdb64624a79528b291de4f3562ebde402d2c83c76b17b535b6067b14

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page