Distributed worker for HuggingFace Space scheduling

HFS v2 - Distributed Scheduling System for HuggingFace Space Workers

A Redis-based distributed Worker scheduling system for managing a pool of HuggingFace Space resources.

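Features

  • Distributed scheduling - multiple Workers run concurrently with automatic load balancing
  • State management - atomic operations (Lua scripts) guarantee consistency
  • Health checks - automatic detection of crashes, timeouts, and orphaned resources
  • Account management - multi-account pool with automatic selection, cooldown, and scoring
  • Space rotation - automatic creation, binding, rotation, and reuse
  • Admin CLI - command-line management tool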
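
The atomic state updates listed above could take the form of a compare-and-set inside a Lua script. The sketch below is an illustration under assumptions, not the package's actual code: the key layout (a hash with a `state` field), the state names, and the `transition` helper are all hypothetical. It only requires a client object exposing `eval(script, numkeys, *keys_and_args)`, which matches the signature of redis-py's `Redis.eval`.

```python
from typing import Any, Protocol

# Redis executes a Lua script atomically, so the check and the write below
# cannot interleave with commands issued by other Workers.
# Assumption: each resource is a hash with a 'state' field (hypothetical layout).
CAS_STATE_LUA = """
if redis.call('HGET', KEYS[1], 'state') == ARGV[1] then
    redis.call('HSET', KEYS[1], 'state', ARGV[2])
    return 1
end
return 0
"""


class EvalClient(Protocol):
    """Any object with redis-py's eval(script, numkeys, *keys_and_args) signature."""

    def eval(self, script: str, numkeys: int, *keys_and_args: Any) -> Any: ...


def transition(client: EvalClient, key: str, expected: str, new: str) -> bool:
    """Move `key` from `expected` to `new`; return False if the state did not match."""
    return client.eval(CAS_STATE_LUA, 1, key, expected, new) == 1
```

With a real connection this might be called as `transition(redis.Redis.from_url(url), "space:my-space", "idle", "busy")`; the key and state names here are placeholders.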

Quick start

1. Install dependencies

pip install redis huggingface-hub click tabulate

2. Initialize the system

cd v2
python admin/init.py

3. Check status

python admin/cli.py --redis-url="redis://..." list-nodes
python admin/cli.py --redis-url="redis://..." list-accounts

4. Run a Worker

python -m hfs --redis-url="redis://..." --space-id=my-space --project-id=demo --node-id=node-1

Architecture

┌─────────────┐
│   Redis     │  ← state store
└──────┬──────┘
       │
   ┌───┴────┐
   │        │
┌──▼──┐  ┌─▼───┐
│Worker│  │Worker│  ← independent processes
└──┬──┘  └─┬───┘
   │       │
┌──▼───────▼──┐
│  Scheduler  │  ← scheduler
└─────────────┘
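
Because the Workers are independent processes that share state only through Redis, one plausible way to notice a crashed Worker is a missing heartbeat. The following is a minimal sketch under that assumption; the `find_stale_nodes` helper, the timeout value, and the idea of a mapping from node id to last-heartbeat timestamp are hypothetical and not taken from the package.

```python
import time
from typing import Dict, List, Optional

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical value, not from the package


def find_stale_nodes(
    heartbeats: Dict[str, float],
    now: Optional[float] = None,
    timeout: float = HEARTBEAT_TIMEOUT,
) -> List[str]:
    """Return the ids of nodes whose last heartbeat is older than `timeout` seconds.

    `heartbeats` maps node id -> Unix timestamp of the last heartbeat,
    e.g. as read back from the shared state store.
    """
    if now is None:
        now = time.time()
    return sorted(node for node, ts in heartbeats.items() if now - ts > timeout)
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to cover with a unit test.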

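Core modules

  • state.py - state machine + atomic operations (Lua scripts)
  • health.py - health checks (crash detection, consistency verification)
  • policy.py - policy configuration (scenarios, naming)
  • worker.py - Worker main loop (heartbeat, process management)
  • scheduler.py - scheduler (allocation, rotation, creation)
  • account.py - account management (selection, cooldown, scoring)
  • hf.py - HuggingFace API wrapper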
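
To make the account-management idea (selection, cooldown, scoring) concrete, here is a small sketch of how a pool might pick an account. It is an assumption for illustration only: the `Account` fields, the scoring scale, and `pick_account` are hypothetical and do not reflect the actual contents of account.py.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Account:
    username: str
    score: float = 0.0           # higher is better; hypothetical scoring scale
    cooldown_until: float = 0.0  # Unix timestamp; 0 means not cooling down


def pick_account(accounts: Iterable[Account], now: float) -> Optional[Account]:
    """Return the highest-scoring account that is not cooling down, or None."""
    available = [a for a in accounts if a.cooldown_until <= now]
    return max(available, key=lambda a: a.score, default=None)
```

An account in cooldown is skipped even if it has the best score, and becomes eligible again once `now` passes its `cooldown_until` timestamp.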

Testing

# Run all tests
pytest tests/ -v

# Run a specific module
pytest tests/test_state.py -v
pytest tests/test_worker.py -v
pytest tests/test_scheduler.py -v

Documentation

Configuration

Redis

export HFS_REDIS_URL="redis://:password@host:port/db"
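
The URL above follows the usual `redis://:password@host:port/db` shape. As a quick sanity check before starting a Worker, the components can be inspected with the standard library. The `describe_redis_url` helper is hypothetical and not part of the package; the fallbacks to port 6379 and database 0 are the standard Redis defaults.

```python
from urllib.parse import urlsplit


def describe_redis_url(url: str) -> dict:
    """Split a redis:// URL into its parts without exposing the password itself."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or 6379,               # Redis default port
        "db": int(parts.path.lstrip("/") or 0),   # database 0 when no path is given
        "has_password": parts.password is not None,
    }
```

For example, `describe_redis_url(os.environ["HFS_REDIS_URL"])` could be logged at startup, since the result never contains the password.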

HuggingFace accounts

Configure the account list in admin/init.py:

ACCOUNTS = [
    {'username': 'user1', 'token': 'hf_xxx'},
    {'username': 'user2', 'token': 'hf_yyy'}
]

Development

Test-driven development

  1. Write the tests first (tests/test_*.py)
  2. Then implement the feature (hfs/*.py)
  3. Run the tests to verify
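
The three steps above can be sketched on a toy example. Everything here is hypothetical: the state names, the transition table, and `can_transition` are placeholders chosen for illustration, not the real state machine in hfs/state.py. Both parts are shown in one block for brevity; in the project layout the test would live under tests/ and the implementation under hfs/.

```python
# Step 1: the test, written first (pytest collects functions named test_*).
def test_transitions():
    assert can_transition("idle", "running")
    assert can_transition("running", "idle")
    assert not can_transition("idle", "idle")
    assert not can_transition("unknown", "idle")


# Step 2: the smallest implementation that makes the test pass.
# Hypothetical states; the real state machine may differ.
ALLOWED_TRANSITIONS = {
    "idle": {"running"},
    "running": {"idle", "failed"},
    "failed": {"idle"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Step 3 is then a matter of running `pytest` against the file and confirming the test passes.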

Code structure

v2/
├── hfs/                # Worker package
│   ├── state.py        # state machine
│   ├── health.py       # health checks
│   ├── policy.py       # policy configuration
│   ├── worker.py       # Worker main loop
│   ├── scheduler.py    # scheduler
│   ├── account.py      # account management
│   └── hf.py           # HF API
├── admin/              # Admin tools
│   ├── cli.py          # command-line tool
│   └── init.py         # quick initialization
├── tests/              # tests
└── docs/               # documentation

License

MIT License
