FlowHive User Server Agent

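User Server Agent (compute server side)

Responsibilities

Registers with the Control Server and maintains a heartbeat; listens for downstream commands, schedules tasks, and executes scripts; reports GPU metrics, task logs, and results in real time.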

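Directory structure

  • core/: Agent core logic (task model, asynchronous executor, task manager, GPU monitor)
  • cli/: CLI tools (main entry point: flowhive.py)
  • config/: directory reserved for configuration files
  • scripts/: helper scripts (e.g. demo_task_manager.py for local testing)
  • tests/: Pytest test cases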

Local development

Create an environment and install the dependencies:

cd agent
py -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

Or install directly (if the package has been published to PyPI):

pip install flowhive

Manually testing TaskManager

To quickly try out the task execution pipeline, run the example script directly:

cd agent
python scripts/demo_task_manager.py

By default the script submits a simple Python command. You can override it by passing arguments:

python scripts/demo_task_manager.py "python -c \"print('hi')\"" --timeout 10

After execution completes, stdout/stderr can be found in the agent_logs/ directory.
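The flow above (run a command with a timeout, then inspect the captured stdout/stderr) can be sketched with the standard library alone. This is a minimal illustration, not the code in demo_task_manager.py: the function name run_with_timeout and the file names stdout.log and stderr.log are assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_timeout(command: str, log_dir: Path, timeout: float = 10.0) -> int:
    """Run a shell command and save its stdout/stderr under log_dir.

    Returns the command's exit code, or -1 if it exceeded the timeout.
    (Hypothetical helper; names and log layout are assumptions.)
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / "stdout.log"
    err_path = log_dir / "stderr.log"
    with out_path.open("w") as out, err_path.open("w") as err:
        try:
            # shell=True so a full command string can be passed, as in the demo script.
            proc = subprocess.run(
                command, shell=True, stdout=out, stderr=err, timeout=timeout
            )
            return proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child process at this point.
            return -1


if __name__ == "__main__":
    logs = Path(tempfile.mkdtemp()) / "agent_logs"
    code = run_with_timeout(f'"{sys.executable}" -c "print(1 + 1)"', logs)
    print("exit code:", code)
    print("stdout:", (logs / "stdout.log").read_text().strip())
```

Writing output straight to files rather than pipes keeps long-running tasks from blocking on a full pipe buffer, which is one reason a log directory is a convenient design for this kind of agent.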

Connecting to the Control Server

The Agent now communicates with the Control Server over WebSocket. Install the dependency (pip install websockets), then run:

cd agent
# 1. Set the account information
python cli/flowhive.py config user.username "test"
python cli/flowhive.py config user.email "user@example.com"
python cli/flowhive.py config user.password "your-password"
python cli/flowhive.py config control_base_url "http://127.0.0.1:8001"
python cli/flowhive.py config label "your-label"

# 2. Verify the configuration
python cli/flowhive.py config

# 3. Start the Agent
python cli/flowhive.py run

The script automatically converts the scheme of control_base_url:

  • http:// → ws:// (plaintext WebSocket)
  • https:// → wss:// (encrypted WebSocket, recommended for production)
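The scheme mapping above could be implemented roughly as follows. This is a sketch under assumptions, not the Agent's actual code: the function name to_websocket_url is hypothetical, and no WebSocket endpoint path is appended because the README does not state one.

```python
from urllib.parse import urlsplit, urlunsplit

# Mapping described in the list above: http -> ws, https -> wss.
_SCHEME_MAP = {"http": "ws", "https": "wss"}


def to_websocket_url(control_base_url: str) -> str:
    """Return control_base_url with its HTTP(S) scheme swapped for WS(S).

    Hypothetical helper; URLs that already use ws:// or wss:// pass through.
    """
    parts = urlsplit(control_base_url)
    scheme = parts.scheme.lower()
    if scheme in ("ws", "wss"):
        return control_base_url
    if scheme not in _SCHEME_MAP:
        raise ValueError(f"unsupported scheme in control_base_url: {scheme!r}")
    return urlunsplit(parts._replace(scheme=_SCHEME_MAP[scheme]))


if __name__ == "__main__":
    print(to_websocket_url("http://127.0.0.1:8001"))
    print(to_websocket_url("https://your-control-server.com"))
```

Rejecting unknown schemes early, instead of silently passing them through, surfaces a mistyped control_base_url before the Agent attempts to connect.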

Production configuration example

# Use HTTPS/WSS (recommended)
python cli/flowhive.py config control_base_url "https://your-control-server.com"

The Agent then automatically connects to the Control Server over wss://, which keeps the communication encrypted.

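Key modules

  • GPU monitor & service: GPUMonitor uses NVML to collect GPU memory usage, utilization, and per-process memory usage. GPUService acts as a singleton facade shared by TaskManager and the control plane, so the task manager only injects the monitor reference and does not query metrics itself.
  • FlowHive Scheduler: reuses the existing GPU memory threshold, priority, maximum concurrency, and task retry capabilities.
  • Task executor: subprocess / torchrun / bash, with support for environment variable injection and container launch.
  • OOM self-healing & retry: driven by the task state machine, failed tasks are re-enqueued or queued again at a lower priority.
  • Log/metric reporting: a multiplexed channel streams stdout/stderr to the Control Server.
  • Heartbeat service: reports GPU topology, processes, and the Agent version every 1 to 5 seconds.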
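To make the scheduling ideas in the list above more concrete, here is a small sketch that combines a GPU memory threshold, priorities, a concurrency limit, and retry with demotion. It is an illustration only and does not reflect the real FlowHive Scheduler: the names Task, TinyScheduler, required_mb, and free_mb are all hypothetical, and the free-memory figure is passed in by the caller rather than read from NVML.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass(order=True)
class Task:
    priority: int                                  # lower value is scheduled first
    seq: int                                       # tie-breaker: submission order
    name: str = field(compare=False)
    required_mb: int = field(compare=False)        # GPU memory the task needs
    retries_left: int = field(compare=False, default=1)


class TinyScheduler:
    """Hypothetical scheduler: priority queue + memory threshold + retries."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.running: Dict[str, Task] = {}
        self._queue: List[Task] = []
        self._seq = count()

    def submit(self, name: str, required_mb: int, priority: int = 0, retries: int = 1) -> None:
        heapq.heappush(self._queue, Task(priority, next(self._seq), name, required_mb, retries))

    def next_task(self, free_mb: int) -> Optional[Task]:
        """Pick the highest-priority queued task that fits into free_mb."""
        if len(self.running) >= self.max_concurrent:
            return None
        skipped: List[Task] = []
        chosen: Optional[Task] = None
        while self._queue:
            task = heapq.heappop(self._queue)
            if task.required_mb <= free_mb:
                chosen = task
                break
            skipped.append(task)                   # does not fit right now
        for task in skipped:                       # put non-fitting tasks back
            heapq.heappush(self._queue, task)
        if chosen is not None:
            self.running[chosen.name] = chosen
        return chosen

    def on_failure(self, name: str) -> bool:
        """Re-enqueue a failed task one priority level lower, if retries remain."""
        task = self.running.pop(name)
        if task.retries_left <= 0:
            return False
        heapq.heappush(
            self._queue,
            Task(task.priority + 1, next(self._seq), task.name,
                 task.required_mb, task.retries_left - 1),
        )
        return True
```

For example, with max_concurrent=1, a queued 16000 MB task at priority 0, and a 2000 MB task at priority 5, calling next_task(free_mb=8000) skips the large task and returns the small one; if that task then fails, on_failure puts it back in the queue at priority 6 with one fewer retry.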

License

This Agent component is licensed under the MIT License. See LICENSE for details.

Note: This is only the Agent component. The Control Server and Web Client are proprietary software and not covered by this license.
