FlowHive User Server Agent

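User Server Agent (compute server side)

Responsibilities

Registers with the Control Server and maintains a heartbeat; listens for downstream commands, schedules tasks, and executes scripts; reports GPU metrics, task logs, and results in real time.

Directory structure

  • core/: Agent core logic (task model, asynchronous executor, task manager, GPU monitor)
  • cli/: CLI tools (main entry point flowhive.py)
  • config/: reserved directory for configuration files
  • scripts/: helper scripts (e.g. demo_task_manager.py for local testing)
  • tests/: Pytest test cases

Local development

Create a virtual environment and install the dependencies (the commands below are for Windows):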

cd agent
py -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

Or install the package directly (if it has been published to PyPI):

pip install flowhive-agent -i https://pypi.org/simple/

Manually testing TaskManager

To quickly try out the task execution path, run the example script directly:

cd agent
python scripts/demo_task_manager.py

By default the script submits a simple Python command. You can override it by passing arguments:

python scripts/demo_task_manager.py "python -c \"print('hi')\"" --timeout 10

After the run finishes, the stdout/stderr output is available in the agent_logs/ directory.
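The flow above (run a command with a timeout, then inspect stdout/stderr on disk) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the Agent's actual executor: the function name run_and_log, the task_id parameter, and the log file naming are all assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(command, log_dir, timeout=10.0, task_id="demo"):
    """Run `command` (an argv list) and write its stdout/stderr under log_dir.

    Returns the exit code, or None if the timeout expired.
    The file names used here are hypothetical, not the Agent's real layout.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / f"{task_id}.stdout.log"
    err_path = log_dir / f"{task_id}.stderr.log"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        try:
            # subprocess.run kills the child and raises if the timeout expires.
            proc = subprocess.run(command, stdout=out, stderr=err, timeout=timeout)
        except subprocess.TimeoutExpired:
            return None
    return proc.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = run_and_log([sys.executable, "-c", "print('hi')"], tmp)
        print(code, (Path(tmp) / "demo.stdout.log").read_text().strip())
```

Writing directly to files rather than buffering output in memory keeps long-running tasks from accumulating large logs in the Agent process.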

Connecting to the Control Server

The Agent now communicates with the Control Server over WebSocket. Install the dependency (pip install websockets), then run:

cd agent
# 1. Set the account information
python cli/flowhive.py config user.username "test"
python cli/flowhive.py config user.email "user@example.com"
python cli/flowhive.py config user.password "your-password"
python cli/flowhive.py config control_base_url "http://127.0.0.1:8001"
python cli/flowhive.py config label "your-label"

# 2. Verify the configuration
python cli/flowhive.py config

# 3. Start the Agent
python cli/flowhive.py run

The script automatically converts the scheme according to the protocol of control_base_url:

  • http:// → ws:// (plaintext WebSocket)
  • https:// → wss:// (encrypted WebSocket, recommended for production)
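The scheme mapping listed above can be illustrated with a small helper built on urllib.parse. This is a sketch under assumptions, not the Agent's own code: the function name to_websocket_url and the optional path argument are hypothetical, and the real endpoint path used by the Control Server is not stated in this document.

```python
from urllib.parse import urlsplit, urlunsplit

# http -> ws and https -> wss, as described above; ws/wss pass through unchanged.
_SCHEME_MAP = {"http": "ws", "https": "wss", "ws": "ws", "wss": "wss"}


def to_websocket_url(control_base_url, path=""):
    """Convert an HTTP(S) base URL into the matching WebSocket URL.

    `path` is a hypothetical endpoint suffix appended to the base path.
    """
    parts = urlsplit(control_base_url)
    scheme = _SCHEME_MAP.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    full_path = parts.path.rstrip("/") + path
    return urlunsplit((scheme, parts.netloc, full_path, parts.query, ""))


if __name__ == "__main__":
    print(to_websocket_url("http://127.0.0.1:8001"))
    print(to_websocket_url("https://your-control-server.com"))
```

Rejecting unknown schemes up front makes a mistyped control_base_url fail at startup instead of at connection time.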

Production configuration example

# Use HTTPS/WSS (recommended)
python cli/flowhive.py config control_base_url "https://your-control-server.com"

The Agent then connects to the Control Server over wss://, so the communication is encrypted.

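Key modules

  • GPU Monitor & Service: GPUMonitor uses NVML to collect GPU memory usage, utilization, and per-process memory usage. GPUService is a singleton facade reused by TaskManager and the control plane, so the task manager only injects the monitor reference and does not query metrics itself.
  • FlowHive Scheduler: reuses the existing capabilities for GPU memory thresholds, priorities, maximum concurrency, and task retries.
  • Task executor: subprocess / torchrun / bash, with support for environment variable injection and container launch.
  • OOM self-healing & retry: driven by the task state machine, failed tasks are requeued or queued again at a lower priority.
  • Log/metric reporting: a multiplexed channel streams stdout/stderr to the Control Server.
  • Heartbeat service: reports GPU topology, processes, and the Agent version at an interval of 1 s to 5 s.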
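As an illustration of the "requeue or queue at a lower priority" behavior mentioned in the list above, here is a minimal sketch built on heapq. It is not the FlowHive scheduler: the class names, the max_attempts and demote_by parameters, and the policy of demoting only on OOM failures are assumptions chosen for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedTask:
    priority: int                          # lower value runs first
    seq: int                               # tie-breaker: FIFO within a priority
    name: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class RetryQueue:
    """Priority queue that requeues failed tasks (hypothetical policy)."""

    def __init__(self, max_attempts=3, demote_by=10):
        self._heap = []
        self._counter = itertools.count()
        self.max_attempts = max_attempts
        self.demote_by = demote_by

    def submit(self, name, priority=0, attempts=0):
        task = QueuedTask(priority, next(self._counter), name, attempts)
        heapq.heappush(self._heap, task)

    def pop(self):
        """Return the next task to run, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def on_failure(self, task, oom=False):
        """Requeue a failed task; return False once attempts are exhausted."""
        attempts = task.attempts + 1
        if attempts >= self.max_attempts:
            return False
        # Assumed policy: an OOM failure is demoted, other failures keep priority.
        priority = task.priority + self.demote_by if oom else task.priority
        self.submit(task.name, priority, attempts)
        return True
```

Demoting a task that ran out of memory lets smaller tasks run first, which may free GPU memory before the retry is attempted.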

License

This Agent component is licensed under the MIT License. See LICENSE for details.

Note: This is only the Agent component. The Control Server and Web Client are proprietary software and not covered by this license.

