[QReward] RewardService Python Client

QReward

AQ ✖️ Reward = QReward

A Chinese version of this README (中文版本) is also available.

📣 Introduction & Background

QReward is designed to address compute-capacity shortages and concurrency rate limiting in the current reinforcement learning (RL) reward process. By integrating multiple cloud compute services and combining intelligent scheduling with request optimization strategies, it maximizes the utilization of available computing resources and significantly reduces task execution time. The system decides how to distribute requests based on real-time compute availability, rate-limit thresholds, and task priorities, which avoids unnecessary backoff delays and improves overall throughput.

There are three main causes of latency in the current RL reward process:

  1. Python concurrent requests triggering rate-limit failures

    • Excessive concurrency leads to hitting the rate limits of the compute service.
    • Once rate limiting occurs, the client applies a backoff strategy, reducing the number of active requests.
    • As a result, the available compute capacity of the Model Cloud Service is left underused.
  2. Insufficient Model Cloud Service compute capacity

    • The Model Cloud Service alone cannot meet the total compute demand, resulting in increased task queuing and processing delays.
    • The remedy is to add compute services that supplement capacity and to design a scheduling strategy that distributes tasks dynamically and efficiently across multiple compute resources, relieving the compute bottleneck.
  3. Non-optimal task execution flow with unnecessary serialization

    • Some subtasks within the RL reward process could run in parallel, but the current implementation runs them sequentially, which increases total latency.
    • Without asynchronous execution or pipelining, I/O waits and computation are not overlapped, so both sit on the critical path.
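The first and third causes pull in opposite directions: too much concurrency trips rate limits, while too little leaves independent subtasks waiting on each other. A common middle ground is to run independent I/O-bound calls concurrently but cap how many are in flight. The sketch below shows that idea with plain `asyncio`; `score_sample` is a hypothetical stand-in for one remote reward call, and the cap of 4 is an arbitrary assumption, not a QReward default.

```python
import asyncio
import time


async def score_sample(sample_id: int, delay: float = 0.1) -> float:
    """Hypothetical stand-in for one I/O-bound reward call (e.g. a remote judge)."""
    await asyncio.sleep(delay)  # simulate network latency
    return float(sample_id)


async def score_sequential(ids: list[int]) -> list[float]:
    # Each call waits for the previous one: total time is roughly len(ids) * delay.
    return [await score_sample(i) for i in ids]


async def score_concurrent(ids: list[int], max_in_flight: int = 4) -> list[float]:
    # The semaphore caps in-flight requests (to stay under a rate limit)
    # while still overlapping the I/O waits of independent calls.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(i: int) -> float:
        async with sem:
            return await score_sample(i)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(i) for i in ids))


async def main() -> None:
    ids = list(range(8))
    t0 = time.perf_counter()
    seq = await score_sequential(ids)
    t1 = time.perf_counter()
    con = await score_concurrent(ids)
    t2 = time.perf_counter()
    assert seq == con
    print(f"sequential: {t1 - t0:.2f}s, concurrent (4 in flight): {t2 - t1:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With eight 0.1 s calls, the sequential version takes roughly 0.8 s and the bounded concurrent version roughly 0.2 s; the right cap in practice depends on the service's actual rate limit.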

✨ Features

Beyond supporting the verl and slime frameworks, QReward also provides acceleration for general-purpose functions.

  1. HTTP Call Optimization

    • Connection reuse: Reduce handshake latency and frequent reconnections using HTTP Keep-Alive or connection pooling.
    • Batch requests: Aggregate multiple small requests into batch calls to reduce request frequency and network overhead.
    • Concurrency control: Intelligently adjust the level of concurrency to avoid hitting rate limits of the Model Cloud Service while maintaining high utilization.
  2. Intelligent Retry Mechanism

    • Error-type-based retry: Quickly retry recoverable errors (e.g., timeouts, temporary network failures) while avoiding retries for non-recoverable errors to save resources.
    • Optimized exponential backoff: Factor compute-utilization monitoring into backoff intervals, choosing wait times dynamically so that resources do not sit idle for long.
    • Multi-source retry: Redirect retries to other available compute services to avoid single-service bottlenecks.
  3. Multi-compute Scheduling (Coming soon👀)

    • Integrate additional compute resources beyond the Model Cloud Service into a unified compute pool.
    • Optimize distribution based on task priority, latency sensitivity, and load balancing.
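The retry bullets above (error-type-based retry, exponential backoff, multi-source retry) can be combined in a few lines. This is a minimal, synchronous sketch of the general pattern, not QReward's implementation: `call_with_retry` and `RETRYABLE` are hypothetical names, the set of "recoverable" errors is an assumption, and the backoff is plain capped exponential backoff with full jitter rather than the utilization-aware variant described above.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumption: which errors are "recoverable" is application-specific;
# timeouts and connection failures are a common starting point.
RETRYABLE = (TimeoutError, ConnectionError)


def call_with_retry(
    sources: Sequence[Callable[[], T]],
    max_attempts: int = 4,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
) -> T:
    """Call the sources in rotation, retrying only recoverable errors."""
    if not sources or max_attempts < 1:
        raise ValueError("need at least one source and one attempt")
    last_exc: Optional[BaseException] = None
    for attempt in range(max_attempts):
        # Multi-source retry: attempt k goes to source k mod len(sources).
        source = sources[attempt % len(sources)]
        # Only recoverable errors are caught; anything else (e.g. a ValueError
        # for a malformed request) propagates at once instead of burning quota.
        try:
            return source()
        except RETRYABLE as exc:
            last_exc = exc
        if attempt < max_attempts - 1:
            # Capped exponential backoff with full jitter:
            # sleep a random time in [0, min(max_delay, base_delay * 2**attempt)].
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise last_exc  # every attempt failed with a recoverable error
```

In an async client the same structure applies with `await asyncio.sleep(...)` in place of `time.sleep(...)`.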

📒 ChangeLog

See CHANGELOG.md.

🔰 Installation

pip install

pip install qreward

From source code

# normal way to install from source code
$ git clone https://github.com/AQ-MedAI/QReward.git
$ cd QReward
$ pip install -r requirements.txt
$ python setup.py install

# or use the Makefile
$ make install

📝 Usage

Pure Acceleration

  • Single Call — Basic OpenAI proxy usage (single request, context manager, proxy manager)
  • Batch Call — Batch chat completion and batch embedding calls
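The idea behind batch calls is to replace many small requests with a few larger ones. The sketch below shows only that general idea and is not QReward's batch API: `chunked`, `batch_call`, and `batch_fn` are hypothetical, and `batch_fn` stands for any endpoint that accepts a list of inputs and returns one result per input.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_call(
    items: Sequence[T],
    batch_fn: Callable[[Sequence[T]], list[R]],
    batch_size: int = 16,
) -> list[R]:
    """Send `items` in batches: one batch_fn call replaces up to `batch_size` single calls."""
    results: list[R] = []
    for chunk in chunked(items, batch_size):
        out = batch_fn(chunk)
        if len(out) != len(chunk):  # guard against a misaligned response
            raise RuntimeError("batch response length mismatch")
        results.extend(out)
    return results


if __name__ == "__main__":
    calls = 0

    def fake_embed(texts):  # hypothetical stand-in for a batch embedding endpoint
        global calls
        calls += 1
        return [len(t) for t in texts]

    texts = [f"sample-{i}" for i in range(40)]
    vectors = batch_call(texts, fake_embed, batch_size=16)
    print(len(vectors), "results from", calls, "requests")  # 40 results from 3 requests
```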

Schedule Decorator

| Feature | Example | Key Parameters |
|---|---|---|
| Sync Function | schedule_sync.py | retry_times |
| Debug Logging | schedule_debug.py | debug=True |
| Timeout | schedule_timeout.py | timeout (wall-clock deadline in seconds) |
| Rate Limiting | schedule_limit.py | limit_size, key_func |
| Retry & Speed-up | schedule_retry.py | retry_times, exception_types, retry_interval |
| Default Value | schedule_default_value.py | default_result (value, None, or callable) |
| Hedged Request | schedule_hedged_request.py | hedged_request_time, hedged_request_max_times |
| Circuit Breaker | schedule_circuit_breaker.py | circuit_breaker_threshold, circuit_breaker_recovery |
| Adaptive Limiting | schedule_adaptive_limit.py | adaptive_limit=True, adaptive_error_threshold |
| Metrics Callback | schedule_metrics_callback.py | metrics_callback |
| Priority Queue | schedule_priority.py | priority (HIGH / NORMAL / LOW) |
| OpenTelemetry | schedule_telemetry.py | telemetry_exporter |
| Config Hot Reload | schedule_config_hot_reload.py | ScheduleConfig, ConfigWatcher |
| Combined Features | schedule_combined.py | All features working together |
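A hedged request sends a duplicate of a slow call and keeps whichever copy answers first, trading some extra load for lower tail latency. The sketch below illustrates that general technique with plain `asyncio`; it is not the decorator's implementation, and `hedged`, `hedge_delay`, and `max_hedges` are hypothetical names that only loosely correspond to `hedged_request_time` and `hedged_request_max_times`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def hedged(
    make_call: Callable[[], Awaitable[T]],
    hedge_delay: float,
    max_hedges: int = 1,
) -> T:
    """Return the first successful result among the original call and its hedges.

    While nothing has succeeded, a duplicate call is launched each time
    `hedge_delay` seconds pass without a result (or immediately when every
    in-flight copy has failed), up to `max_hedges` duplicates. Slower copies
    are cancelled once a winner is found.
    """
    pending = {asyncio.ensure_future(make_call())}
    hedges_left = max_hedges
    last_exc: Optional[BaseException] = None
    try:
        while pending:
            # Stop waking up on a timer once no hedges remain.
            timeout = hedge_delay if hedges_left > 0 else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_exc = task.exception()
            # Hedge if we timed out, or if every in-flight copy has failed.
            if hedges_left > 0 and (not done or not pending):
                hedges_left -= 1
                pending.add(asyncio.ensure_future(make_call()))
        raise last_exc  # all copies failed
    finally:
        for task in pending:
            task.cancel()


if __name__ == "__main__":
    async def demo() -> None:
        delays = iter([0.5, 0.01])  # the first copy is slow, the hedge is fast

        async def call() -> float:
            d = next(delays)
            await asyncio.sleep(d)
            return d

        print(await hedged(call, hedge_delay=0.05, max_hedges=1))  # 0.01

    asyncio.run(demo())
```

The hedge delay is usually set near a high percentile of observed latency, so that only the slowest requests are duplicated.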

Client (Multi-Source Scheduling)

| Feature | Example | Key Concepts |
|---|---|---|
| Load Balancer | client_load_balancer.py | ROUND_ROBIN, WEIGHTED_ROUND_ROBIN, mark_unhealthy, failover |
| Model Router | client_model_router.py | register_model_route, glob patterns (gpt-*), per-group strategy |
| Streaming | client_streaming.py | stream_chat_completion, token-by-token output |
| Batch Streaming | client_batch_streaming.py | batch_stream_chat_completion, max_concurrent_streams, on_stream_error |
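Weighted round-robin with health marking, as listed for the load balancer, can be sketched with the "smooth" weighted round-robin algorithm used by nginx, which interleaves picks instead of sending bursts to the heaviest endpoint. This is an illustration of the technique only; `Endpoint`, `WeightedRoundRobin`, and `pick` are hypothetical and are not QReward's classes. Plain round-robin is the special case where all weights are equal.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Endpoint:
    name: str
    weight: int = 1
    healthy: bool = True
    current: int = 0  # running score used by smooth weighted round-robin


class WeightedRoundRobin:
    """Smooth weighted round-robin, the variant popularised by nginx.

    On each pick every healthy endpoint's score grows by its weight, the
    highest score wins, and the winner's score drops by the total weight.
    Unhealthy endpoints are skipped, so traffic fails over to the rest.
    """

    def __init__(self, endpoints: Iterable[Endpoint]) -> None:
        self.endpoints = {e.name: e for e in endpoints}

    def mark_unhealthy(self, name: str) -> None:
        self.endpoints[name].healthy = False

    def mark_healthy(self, name: str) -> None:
        ep = self.endpoints[name]
        ep.healthy = True
        ep.current = 0  # drop any stale score from before the outage

    def pick(self) -> str:
        alive: List[Endpoint] = [e for e in self.endpoints.values() if e.healthy]
        if not alive:
            raise RuntimeError("no healthy endpoints")
        total = sum(e.weight for e in alive)
        for e in alive:
            e.current += e.weight
        best = max(alive, key=lambda e: e.current)  # ties go to the earliest endpoint
        best.current -= total
        return best.name


if __name__ == "__main__":
    lb = WeightedRoundRobin([Endpoint("a", 5), Endpoint("b", 1), Endpoint("c", 1)])
    print([lb.pick() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
    lb.mark_unhealthy("a")  # failover: only b and c receive traffic now
    print([lb.pick() for _ in range(4)])  # ['b', 'c', 'b', 'c']
```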

Framework Integration

  • With ROLL Framework: Examples — LLM-as-Judge reward via remote API with load balancing
  • With verl Framework: Examples
  • With slime Framework: Examples

⛏ Code Quality

Unit Tests

$ pip install -r tests/requirements.txt
$ make

😉 Authors

QReward is primarily developed and maintained by the following developers:

For more contributor information, please visit QReward/graphs/contributors

💡 Contributing

We welcome more developers to take part in developing QReward, and we aim to review PRs and respond promptly. When submitting a PR, please make sure that:

  1. All unit tests pass; if you add a new feature, add corresponding unit tests.
  2. The code follows the development guidelines: format it with black and lint it with flake8 (install the tools with $ pip install -r requirements-dev.txt).
  3. The corresponding documentation is updated if necessary.

📃 License

Apache 2.0 ©AQ-MedAI

Download files

Download the file for your platform.

Source Distribution

qreward-0.1.7.tar.gz (103.5 kB)

Built Distribution

qreward-0.1.7-py3-none-any.whl (114.7 kB)

File details

Details for the file qreward-0.1.7.tar.gz.

File metadata

  • Download URL: qreward-0.1.7.tar.gz
  • Upload date:
  • Size: 103.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qreward-0.1.7.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | f65a541c2cff38a6b19fa8c8c854e52902c6f49af624fda37a5e9c88d75d81ca |
| MD5 | 66d907af28102ed377468a729455b0c5 |
| BLAKE2b-256 | 2f442f483081c18480a16392efcc2178817ebc907dfa33dc42890c100c7f2dbc |
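A downloaded file can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library; the script name `verify_sha256.py` in the usage comment is hypothetical. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
import hmac
import sys


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    # Normalise case so digests copied in upper case still match.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_sha256.py qreward-0.1.7.tar.gz <SHA256 from the table>
    print("OK" if verify(sys.argv[1], sys.argv[2]) else "MISMATCH")
```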

File details

Details for the file qreward-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: qreward-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 114.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qreward-0.1.7-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 02c60dfdd9a77f5ec84def9be0660ee769595dd3567cdb0adb51bf37c27e81b6 |
| MD5 | 27bc718e70a3702b874a74fba4eac4f4 |
| BLAKE2b-256 | 95d1b15f7aa9a695a5b3665b12425cd58b6b330de862c35c88a6125917806525 |
