
Backend.AI Manager


Purpose

The Manager is the central orchestrator of the Backend.AI cluster. It schedules compute sessions and their kernels, allocates resources, manages session lifecycles, and acts as the API gateway through its REST and GraphQL interfaces.

Key Responsibilities

1. API Gateway

  • Provide REST and GraphQL APIs to clients
  • Authenticate and authorize requests
  • Enforce rate limits and quotas
  • Manage API versioning

2. Session Scheduling

  • Allocate compute resources for user session requests
  • Select optimal agents using various scheduling algorithms
  • Manage the session lifecycle (creation, execution, termination)
  • Schedule and manage cluster-mode (multi-container) sessions

3. Resource Management

  • Track cluster-wide resource status
  • Collect and aggregate agent resource information
  • Manage resource allocation and release
  • Handle resource presets and quotas

4. User and Organization Management

  • User account and authentication management
  • Group and domain organization management
  • Permission and role-based access control
  • Credential (access key/secret key) management

5. Virtual Folder Management

  • Provide persistent storage to users
  • Integrate various storage backends
  • Manage file upload/download
  • Manage folder sharing and permissions

6. Image Registry Management

  • Manage container image repositories
  • Scan and synchronize image metadata
  • Validate and manage image versions
  • Control allowed images per domain

Entry Points

The Manager accepts and processes external requests through four entry points.

1. REST API

Framework: aiohttp (async HTTP web framework)

Location: src/ai/backend/manager/api/

Key Features:

  • HTTP/HTTPS-based communication
  • JSON request/response format
  • JWT or API Key-based authentication
  • Validation and authentication via middleware stack

Processing Flow:

HTTP Request → Middleware Stack → REST Handler → Action Processor → Service → Repository → DB
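To make this flow concrete, here is a minimal aiohttp sketch of the middleware-plus-handler pattern; the route, middleware, and handler names are illustrative and are not the actual Manager code.

# Illustrative sketch of the middleware -> handler flow; not the actual Manager source.
from aiohttp import web

@web.middleware
async def auth_middleware(request: web.Request, handler):
    # A real deployment verifies a JWT or API-key signature here.
    if "Authorization" not in request.headers:
        raise web.HTTPUnauthorized(reason="missing credentials")
    return await handler(request)

async def create_session(request: web.Request) -> web.Response:
    payload = await request.json()  # JSON request body
    # The real handler delegates to an action processor / service layer here.
    return web.json_response({"sessionId": "dummy", "status": "PENDING"})

app = web.Application(middlewares=[auth_middleware])
app.router.add_post("/v1/session/create", create_session)

if __name__ == "__main__":
    web.run_app(app, port=8091)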

Related Documentation: REST API Documentation

2. GraphQL API

Framework: Strawberry (current) + Graphene (Legacy, DEPRECATED)

Location:

  • Strawberry: src/ai/backend/manager/api/gql/
  • Graphene (Legacy): src/ai/backend/manager/api/gql_legacy/
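As a minimal illustration of the Strawberry side (the types and fields below are made up, not the Manager's real schema):

# Minimal Strawberry sketch; the types and fields are illustrative only.
import strawberry

@strawberry.type
class ComputeSession:
    id: str
    status: str

@strawberry.type
class Query:
    @strawberry.field
    def compute_session(self, id: str) -> ComputeSession:
        # A real resolver would call into the services and repositories layers.
        return ComputeSession(id=id, status="RUNNING")

schema = strawberry.Schema(query=Query)

if __name__ == "__main__":
    result = schema.execute_sync('{ computeSession(id: "demo") { id status } }')
    print(result.data)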

Related Documentation:

3. Event Dispatcher

Framework: Backend.AI Event Dispatcher

Location: src/ai/backend/common/events/

Event Types:

  • Broadcast Events: Received by all Manager instances
  • Anycast Events: Received by only one Manager instance

Processing Flow:

Event Producer → Event Dispatcher → Event Handler → Service
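The broadcast/anycast distinction can be illustrated with a small standalone sketch; this is a toy model, not the API in ai.backend.common.events.

# Toy model of broadcast vs. anycast delivery; the real dispatcher differs in detail.
import itertools
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]

class ToyDispatcher:
    def __init__(self) -> None:
        self._broadcast: dict[str, list[Handler]] = {}
        self._anycast: dict[str, list[Handler]] = {}
        self._round_robin = itertools.count()

    def subscribe_broadcast(self, event_name: str, handler: Handler) -> None:
        self._broadcast.setdefault(event_name, []).append(handler)

    def subscribe_anycast(self, event_name: str, handler: Handler) -> None:
        self._anycast.setdefault(event_name, []).append(handler)

    async def dispatch(self, event_name: str, payload: dict) -> None:
        # Broadcast: every subscriber (e.g., every Manager instance) receives the event.
        for handler in self._broadcast.get(event_name, []):
            await handler(payload)
        # Anycast: exactly one subscriber handles the event (round-robin here).
        anycast_handlers = self._anycast.get(event_name, [])
        if anycast_handlers:
            chosen = anycast_handlers[next(self._round_robin) % len(anycast_handlers)]
            await chosen(payload)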

Related Documentation: Event Dispatcher System

4. Background Task Handler

Framework: Backend.AI Background Task Handler

Location: src/ai/backend/common/bgtask/

Purpose: Handles long-running tasks asynchronously. Issues Task IDs that allow clients to subscribe to progress updates or results via events.

Processing Flow:

Task Request → Background Task Handler → Task Execution → Event Notification
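A simplified sketch of the task-ID-plus-progress-events pattern described above; the function names are illustrative and differ from the actual ai.backend.common.bgtask API.

# Simplified sketch: issue a task ID immediately, report progress through events.
import asyncio
import uuid
from typing import Awaitable, Callable

ProgressReporter = Callable[[str, float], Awaitable[None]]

async def notify_progress(task_id: str, progress: float) -> None:
    # Stand-in for publishing a bgtask progress event.
    print(f"task {task_id}: {progress:.0%} done")

async def start_background_task(
    work: Callable[[str, ProgressReporter], Awaitable[None]],
) -> str:
    task_id = str(uuid.uuid4())                 # returned to the client right away
    asyncio.create_task(work(task_id, notify_progress))
    return task_id

async def rescan_images(task_id: str, report: ProgressReporter) -> None:
    for step in range(1, 5):
        await asyncio.sleep(0.1)                # pretend to do a chunk of work
        await report(task_id, step / 4)

async def main() -> None:
    task_id = await start_background_task(rescan_images)
    print("client received task id:", task_id)
    await asyncio.sleep(1)                      # let the task finish in this demo

asyncio.run(main())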

Related Documentation: Background Task Handler System

Entry Point Interactions

Each entry point operates independently, but service logic can trigger background tasks or publish events as needed.

Interaction Examples:

REST API Handler → Service Logic → Event Publish (notify other Manager instances)
Event Handler → Service Logic → Background Task Trigger (when async processing needed)

Integrated Architecture:

┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  REST API   │  │ GraphQL API │  │   Event     │  │ Background  │
│  (aiohttp)  │  │(Strawberry) │  │ Dispatcher  │  │    Task     │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │                │
       │                │                │                │
       └────────────────┴────────────────┴────────────────┘
                              │
                    ┌─────────▼──────────┐
                    │  Services Layer    │
                    └─────────┬──────────┘
                              │
                    ┌─────────▼──────────┐
                    │ Repositories Layer │
                    └────────────────────┘

Architecture

┌───────────────────────────────────────────┐
│              API Layer                    │
│  - REST API Handler (aiohttp)             │
│  - GraphQL Handler (strawberry)           │
│  - Authentication & Authorization         │
│  - Request Validation                     │
└──────────────────┬────────────────────────┘
                   │
┌──────────────────┴────────────────────────┐
│            Actions Layer                  │
│  - Session Lifecycle Actions              │
│  - Resource Allocation Actions            │
│  - User Management Actions                │
│  - VFolder Management Actions             │
└──────────────────┬────────────────────────┘
                   │
┌──────────────────┴────────────────────────┐
│           Services Layer                  │
│  - Scheduling Service                     │
│  - Session Management Service             │
│  - Resource Quota Service                 │
│  - Event Processing Service               │
└──────────────────┬────────────────────────┘
                   │
┌──────────────────┴────────────────────────┐
│         Repositories Layer                │
│  - Session Repository                     │
│  - Agent Repository                       │
│  - User Repository                        │
│  - VFolder Repository                     │
└──────────────────┬────────────────────────┘
                   │
┌──────────────────┴────────────────────────┐
│            Models Layer                   │
│  - SQLAlchemy ORM Models                  │
│  - Domain Types                           │
└───────────────────────────────────────────┘
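Read top-down, the layering is plain delegation; the sketch below shows the shape of one call path with made-up class names, not the actual Manager classes.

# Illustrative call path through the layers; class and method names are made up.
class SessionRepository:                # Repositories layer: data access only
    async def insert_pending_session(self, spec: dict) -> str:
        return "session-id-from-db"     # would run an INSERT via SQLAlchemy

class SessionService:                   # Services layer: business logic
    def __init__(self, repo: SessionRepository) -> None:
        self._repo = repo

    async def enqueue_session(self, spec: dict) -> str:
        # Validate the spec, check quotas, pick a scaling group, then persist.
        return await self._repo.insert_pending_session(spec)

class CreateSessionAction:              # Actions layer: one user-facing operation
    def __init__(self, service: SessionService) -> None:
        self._service = service

    async def run(self, spec: dict) -> dict:
        session_id = await self._service.enqueue_session(spec)
        return {"sessionId": session_id, "status": "PENDING"}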

Directory Structure

manager/
├── models/              # Database schema and ORM models
│   ├── alembic/        # Database migration scripts
│   ├── user.py         # User and credential models
│   ├── session.py      # Session and kernel models
│   ├── agent.py        # Agent models
│   ├── vfolder.py      # Virtual folder models
│   ├── scaling_group.py # Scaling group models
│   └── ...
├── repositories/        # Data access layer
│   ├── session/        # Session data access
│   ├── agent/          # Agent data access
│   ├── user/           # User data access
│   └── ...
├── services/            # Business logic layer
│   ├── session/        # Session lifecycle management
│   └── ...
├── api/                 # API handlers and routes
│   ├── gql/             # GraphQL schema and resolvers
│   ├── auth.py          # Authentication handlers
│   └── ...
├── config/              # Configuration management
├── cli/                 # CLI commands
│   ├── fixture.py       # Test data management
│   └── ...
├── clients/             # External service clients
│   ├── agent/           # Agent RPC client
│   ├── storage_proxy/   # Storage proxy client
│   └── ...
├── scheduler/           # Scheduling algorithms and logic
│   ├── dispatcher.py   # Scheduling dispatcher
│   ├── predicates.py   # Scheduling predicates
│   └── ...
├── server.py            # Main server entry point
└── defs.py              # Shared constants and types

Core Concepts

Sessions

Sessions represent user compute requests and are composed of one or more kernels:

  • Session ID: Unique identifier for the session
  • Access Key: Owner's API access key
  • Status: Session state (PENDING, RUNNING, TERMINATED, etc.)
  • Resource Allocation: Allocated CPU, memory, GPU resources
  • Cluster Size: Number of kernels in multi-container mode

Session Lifecycle:

  1. User creates session via API
  2. Manager schedules and allocates resources
  3. Agent creates and starts kernel containers
  4. User executes code and uses services
  5. Session terminates (user request or timeout)
  6. Resources are returned to the pool
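A rough client-side illustration of steps 1 and 5 follows; the endpoint paths, payload fields, and authorization header are assumptions for illustration, not the documented API.

# Hedged sketch of creating and terminating a session over the REST API.
# Endpoint paths, payload fields, and the auth header are illustrative assumptions.
import asyncio
import aiohttp

MANAGER_URL = "http://localhost:8091"

async def run_session() -> None:
    headers = {"Authorization": "<signed credentials>"}     # placeholder
    async with aiohttp.ClientSession(base_url=MANAGER_URL, headers=headers) as http:
        # Step 1: the user requests a new session.
        async with http.post(
            "/v1/session/create",
            json={"image": "python:3.11", "resources": {"cpu": 2, "mem": "4g"}},
        ) as resp:
            session = await resp.json()
        # Steps 2-4 happen inside the cluster: scheduling, kernel start, execution.
        # Step 5: the user terminates the session (illustrative endpoint).
        await http.delete(f"/v1/session/{session['sessionId']}")

asyncio.run(run_session())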

Agents

Agents are compute node workers that execute kernels:

  • Agent ID: Unique identifier for the agent
  • Status: Agent state (ALIVE, LOST, TERMINATED)
  • Available Resources: CPU, memory, GPU capacity
  • Occupied Slots: Currently allocated resources
  • Scaling Group: Group to which the agent belongs

Agent Monitoring:

  • Manager periodically checks agent health
  • Agent heartbeat and resource status updates
  • Automatic detection of lost agents
  • Kernel rebalancing during failures
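The heartbeat-timeout check can be sketched as follows; the threshold and data model are illustrative, not the Manager's actual implementation.

# Illustrative lost-agent detection based on heartbeat age.
import time
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_SEC = 30.0        # assumed threshold, for illustration only

@dataclass
class AgentRecord:
    agent_id: str
    status: str                     # "ALIVE", "LOST", or "TERMINATED"
    last_heartbeat: float           # UNIX timestamp of the last heartbeat

def sweep_lost_agents(agents: list[AgentRecord], now: float | None = None) -> list[str]:
    """Mark ALIVE agents whose heartbeat is older than the timeout as LOST."""
    now = time.time() if now is None else now
    lost: list[str] = []
    for agent in agents:
        if agent.status == "ALIVE" and now - agent.last_heartbeat > HEARTBEAT_TIMEOUT_SEC:
            agent.status = "LOST"
            lost.append(agent.agent_id)
    return lost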

Scaling Groups

Scaling groups organize agents into logically managed clusters:

  • Group Name: Unique identifier for the scaling group
  • Scheduler Type: Scheduling algorithm (FIFO, LIFO, DRF, etc.)
  • Agent Members: Agents belonging to the group
  • Allowed vFolder Hosts: Permitted storage backends
  • Resource Limits: Per-group resource limits

Virtual Folders (VFolders)

VFolders provide persistent storage:

  • Folder Name: User-defined folder name
  • Host: Storage backend location
  • Ownership: User or group ownership
  • Permissions: Read/write permissions
  • Quota: Storage capacity limit

Infrastructure Dependencies

Required Infrastructure

PostgreSQL (Persistent Data)

  • Purpose:
    • Store all Backend.AI metadata
    • User/Group/Domain information
    • Session and kernel history
    • VFolder metadata
    • Resource allocation records
  • Halfstack Port: 8101 (host) → 5432 (container)
  • Key Tables:
    • users, keypairs - User credentials
    • groups, domains - Organization structure
    • kernels, sessions - Session information
    • agents - Agent status
    • vfolders - VFolder metadata
    • scaling_groups - Scaling group configuration

Redis (Caching and Real-time Data)

  • Purpose:
    • Cache frequently accessed data
    • Distributed locking
    • Agent live status tracking
    • Session rate limiting
    • Temporary session data storage
  • Halfstack Port: 8111 (host) → 6379 (container)

etcd (Global Configuration)

  • Purpose:
    • Store cluster-wide configuration
    • Service discovery
    • Agent registration
    • Dynamic configuration updates
  • Halfstack Port: 8121 (host) → 2379 (container)
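A plain TCP check against the host ports listed above is enough to confirm that the halfstack services are reachable (the ports are the halfstack defaults; adjust for other deployments).

# TCP reachability check for the halfstack services on their default host ports.
import socket

HALFSTACK_PORTS = {"PostgreSQL": 8101, "Redis": 8111, "etcd": 8121}

for name, port in HALFSTACK_PORTS.items():
    try:
        with socket.create_connection(("localhost", port), timeout=2):
            print(f"{name}: reachable on localhost:{port}")
    except OSError as exc:
        print(f"{name}: NOT reachable on localhost:{port} ({exc})")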

Optional Infrastructure (Observability)

Prometheus (Metrics Collection)

  • Purpose:
    • API request metrics
    • Session scheduling metrics
    • Resource usage metrics
    • Background task metrics
  • Internal Port: 18080 (separate from main API port 8091)
  • Exposed Endpoints:
    • http://localhost:18080/metrics - Prometheus metrics endpoint
    • http://localhost:18080/metrics/service_discovery - Service discovery endpoint for automated metrics collection configuration
  • Key Metrics:
    • backendai_api_request_count - Total API requests
    • backendai_api_request_duration_sec - Request processing time
    • backendai_scheduler_enqueue_success - Successful scheduling count
    • backendai_agent_registry_count - Number of agents
  • Note: The service discovery endpoint provides automated configuration for Prometheus to discover all Backend.AI component endpoints

Loki (Log Aggregation)

  • Purpose:
    • Session lifecycle events
    • API request/response logs
    • Scheduling decision logs
    • Error and exception logs
  • Log Labels:
    • component - Component identifier (manager)
    • session_id - Session identifier
    • user_id - User identifier
    • level - Log level (info, warning, error)

Grafana (Visualization)

  • Purpose:
    • Real-time metrics dashboards
    • Resource usage visualization
    • Session status monitoring
    • Alert management

Configuration

See configs/manager/halfstack.conf for configuration file examples.

Key Configuration Items

Database Settings:

  • PostgreSQL connection string
  • Connection pool size
  • Query timeout settings

Redis Settings:

  • Redis connection information
  • Connection pool configuration

etcd Settings:

  • etcd endpoint addresses
  • Configuration key prefix (namespace)

API Settings:

  • Listen address and port
  • CORS configuration

Scheduling Settings:

  • Default scheduler type
  • Scheduling interval
  • Resource allocation policy
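For orientation only, a manager configuration in the spirit of configs/manager/halfstack.conf looks roughly like the sketch below; the section and key names here are assumptions and should be checked against the shipped example file.

# Hedged sketch; key names are illustrative, consult configs/manager/halfstack.conf.
[etcd]
namespace = "local"                               # configuration key prefix
addr = { host = "127.0.0.1", port = 8121 }

[db]
type = "postgresql"
addr = { host = "127.0.0.1", port = 8101 }
name = "backend"
pool-size = 8

[redis]
addr = { host = "127.0.0.1", port = 8111 }

[manager]
service-addr = { host = "0.0.0.0", port = 8091 }  # API listen address and port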

Halfstack Configuration

Recommended: Use the ./scripts/install-dev.sh script for development environment setup.

Starting Development Environment

# Setup development environment via script (recommended)
./scripts/install-dev.sh

# Start Manager
./backend.ai mgr start-server

API Access

Metrics and Monitoring

Prometheus Metrics

The Manager component exposes Prometheus metrics at the /metrics endpoint for monitoring system health and performance.

Label Conventions

Many metrics share common labels for error tracking and classification:

Error-related Labels (populated only when errors occur):

  • domain: Error domain categorizing the error source (e.g., "session", "agent", "storage")
  • operation: Specific operation that failed (e.g., "create", "terminate", "allocate")
  • error_detail: Detailed error information for debugging

Status Labels:

  • status: Operation outcome - typically "success" or "failure"
  • success: Boolean string ("True" or "False") indicating operation success

API Metrics

REST API request monitoring metrics.

backendai_api_request_count (Counter)

  • Description: Total number of REST API requests received
  • Labels:
    • method: HTTP method (GET, POST, PUT, DELETE, PATCH)
    • endpoint: API endpoint path (e.g., "/v1/session/create")
    • domain: Error domain (empty if successful)
    • operation: Error operation (empty if successful)
    • error_detail: Error details (empty if successful)
    • status_code: HTTP response status code (200, 400, 500, etc.)

backendai_api_request_duration_sec (Histogram)

  • Description: API request processing time in seconds
  • Labels: Same as backendai_api_request_count
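For context, a counter/histogram pair with this label set could be declared with the prometheus_client library as sketched below; this is not the Manager's actual metric-registration code.

# Hedged sketch of declaring the API metrics with prometheus_client.
from prometheus_client import Counter, Histogram

API_LABELS = ["method", "endpoint", "domain", "operation", "error_detail", "status_code"]

api_request_count = Counter(
    "backendai_api_request_count",
    "Total number of REST API requests received",
    API_LABELS,
)
api_request_duration_sec = Histogram(
    "backendai_api_request_duration_sec",
    "API request processing time in seconds",
    API_LABELS,
)

# On success the error-related labels stay empty, as described above.
ok_labels = dict(method="POST", endpoint="/v1/session/create",
                 domain="", operation="", error_detail="", status_code="200")
api_request_count.labels(**ok_labels).inc()
api_request_duration_sec.labels(**ok_labels).observe(0.042)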

GraphQL Metrics

GraphQL query execution monitoring metrics.

backendai_graphql_request_count (Counter)

  • Description: Total number of GraphQL queries executed
  • Labels:
    • operation_type: GraphQL operation type (query, mutation, subscription)
    • field_name: GraphQL field being accessed
    • parent_type: Parent type in the GraphQL schema
    • operation_name: Named operation from the query
    • domain: Error domain (empty if successful)
    • operation: Error operation (empty if successful)
    • error_detail: Error details (empty if successful)
    • success: "True" or "False" indicating query success

backendai_graphql_request_duration_sec (Histogram)

  • Description: GraphQL query processing time in seconds
  • Labels: Same as backendai_graphql_request_count

Event Metrics

Internal event processing metrics.

backendai_event_count (Counter)

  • Description: Total number of events processed
  • Labels:
    • event_type: Type of event (e.g., "session_terminated", "kernel_started")

backendai_event_failure_count (Counter)

  • Description: Number of failed event processing attempts
  • Labels:
    • event_type: Type of event that failed
    • exception: Exception class name (e.g., "SessionNotFound", "AgentError")
    • domain: Error domain
    • operation: Error operation
    • error_detail: Error details

backendai_event_processing_time_sec (Histogram)

  • Description: Event processing time in seconds
  • Labels:
    • event_type: Type of event
    • status: "success" or "failure"
    • domain: Error domain (empty if successful)
    • operation: Error operation (empty if successful)
    • error_detail: Error details (empty if successful)

Background Task Metrics

Periodic and scheduled task execution metrics.

backendai_bgtask_count (Gauge)

  • Description: Number of currently running background tasks
  • Labels:
    • task_name: Background task identifier (e.g., "recalc_agent_resource_occupancy")

backendai_bgtask_done_count (Counter)

  • Description: Total number of completed background tasks
  • Labels:
    • task_name: Background task identifier
    • status: "success" or "failure"
    • domain: Error domain (empty if successful)
    • operation: Error operation (empty if successful)
    • error_detail: Error details (empty if successful)

backendai_bgtask_processing_time_sec (Histogram)

  • Description: Background task execution time in seconds
  • Labels: Same as backendai_bgtask_done_count

Action Metrics

High-level business operation metrics.

backendai_action_count (Counter)

  • Description: Total number of actions executed
  • Labels:
    • entity_type: Type of entity being operated on (e.g., "session", "kernel", "agent")
    • operation_type: Type of operation (e.g., "create", "terminate", "restart")
    • status: "success" or "failure"
    • domain: Error domain (empty if successful)
    • operation: Error operation (empty if successful)
    • error_detail: Error details (empty if successful)
  • Example: Tracks session creation, termination, and other major lifecycle operations

backendai_action_duration_sec (Histogram)

  • Description: Action execution time in seconds
  • Labels: Same as backendai_action_count

Layer Operation Metrics

Granular metrics for operations at each architectural layer.

backendai_layer_operation_triggered_count (Gauge)

  • Description: Number of layer operations currently in progress
  • Labels:
    • domain: Domain type (valkey, repository, client)
    • layer: Specific layer (e.g., "session_repository", "agent_client", "valkey_live")
    • operation: Operation name (e.g., "fetch_session", "create_kernel")

backendai_layer_operation_count (Counter)

  • Description: Total number of layer operations completed
  • Labels:
    • domain: Domain type (valkey, repository, client)
    • layer: Specific layer identifier
    • operation: Operation name
    • success: "True" or "False"

backendai_layer_operation_error_count (Counter)

  • Description: Total number of layer operation errors
  • Labels:
    • domain: Domain type
    • layer: Specific layer identifier
    • operation: Operation name
    • error_code: Error code or "internal_error"

backendai_layer_retry_count (Counter)

  • Description: Number of retries for layer operations
  • Labels:
    • domain: Domain type
    • layer: Specific layer identifier
    • operation: Operation name

backendai_layer_operation_duration_sec (Histogram)

  • Description: Layer operation execution time in seconds
  • Labels:
    • domain: Domain type
    • layer: Specific layer identifier
    • operation: Operation name
    • success: "True" or "False"

System Metrics

System resource usage metrics.

backendai_async_task_count (Gauge)

  • Description: Number of active asyncio tasks
  • Labels: None

backendai_cpu_usage_percent (Gauge)

  • Description: CPU usage percentage of the Manager process
  • Labels: None

backendai_memory_used_rss (Gauge)

  • Description: Resident Set Size (RSS) memory usage in bytes
  • Labels: None

backendai_memory_used_vms (Gauge)

  • Description: Virtual Memory Size (VMS) in bytes
  • Labels: None

Sweeper Metrics

Resource cleanup and garbage collection metrics.

backendai_sweep_session_count (Counter)

  • Description: Total number of session cleanup operations
  • Labels:
    • status: Session status being cleaned (e.g., "TERMINATED", "ERROR")
    • success: "True" or "False"

backendai_sweep_kernel_count (Counter)

  • Description: Total number of kernel cleanup operations
  • Labels:
    • success: "True" or "False"

Event Propagator Metrics

External event propagation metrics for webhooks and integrations.

backendai_event_propagator_count (Gauge)

  • Description: Current number of active event propagators
  • Labels: None

backendai_event_propagator_alias_count (Gauge)

  • Description: Current number of event propagator aliases
  • Labels:
    • domain: Domain identifier for the alias
    • alias_id: Unique identifier for the propagator alias

backendai_event_propagator_registration_count (Counter)

  • Description: Total number of event propagator registrations
  • Labels: None

backendai_event_propagator_unregistration_count (Counter)

  • Description: Total number of event propagator unregistrations
  • Labels: None

Stage Metrics

Development and debugging metrics for tracking execution stages.

backendai_stage_count (Counter)

  • Description: Count of stage occurrences for debugging and tracing
  • Labels:
    • stage: Stage identifier
    • upper_layer: Calling layer or component

Prometheus Query Examples

The following examples demonstrate common Prometheus queries for Manager metrics. Note that Counter metrics use the _total suffix and Histogram metrics use _bucket, _sum, _count suffixes in actual queries.

Important Notes:

  • When using increase() or rate() functions, the time range must be at least 2-4x longer than your Prometheus scrape interval to get reliable data. If the time range is too short, metrics may not appear or show incomplete data.
  • Default Prometheus scrape interval is typically 15s-30s
  • Time range selection trade-offs:
    • Shorter ranges (e.g., [1m]): Detect changes faster with more granular data, but more sensitive to noise and short-term fluctuations
    • Longer ranges (e.g., [5m]): Smoother graphs with reduced noise, better for identifying trends, but slower to detect sudden changes
    • For real-time alerting: Use shorter ranges like [1m] or [2m]
    • For dashboards and trend analysis: Use longer ranges like [5m] or [10m]

API Request Rate

API Request Rate by Endpoint

Calculate the per-second rate of API requests grouped by endpoint and status. This shows how many requests per second each endpoint receives. Use this to identify high-traffic endpoints and monitor overall API load.

sum(rate(backendai_api_request_count_total{service_group="$service_groups"}[1m])) by (method, endpoint, status_code)

Failed API Requests (5xx Errors)

Monitor failed API requests (5xx errors) to identify service issues. This helps detect when the Manager is experiencing internal errors.

sum(rate(backendai_api_request_count_total{service_group="$service_groups", status_code=~"5.."}[5m])) by (endpoint)

API Request Duration

P95 API Request Latency

Calculate 95th percentile (P95) latency for API requests. This shows the response time that 95% of requests complete within. Use this to identify slow endpoints and set SLA targets.

histogram_quantile(0.95,
  sum(rate(backendai_api_request_duration_sec_bucket{service_group="$service_groups"}[5m])) by (le, endpoint)
)

Average API Request Duration

Calculate average request duration per endpoint. This provides a simple overview of typical response times.

sum(rate(backendai_api_request_duration_sec_sum{service_group="$service_groups"}[5m])) by (endpoint)
/
sum(rate(backendai_api_request_duration_sec_count{service_group="$service_groups"}[5m])) by (endpoint)

GraphQL Query Performance

GraphQL Query Rate by Operation

Monitor GraphQL query rate by operation type and field. Use this to understand which GraphQL queries are most frequently used.

sum(rate(backendai_graphql_request_count_total{service_group="$service_groups"}[5m])) by (operation_type, field_name)

Failed GraphQL Queries

Track failed GraphQL queries with error details. This helps identify problematic queries and common error patterns.

sum(rate(backendai_graphql_request_count_total{service_group="$service_groups", success="False"}[5m])) by (field_name, error_detail)

Layer Operation Performance

P95 Redis Operation Latency

Monitor Redis operation latency (P95) by layer and operation. This helps identify slow Redis operations that may cause bottlenecks. Exclude broadcast/stream operations as they have different performance characteristics.

histogram_quantile(0.95,
  sum(rate(backendai_layer_operation_duration_sec_bucket{domain="valkey", operation!~"receive_broadcast_message|read_consumer_group"}[5m])) by (le, layer, operation)
)

P95 Database Repository Latency

Monitor database repository operation latency (P95). Use this to identify slow database queries and optimize data access patterns.

histogram_quantile(0.95,
  sum(rate(backendai_layer_operation_duration_sec_bucket{domain="repository"}[5m])) by (le, layer, operation)
)

P95 Agent RPC Call Latency

Monitor Agent RPC call latency (P95). This shows how long it takes to communicate with compute agents.

histogram_quantile(0.95,
  sum(rate(backendai_layer_operation_duration_sec_bucket{domain="client", layer="agent_client"}[5m])) by (le, operation)
)

Failed Layer Operations

Track failed layer operations to identify integration issues. High error rates indicate problems with external dependencies or internal bugs.

sum(rate(backendai_layer_operation_count_total{success="False"}[5m])) by (domain, layer, operation)

Background Tasks

Currently Running Background Tasks

Monitor currently running background tasks. Gauge metric shows real-time count of active tasks. Use this to ensure background tasks are running and detect stuck tasks.

sum(backendai_bgtask_count) by (task_name)

Background Task Completion Rate

Track background task completion rate and success/failure status. This shows how frequently tasks complete and their success rate.

sum(rate(backendai_bgtask_done_count_total[5m])) by (task_name, status)

Failed Background Tasks

Monitor failed background tasks with error details. Use this to identify recurring task failures and debug issues.

sum(rate(backendai_bgtask_done_count_total{status="failure"}[5m])) by (task_name, error_detail)

Event Processing

Event Processing Rate by Type

Monitor event processing rate by event type. This shows how many events are being processed per second. Use this to understand event throughput and detect processing delays.

sum(rate(backendai_event_count_total[5m])) by (event_type)

Event Processing Failures

Track event processing failures by exception type. This helps identify problematic event handlers and common error patterns.

sum(rate(backendai_event_failure_count_total[5m])) by (event_type, exception)

P95 Event Processing Duration

Calculate P95 event processing duration. This shows how long it takes to process different types of events. Use this to identify slow event handlers that may cause delays.

histogram_quantile(0.95,
  sum(rate(backendai_event_processing_time_sec_bucket[5m])) by (le, event_type)
)

Action Metrics

Action Execution Rate

Monitor action execution rate grouped by entity and operation type. This shows the rate of high-level business operations like session creation. Use this to understand system activity and user behavior patterns.

sum(rate(backendai_action_count_total[5m])) by (entity_type, operation_type, status)

Failed Actions

Track failed actions with detailed error information. This helps identify which operations are failing and why.

sum(rate(backendai_action_count_total{status="failure"}[5m])) by (entity_type, operation_type, error_detail)

P95 Action Execution Duration

Calculate P95 action execution duration. This shows how long key operations take to complete. Use this to set performance expectations and identify slow operations.

histogram_quantile(0.95,
  sum(rate(backendai_action_duration_sec_bucket[5m])) by (le, entity_type, operation_type)
)

System Resources

Monitor active asyncio tasks in the event loop. High task counts may indicate resource leaks or excessive concurrency.

backendai_async_task_count

Monitor CPU usage percentage of the Manager process. Use this to detect CPU bottlenecks and capacity planning.

backendai_cpu_usage_percent

Monitor Resident Set Size (physical memory usage). This shows actual RAM usage by the Manager process.

backendai_memory_used_rss

Monitor Virtual Memory Size (total allocated memory). This includes swapped memory and memory-mapped files.

backendai_memory_used_vms

Session Sweeper

Monitor session cleanup operations by session status. This shows how many sessions are being cleaned up and success rate. Use this to ensure proper resource cleanup and identify cleanup issues.

sum(rate(backendai_sweep_session_count_total[5m])) by (status, success)

Monitor kernel cleanup operation rate. This tracks orphaned kernel cleanup operations.

sum(rate(backendai_sweep_kernel_count_total[5m])) by (success)

Logs

  • API request/response logs
  • Session scheduling decision logs
  • Resource allocation/release events
  • Authentication and authorization events
  • Background task execution logs
  • Error and exception stack traces

Communication Protocols

Agent Communication

  • Protocol: ZeroMQ RPC
  • Port: 6011 (agent RPC server)
  • Main Operations: Kernel lifecycle management, code execution, file operations, container statistics

Storage Proxy Communication

  • Protocol: HTTP
  • Port: 6021 (client API), 6022 (manager API)
  • Main Operations: VFolder management, file upload/download, file listing

etcd Communication

  • Protocol: gRPC (etcd v3 API)
  • Port: 2379
  • Main Operations: Configuration management, service discovery, watch notifications
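For illustration, reading a key and watching a prefix with the python-etcd3 client looks roughly like this; the client library choice and the key prefix are assumptions, since the Manager uses its own etcd abstraction.

# Illustrative etcd v3 usage with python-etcd3; the key prefix is an assumption.
import etcd3

client = etcd3.client(host="localhost", port=2379)

# Read a single configuration key.
value, _meta = client.get("/config/redis/addr")
print(value)

# Watch a prefix for dynamic configuration updates.
events, cancel = client.watch_prefix("/config/")
for event in events:
    print(event.key, event.value)
    break                      # stop after the first change in this example
cancel()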

Development

See README.md for development setup instructions.

Manager Architecture Documentation

Internal Architecture

Related Components
