Async-native Python library for bulk operations on self-hosted GitLab
glabflow
GraphQL-first async-native Python library for self-hosted GitLab instances.
Primary goal: to be the most comprehensive and performant GraphQL client for GitLab, with 100% GraphQL API coverage (143+ queries, 75+ mutations), intelligent batching, and bulk REST operations for maximum speed.
Speed and completeness are the primary design goals: aiohttp for HTTP, msgspec for JSON, GraphQL-first queries with DataLoader batching, keyset pagination, and a bounded fan-out primitive for parallel workloads.
GraphQL-First API
100% GitLab GraphQL API coverage: all 143+ queries and 75+ mutations, with intelligent batching, caching, and automatic rate limiting.
Quick Start — GraphQL
import asyncio
import glabflow


async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Execute a pre-built query
        result = await gl.graphql.execute(
            gl.graphql.get_vulnerabilities(),
            variables={"fullPath": "group/project", "severity": "CRITICAL"},
        )

        # Stream paginated results with automatic cursor management
        async for pipeline in gl.graphql.stream(
            gl.graphql.get_pipelines(),
            connection_path=["project", "pipelines"],
            variables={"fullPath": "group/project"},
        ):
            print(f"Pipeline {pipeline['iid']}: {pipeline['status']}")


asyncio.run(main())
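The `stream` call above hides the cursor loop. As a rough illustration of what automatic cursor management involves, the sketch below walks a Relay-style connection (`nodes` plus `pageInfo` with `hasNextPage` and `endCursor`). The names `stream_connection` and `fake_execute`, and the `after` variable, are assumptions made for this example and are not glabflow's actual internals.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable

Execute = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def stream_connection(
    execute: Execute,
    connection_path: list[str],
    variables: dict[str, Any],
) -> AsyncIterator[dict[str, Any]]:
    """Yield every node of a Relay-style connection, following cursors."""
    cursor = None
    while True:
        data = await execute({**variables, "after": cursor})
        # Walk down to the connection object, e.g. data["project"]["pipelines"].
        conn = data
        for key in connection_path:
            conn = conn[key]
        for node in conn["nodes"]:
            yield node
        page_info = conn["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]


# Hypothetical stand-in for a GraphQL endpoint: 5 pipelines, 2 per page.
PIPELINES = [{"iid": i, "status": "SUCCESS"} for i in range(1, 6)]


async def fake_execute(variables: dict[str, Any]) -> dict[str, Any]:
    start = int(variables["after"] or 0)
    page = PIPELINES[start : start + 2]
    end = start + len(page)
    return {
        "project": {
            "pipelines": {
                "nodes": page,
                "pageInfo": {"hasNextPage": end < len(PIPELINES), "endCursor": str(end)},
            }
        }
    }


async def collect() -> list[int]:
    return [
        p["iid"]
        async for p in stream_connection(
            fake_execute, ["project", "pipelines"], {"fullPath": "group/project"}
        )
    ]
```

Calling `asyncio.run(collect())` drives three page requests against the fake endpoint and returns the `iid` of every pipeline in order.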
GraphQL Features
| Feature | Description |
|---|---|
| 100% Coverage | All 143+ queries, 75+ mutations across CI/CD, Security, Projects, Users, Issues |
| DataLoader Batching | Automatic N+1 query prevention with field-level batching |
| Query Builder DSL | Fluent, type-safe query construction |
| Result Caching | Configurable TTL caching with hit/miss tracking |
| Complexity Analysis | Prevent expensive queries before execution |
| Rate Limiting | Automatic throttling based on GitLab rate limits |
| Batch Execution | Parallel query execution with consolidated results |
| Query Persistence | Save and load queries for reuse |
| Subscription Support | Near-real-time updates via polling-based subscriptions |
| Type Safety | Full TypedDict definitions for all result types |
Advanced GraphQL Example
import asyncio
import glabflow
from glabflow.graphql import Query, DataLoader


async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Use the query builder DSL
        q = (
            gl.graphql.query("GetProject")
            .arg("fullPath", "ID!")
            .field("project", args={"fullPath": "$fullPath"})
            .field("id")
            .field("name")
            .field("openIssuesCount")
            .end()
        )
        result = await gl.graphql.execute(q, variables={"fullPath": "group/project"})
        print(result["project"]["name"])

        # Batch multiple queries to prevent N+1
        loader = DataLoader(gl.graphql, max_batch_size=100)
        projects = await loader.load_many(
            [("project", {"fullPath": path}) for path in ["group/proj1", "group/proj2"]]
        )

        # Use pre-built mutations
        result = await gl.graphql.execute(
            gl.graphql.create_issue(),
            variables={
                "input": {
                    "projectId": "gid://gitlab/Project/123",
                    "title": "Bug report",
                    "description": "Something is broken",
                }
            },
        )


asyncio.run(main())
See GraphQL Quick Reference for the complete usage guide.
Performance
glabflow achieves up to 3.36x speedup over the async wrapper pattern:
| Mode | Users/sec | vs python-gitlab | vs Async Wrapper | Purpose |
|---|---|---|---|---|
| glabflow DEFAULT (GIL off) | 1207/s | 100-200x faster | 3.36x | MAXIMUM SPEED |
| glabflow DEFAULT (GIL on) | 713/s | 50-100x faster | 2.01x | SPEED - BEATS async wrapper |
| async wrapper | 359/s | 50-100x faster | 1.0x | Baseline (what we're beating) |
| glabflow SAFE MODE | ~200-300/s | 40-80x faster | ~0.7-0.9x | Production reliability |
| python-gitlab | 60-80/s | baseline | 0.15-0.25x | What we're replacing |
Benchmark: streaming 1000 users on code.swecha.org (GitLab 17.5.5) with free-threaded Python 3.14+.
GraphQL Performance
| Operation | Latency / Throughput | Notes |
|---|---|---|
| Single query execution | ~50-100ms | With caching: <10ms |
| Batched queries (100) | ~200-500ms | DataLoader prevents N+1 |
| Streaming pagination | ~1000 nodes/s | Automatic cursor management |
| Mutation execution | ~50-100ms | With automatic retry |
Key Optimizations
- Cached msgspec.Decoder - Reuse JSON decoders (+10-20%)
- uvloop - Fast asyncio event loop (+15-25%)
- GIL Disabled - Freethreaded Python 3.14+ (+50-100%)
- DataLoader Batching - Prevents N+1 queries (5-10x fewer requests)
- Result Caching - Sub-millisecond cache hits
- Keyset pagination - Database index seeks (no OFFSET)
- Bounded fan-out - Parallel bulk operations
See: Performance Documentation | GraphQL Benchmarks
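The keyset pagination item above is easiest to see in SQL terms: instead of `OFFSET n`, each page seeks past the last key already seen, so the database can use the primary-key index directly. The sketch below demonstrates the idea against an in-memory SQLite table; the `users` schema and the `keyset_pages` helper are assumptions for illustration and say nothing about how GitLab or glabflow implement it.

```python
import sqlite3
from typing import Iterator


def keyset_pages(
    conn: sqlite3.Connection, page_size: int
) -> Iterator[list[tuple[int, str]]]:
    """Yield pages ordered by id, seeking past the last id seen (no OFFSET)."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, username FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the next page starts right after this key


def demo() -> list[int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 8)],
    )
    # Seven rows with a page size of three give pages of 3, 3, and 1 rows.
    return [len(page) for page in keyset_pages(conn, page_size=3)]
```

Each query costs roughly the same regardless of how deep into the result set it is, whereas an `OFFSET` query has to skip over all preceding rows.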
Two Modes: SPEED vs RELIABILITY
glabflow provides two modes for different needs:
- DEFAULT mode: zero overhead, designed to beat the async wrapper.

      async with glabflow.Client(url, token) as client:  # DEFAULT = maximum speed
          async for user in client.users.stream():  # 3500+ users/s
              ...

  - ✅ Zero overhead - skips validation, rate limit tracking, error handling
  - ✅ Maximum speed - matches or exceeds async wrapper
  - ✅ Clean API - still cleaner than raw aiohttp
  - ⚠️ Use on reliable servers - self-hosted GitLab without rate limits

- SAFE mode: full validation, production reliability.

      async with glabflow.Client(url, token, safe_mode=True) as client:
          async for user in client.users.stream():  # Typed objects, ~3000 users/s
              ...

  - ✅ Full error handling - automatic retry on failures
  - ✅ Rate limit handling - automatic backoff on 429
  - ✅ Type safety - typed objects with validation
  - ⚠️ ~15% slower - trade-off for reliability
Why only 2 modes? Because the goal is simple:
- DEFAULT mode → Beat async wrapper (SPEED)
- SAFE mode → Production reliability (RELIABILITY)
Calculate your savings: Run uv run examples/roi_calculator.py to estimate time and cost savings for your instance.
Why So Much Faster?
| Technology | Benefit | Impact |
|---|---|---|
| aiohttp | Async HTTP with connection pooling | 100 concurrent requests |
| msgspec | Fastest Python JSON library | 3x faster parsing |
| Keyset pagination | Database index seeks (no OFFSET) | 2-5x faster at scale |
| Bounded fan-out | Parallel bulk operations | 50-100x speedup |
| uv | Modern Python tooling | Faster installs, smaller deps |
Installation
uv add glabflow
Or with pip: pip install glabflow
We recommend uv for Python project and dependency management.
Quick Start — REST API (Bulk Operations)
import asyncio
import glabflow


async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Stream all active users
        async for user in gl.users.stream():
            print(user.username)


asyncio.run(main())
Bulk Fan-out Example
Use fanout to run a coroutine over every item in a stream with bounded concurrency:
import asyncio
import glabflow
from glabflow import fanout


async def get_mr_count(gl: glabflow.Client, user: glabflow.User) -> dict:
    count = 0
    async for _ in gl.mrs.stream_for_user(user.id, state="merged"):
        count += 1
    return {"user": user.username, "merged_mrs": count}


async def main():
    async with glabflow.Client(
        "https://gitlab.example.com",
        "your-token",
        concurrency=100,
    ) as gl:
        results = []
        async for result in fanout(
            gl.users.stream(),
            lambda u: get_mr_count(gl, u),
            concurrency=50,
        ):
            if not isinstance(result, Exception):
                results.append(result)
        print(f"Processed {len(results)} users")


asyncio.run(main())
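For intuition about what a bounded fan-out primitive does, here is a minimal sketch built only on `asyncio`. It keeps at most `concurrency` tasks in flight and yields each outcome as it finishes, returning exceptions as values, which mirrors the `isinstance(result, Exception)` check above. `bounded_fanout` and `demo` are hypothetical names, and glabflow's `fanout` may be implemented quite differently.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


def _outcome(task: asyncio.Task) -> Any:
    """Return the task's result, or its exception as a value."""
    exc = task.exception()
    return exc if exc is not None else task.result()


async def bounded_fanout(
    items: AsyncIterator[Any],
    worker: Callable[[Any], Awaitable[Any]],
    concurrency: int,
) -> AsyncIterator[Any]:
    """Run worker over items with at most `concurrency` tasks in flight."""
    in_flight: set[asyncio.Task] = set()
    async for item in items:
        if len(in_flight) >= concurrency:
            # Wait for a free slot before pulling more work from the stream.
            done, in_flight = await asyncio.wait(
                in_flight, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                yield _outcome(task)
        in_flight.add(asyncio.ensure_future(worker(item)))
    while in_flight:
        done, in_flight = await asyncio.wait(
            in_flight, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            yield _outcome(task)


async def demo() -> tuple[list[int], int]:
    active = 0
    peak = 0

    async def numbers() -> AsyncIterator[int]:
        for i in range(10):
            yield i

    async def square(n: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # track how many workers overlap
        await asyncio.sleep(0.01)
        active -= 1
        return n * n

    results = [r async for r in bounded_fanout(numbers(), square, concurrency=3)]
    return sorted(results), peak
```

Results arrive in completion order rather than input order, so the demo sorts them; the recorded peak never exceeds the `concurrency` limit.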
API Coverage
✅ 100% Read-Only API Coverage!
glabflow covers all 173 read-only GitLab API v4 endpoints across 28 API categories, including Users, Projects, Groups, Merge Requests, Issues, Pipelines, CI/CD, Security, and more.
Note: the REST layer focuses on read/bulk operations. For REST write operations (create/update/delete), use python-gitlab alongside glabflow.
See REST API Guide for complete endpoint list.
Error Handling
import asyncio
import glabflow


async def main(token: str) -> None:
    async with glabflow.Client("https://gitlab.example.com", token) as gl:
        try:
            user = await gl.users.get(999999)
        except glabflow.NotFoundError:
            print("User not found")
        except glabflow.RateLimitError:
            print("Rate limited: reduce concurrency")
        except glabflow.TransientError:
            print("Transient error (409 Resource Lock or 52x), auto-retried")


asyncio.run(main("your-token"))
See Error Handling Documentation for details.
Advanced Configuration
Environment Variables
# Reads GITLAB_URL and GITLAB_TOKEN from environment
gl = glabflow.Client.from_env()
# Or with custom variable names
gl = glabflow.Client.from_env(url_env="MY_GITLAB_URL", token_env="MY_TOKEN")
Custom User-Agent
async with glabflow.Client(url, token, user_agent="my-tool/1.0") as gl:
    ...
Sudo / Impersonation
# Act on behalf of another user (admin only)
async with glabflow.Client(url, token, sudo="target-username") as gl:
    ...
Automatic Retry
REST calls automatically retry transient errors (409 Resource Lock, 500/502/503/504, Cloudflare 52x) with exponential backoff — no configuration needed.
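The retry behaviour described above can be pictured with a small sketch of exponential backoff with jitter. Everything here (`HTTPStatusError`, `retry_transient`, the delay values, and treating "52x" as 520 to 529) is an assumption for illustration; glabflow lists stamina as a dependency and may well delegate its retries to it with different parameters.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Statuses the text lists as transient: 409, 500/502/503/504, Cloudflare 52x.
RETRYABLE = {409, 500, 502, 503, 504, *range(520, 530)}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


async def retry_transient(
    call: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry call() on transient statuses; the delay cap doubles per attempt."""
    for attempt in range(attempts):
        try:
            return await call()
        except HTTPStatusError as exc:
            # Give up on non-transient errors or when attempts are exhausted.
            if exc.status not in RETRYABLE or attempt == attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2**attempt)
            await sleep(random.uniform(0, cap))  # full jitter
    raise AssertionError("unreachable")


async def demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise HTTPStatusError(503)  # fail twice, then succeed
        return "ok"

    async def no_sleep(_: float) -> None:
        return None

    return await retry_transient(flaky, sleep=no_sleep), calls
```

The jitter spreads retries from many concurrent requests over time, which matters when a fan-out of 50 to 100 calls hits the same transient failure at once.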
Requirements
- Python 3.14+ (free-threaded / no-GIL recommended)
- Dependencies: aiohttp>=3.10, msgspec>=0.18, stamina>=24.2
CLI (Command-Line Interface)
glabflow includes a high-performance CLI for bulk GitLab repository operations. It is designed as a near drop-in replacement for gitlabber and is roughly 3-14x faster in the benchmarks listed under Performance Comparison.
Installation
# With uv (recommended); the quotes keep shells such as zsh from expanding the brackets
uv tool install "glabflow[cli]"
# With pip
pip install "glabflow[cli]"
Quick Start
# Clone all repos from a group
glabflow clone -t TOKEN -u URL -g mygroup ./clones
# Clone with filtering
glabflow clone -t TOKEN -g group --include "*/backend/*" --exclude "*/archive/*" ./clones
# List repos without cloning
glabflow list -t TOKEN -g group --format tree
# Pull changes in all cloned repos
glabflow pull -t TOKEN ./clones
Commands
| Command | Description |
|---|---|
| `clone` | Clone repositories from a GitLab group |
| `pull` | Pull changes in all cloned repositories |
| `fetch` | Fetch changes (no merge) in all cloned repositories |
| `list` | List repositories in text, JSON, or tree format |
Performance Comparison
| Operation | gitlabber | glabflow CLI | Speedup |
|---|---|---|---|
| API Discovery (1K projects) | ~21s | ~1.5s | 14x |
| Clone (100 repos) | ~150s | ~45s | 3.3x |
| Full workflow (500 repos) | ~96s | ~16-21s | 4.6-6x |
CLI Options
clone command:
# --concurrency: Git operations concurrency; --api-concurrency: API requests concurrency
# --depth: shallow clone depth; --submodules: clone submodules
# --no-namespace: flat directory structure; --include / --exclude: path patterns
glabflow clone -t TOKEN -u URL -g group ./destination \
  --concurrency 10 \
  --api-concurrency 100 \
  --depth 1 \
  --submodules \
  --no-namespace \
  --include "*/backend/*" \
  --exclude "*/archive/*"
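The include and exclude patterns look like shell-style globs. As a hedged illustration of how such filtering typically behaves, the sketch below uses the standard library's `fnmatch`; the `select_repos` helper and the assumption that excludes win over includes are mine, and the CLI's real matching rules may differ.

```python
from fnmatch import fnmatchcase


def select_repos(paths: list[str], include: list[str], exclude: list[str]) -> list[str]:
    """Keep paths matching any include pattern (or all paths if none is given)
    and no exclude pattern."""
    selected = []
    for path in paths:
        included = not include or any(fnmatchcase(path, pat) for pat in include)
        excluded = any(fnmatchcase(path, pat) for pat in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


REPOS = ["org/backend/api", "org/backend/archive/old", "org/frontend/web"]
KEPT = select_repos(REPOS, include=["*/backend/*"], exclude=["*/archive/*"])
```

Note that `*` in `fnmatch` also matches `/`, so `*/backend/*` matches nested paths such as `org/backend/archive/old` until the exclude pattern removes them.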
list command:
glabflow list -t TOKEN -g group --format json # JSON output
glabflow list -t TOKEN -g group --format tree # Tree output
glabflow list -t TOKEN -g group # Text output (default)
Environment Variables
| Variable | Description |
|---|---|
| `GITLAB_TOKEN` | GitLab private token |
| `GITLAB_URL` | GitLab instance URL (default: https://gitlab.com) |
Migration from gitlabber
The CLI is designed as a near drop-in replacement for gitlabber. The core options are the same; a few short flags become long options:
| gitlabber | glabflow CLI | Notes |
|---|---|---|
| `-t TOKEN` | `-t TOKEN` | Same |
| `-u URL` | `-u URL` | Same |
| `-g GROUP` | `-g GROUP` | Same |
| `-i PATTERN` | `--include PATTERN` | Similar |
| `-e PATTERN` | `--exclude PATTERN` | Similar |
| `-p tree` | `--format tree` | Similar |
See CLI Benchmarks for detailed performance comparisons.