Async-native Python library for bulk operations on self-hosted GitLab
glabflow
The canonical GitLab client for AI agent toolkits and high-performance automation.
Built for two use cases:
- AI Agents — First-party tool definitions for Claude, LangChain, smolagents
- High-Volume Automation — GraphQL-first access, bulk operations at scale
Primary Design Goals:
- Agent-Native — Async generators (streaming), NDJSON output, TypedDicts
- 100% GitLab API Coverage — 205 REST endpoints (CRUD), 143+ GraphQL queries, 75+ mutations
- Maximum Speed — aiohttp for HTTP, msgspec for JSON, keyset pagination
AI Agent SDK — Agents, Meet GitLab
```python
from glabflow.tools import get_tools

# `client` is an open glabflow.Client instance
# Get Claude Tool Use compatible tools
tools = await get_tools(client, framework="claude")

# Or LangChain, smolagents
tools = await get_tools(client, framework="langchain")
tools = await get_tools(client, framework="smolagents")
```
Why glabflow for Agents?
| Feature | Agent Benefit |
|---|---|
| Async generators | Stream results incrementally |
| NDJSON output | Parse one line at a time |
| TypedDicts | Schema-grounded, not stringly-typed |
| Tool definitions | Drop-in for agent frameworks |
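Because each NDJSON line is a complete JSON object, an agent can act on results as they arrive instead of buffering a whole response. A minimal stdlib sketch of the consuming side (the sample records are illustrative, not real glabflow output):

```python
import json

# NDJSON: one JSON object per line — parse incrementally, no buffering
sample = "\n".join([
    '{"username": "alice", "id": 1}',
    '{"username": "bob", "id": 2}',
])

def iter_ndjson(text):
    """Yield one decoded object per non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

usernames = [rec["username"] for rec in iter_ndjson(sample)]
print(usernames)  # ['alice', 'bob']
```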
High-Performance Automation
100% GitLab GraphQL API Coverage — All 143+ queries and 75+ mutations with intelligent batching, caching, and automatic rate limiting!
Quick Start — GraphQL
```python
import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Execute a pre-built query
        result = await gl.graphql.execute(
            gl.graphql.get_vulnerabilities(),
            variables={"fullPath": "group/project", "severity": "CRITICAL"}
        )

        # Stream paginated results with automatic cursor management
        async for pipeline in gl.graphql.stream(
            gl.graphql.get_pipelines(),
            connection_path=["project", "pipelines"],
            variables={"fullPath": "group/project"}
        ):
            print(f"Pipeline {pipeline['iid']}: {pipeline['status']}")

asyncio.run(main())
```
GraphQL Features
| Feature | Description |
|---|---|
| 100% Coverage | All 143+ queries, 75+ mutations across CI/CD, Security, Projects, Users, Issues |
| DataLoader Batching | Automatic N+1 query prevention with field-level batching |
| Query Builder DSL | Fluent, type-safe query construction |
| Result Caching | Configurable TTL caching with hit/miss tracking |
| Complexity Analysis | Prevent expensive queries before execution |
| Rate Limiting | Automatic throttling based on GitLab rate limits |
| Batch Execution | Parallel query execution with consolidated results |
| Query Persistence | Save and load queries for reuse |
| Subscription Support | Real-time updates via polling-based subscriptions |
| Type Safety | Full TypedDict definitions for all result types |
Advanced GraphQL Example
```python
import asyncio
import glabflow
from glabflow.graphql import Query, DataLoader

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Use the query builder DSL
        q = gl.graphql.query("GetProject") \
            .arg("fullPath", "ID!") \
            .field("project", args={"fullPath": "$fullPath"}) \
                .field("id") \
                .field("name") \
                .field("openIssuesCount") \
            .end()
        result = await gl.graphql.execute(q, variables={"fullPath": "group/project"})
        print(result["project"]["name"])

        # Batch multiple queries to prevent N+1
        loader = DataLoader(gl.graphql, max_batch_size=100)
        projects = await loader.load_many(
            [("project", {"fullPath": path}) for path in ["group/proj1", "group/proj2"]]
        )

        # Use pre-built mutations
        result = await gl.graphql.execute(
            gl.graphql.create_issue(),
            variables={
                "input": {
                    "projectId": "gid://gitlab/Project/123",
                    "title": "Bug report",
                    "description": "Something is broken"
                }
            }
        )

asyncio.run(main())
```
See GraphQL Quick Reference for complete usage guide.
Performance
glabflow delivers 50-100x throughput improvement over traditional sequential GitLab clients for bulk operations:
Self-Hosted GitLab Benchmarks
| Mode | Users/sec | Throughput | Purpose |
|---|---|---|---|
| glabflow DEFAULT (GIL off) | 1207/s | 100-200x sequential client | MAXIMUM SPEED |
| glabflow DEFAULT (GIL on) | 713/s | 50-100x sequential client | HIGH SPEED |
| glabflow SAFE MODE | ~200-300/s | 40-80x sequential client | Production reliability |
| Sequential client (python-gitlab) | 60-80/s | baseline | Reference point |
Benchmark: streaming 1,000 users on code.swecha.org (GitLab 17.5.5) with free-threaded Python 3.14+
gitlab.com Benchmarks
| Mode | Users/sec | Concurrency | Rate Limit Impact | Purpose |
|---|---|---|---|---|
| glabflow DEFAULT | ~180-220/s | 20 | ✅ Stays inside 600 req/min | Production gitlab.com |
| glabflow SAFE MODE | ~150-180/s | 20 | ✅ Auto-backoff on 429 | Reliable gitlab.com |
| Sequential client | 60-80/s | 1 | ⚠️ Hits limits fast | Reference point |
Benchmark: Streaming users on gitlab.com (GitLab SaaS) with Python 3.14+
Key insight: glabflow on gitlab.com is still 2-3x faster than sequential clients while respecting rate limits. The for_gitlab_com() factory method pre-configures conservative concurrency (20) to stay inside the ~600 req/min budget.
GraphQL Performance
| Operation | Throughput | Notes |
|---|---|---|
| Single query execution | ~50-100ms | With caching: <10ms |
| Batched queries (100) | ~200-500ms | DataLoader prevents N+1 |
| Streaming pagination | ~1000 nodes/s | Automatic cursor management |
| Mutation execution | ~50-100ms | With automatic retry |
Key Optimizations
- Cached msgspec.Decoder - Reuse JSON decoders (+10-20%)
- uvloop - Fast asyncio event loop (+15-25%)
- GIL Disabled - Freethreaded Python 3.14+ (+50-100%)
- DataLoader Batching - Prevents N+1 queries (5-10x fewer requests)
- Result Caching - Sub-millisecond cache hits
- Keyset pagination - Database index seeks (no OFFSET)
- Bounded fan-out - Parallel bulk operations
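The keyset-pagination gain in the list above comes from seeking on an indexed key rather than scanning past OFFSET rows, so page N costs the same as page 1. A generic sketch of the idea (parameter names are illustrative, not glabflow internals):

```python
# OFFSET pagination: the database scans and discards `offset` rows per page.
# Keyset pagination: seek directly to the index position after the last
# seen key — cost stays constant no matter how deep you page.

def next_page_params(last_seen_id, per_page=100):
    """Illustrative keyset-style cursor: filter on the last key seen."""
    return {
        "id_after": last_seen_id,
        "per_page": per_page,
        "order_by": "id",
        "sort": "asc",
    }

page = [{"id": i} for i in range(1, 101)]   # pretend this is page one
params = next_page_params(page[-1]["id"])
print(params["id_after"])  # 100
```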
See: Performance Documentation | GraphQL Benchmarks
Two Modes: SPEED vs RELIABILITY
glabflow provides two modes for different needs:
- DEFAULT Mode - Zero overhead, maximum speed (the default)

  ```python
  async with glabflow.Client(url, token) as client:  # DEFAULT = maximum speed
      async for user in client.users.stream():       # 3500+ users/s
          ...
  ```

  - ✅ Zero overhead - skips validation, rate limit tracking, error handling
  - ✅ Maximum speed - 50-100x sequential client throughput
  - ✅ Clean API - simpler than raw aiohttp
  - ⚠️ Use on reliable servers - self-hosted GitLab without rate limits, or gitlab.com with conservative concurrency
- SAFE MODE - Full validation, production reliability

  ```python
  async with glabflow.Client(url, token, safe_mode=True) as client:
      async for user in client.users.stream():  # Typed objects, ~3000 users/s
          ...
  ```

  - ✅ Full error handling - automatic retry on failures
  - ✅ Rate limit handling - automatic backoff on 429
  - ✅ Type safety - typed objects with validation
  - ⚠️ ~15% slower - trade-off for reliability
Why only 2 modes? Because the goal is simple:
- DEFAULT mode → Maximum speed
- SAFE mode → Production reliability
Calculate your savings: Run uv run examples/roi_calculator.py to estimate time and cost savings for your instance.
Why So Much Faster?
| Technology | Benefit | Impact |
|---|---|---|
| aiohttp | Async HTTP with connection pooling | 100 concurrent requests |
| msgspec | Fastest Python JSON library | 3x faster parsing |
| Keyset pagination | Database index seeks (no OFFSET) | 2-5x faster at scale |
| Bounded fan-out | Parallel bulk operations | 50-100x speedup |
| uv | Modern Python tooling | Faster installs, smaller deps |
gitlab.com-Specific Optimizations
| Feature | Benefit | Impact |
|---|---|---|
| Rate limit budgeting | Tracks RateLimit-Remaining headers | Prevents 429 errors |
| Adaptive concurrency | Auto-tunes based on response times | Optimal throughput |
| Cloudflare 52x handling | Retries transient Cloudflare errors | Resilience on gitlab.com |
| Conservative defaults | 20 concurrent requests (vs 200 self-hosted) | Stays inside 600 req/min |
| Connection pre-warming | Reuses TCP connections across requests | Faster startup, less overhead |
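Rate limit budgeting can be sketched as a small tracker that reads GitLab's RateLimit-Remaining and RateLimit-Reset response headers and pauses once the remaining budget falls below a floor. A hedged illustration of the pattern, not glabflow's internals:

```python
import time

class RateBudget:
    """Illustrative client-side budget tracker: pause before the server
    would start returning 429s. Not glabflow's actual implementation."""

    def __init__(self, floor=20):
        self.floor = floor        # stop burning requests below this many
        self.remaining = None
        self.reset_at = None

    def observe(self, headers):
        """Record budget from a response's rate-limit headers."""
        self.remaining = int(headers.get("RateLimit-Remaining", "1"))
        self.reset_at = int(headers.get("RateLimit-Reset", "0"))

    def delay(self, now=None):
        """Seconds to wait before the next request (0.0 = go ahead)."""
        now = time.time() if now is None else now
        if self.remaining is None or self.remaining > self.floor:
            return 0.0
        return max(0.0, float(self.reset_at - now))

budget = RateBudget()
budget.observe({"RateLimit-Remaining": "5", "RateLimit-Reset": "1700000060"})
print(budget.delay(now=1700000000))  # 60.0
```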
Installation
```shell
uv add glabflow
```
Or with pip: `pip install glabflow`
We recommend uv for Python project and dependency management.
Quick Start — gitlab.com
```python
import asyncio
import glabflow

async def main():
    # Use the pre-configured gitlab.com client (conservative concurrency: 20)
    async with glabflow.Client.for_gitlab_com("your-token") as gl:
        # Stream all users with automatic rate limit handling
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())
```
Why for_gitlab_com()? gitlab.com enforces ~600 req/min rate limits. This factory method:
- Sets concurrency to 20 (vs 200 for self-hosted)
- Uses the correct API URL (https://gitlab.com/api/v4)
- Keeps you well inside rate limits while maximizing throughput
For self-hosted instances, use glabflow.Client(url, token) with higher concurrency.
Quick Start — REST API (Bulk Operations)
```python
import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Stream all active users
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())
```
Bulk Fan-out Example
Use fanout to run a coroutine over every item in a stream with bounded concurrency:
```python
import asyncio
import glabflow
from glabflow import fanout

async def get_mr_count(gl: glabflow.Client, user: glabflow.User) -> dict:
    count = 0
    async for _ in gl.mrs.stream_for_user(user.id, state="merged"):
        count += 1
    return {"user": user.username, "merged_mrs": count}

async def main():
    async with glabflow.Client(
        "https://gitlab.example.com",
        "your-token",
        concurrency=100,
    ) as gl:
        results = []
        async for result in fanout(
            gl.users.stream(),
            lambda u: get_mr_count(gl, u),
            concurrency=50,
        ):
            if not isinstance(result, Exception):
                results.append(result)
        print(f"Processed {len(results)} users")

asyncio.run(main())
```
API Coverage
✅ 100% GitLab API Coverage!
glabflow covers all 205 GitLab API v4 endpoints (CRUD) across 28 API categories:
- READ (GET): 174 endpoints
- CREATE (POST): 15 endpoints
- UPDATE (PUT/PATCH): 7 endpoints
- DELETE (DELETE): 9 endpoints
Spanning: Users, Projects, Groups, Merge Requests, Issues, Pipelines, CI/CD, Security, and more.
See REST API Guide for complete endpoint list.
Extended APIs
glabflow.extended provides higher-level APIs built on top of the official GitLab API — composite operations, cross-resource queries, and analytics that would require multiple raw API calls.
Installation
```shell
uv add glabflow[extended]
# or
pip install glabflow[extended]
```
Available APIs
| Class | Description |
|---|---|
| UsersAPI | GraphQL-first user lookups, bulk association counts, user-centric issue/MR/project fetching |
| ProjectsAPI | Search, get-by-path, enriched project metadata |
| GroupsAPI | GraphQL group membership queries with per-user member counts |
| CommitsAPI | Concurrent multi-project commit fetching with author matching and time-slot analysis |
| AuditAPI | Repository inventory with branch protection, CI, webhook, and approval rule auditing |
| AnalyticsAPI | Cross-resource analytics and reporting |
| analyze_description | Compliance/quality analysis for MR and issue descriptions |
Quick Example
```python
import asyncio
import glabflow
from glabflow.extended import UsersAPI, CommitsAPI

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        users = UsersAPI(gl)
        commits = CommitsAPI(gl)

        # GraphQL-first user lookup — resolves username → full profile
        user = await users.get_by_username("alice")

        # Bulk association counts across all users in one pass
        projects = []
        async for p in gl.projects.stream():
            projects.append(p)
        counts = await users.bulk_associations_count_async([user], projects)

        # Concurrent commit fetch across all projects with IST time-slot analysis
        all_commits, per_project, stats = await commits.get_user_commits(
            user, projects, since="2024-01-01T00:00:00Z"
        )
        print(f"{stats['total']} commits — {stats['morning_commits']} morning, {stats['afternoon_commits']} afternoon")

asyncio.run(main())
```
Audit Example
```python
from glabflow.extended import AuditAPI

async with glabflow.Client(url, token) as gl:
    audit = AuditAPI(gl)

    # Full repo inventory with protection and CI status
    repos = await audit.repo_inventory(group_id=42)

    # Compliance report against custom rules
    report = await audit.compliance_report(group_id=42)
    for violation in report.violations:
        print(f"{violation.repo}: {violation.rule} — {violation.detail}")
```
These APIs are not exported from the top-level glabflow namespace — import explicitly from glabflow.extended.
Error Handling
```python
import glabflow

async with glabflow.Client("https://gitlab.example.com", token) as gl:
    try:
        user = await gl.users.get(999999)
    except glabflow.NotFoundError:
        print("User not found")
    except glabflow.RateLimitError:
        print("Rate limited — reduce concurrency")
    except glabflow.TransientError:
        print("Transient error (409 Resource Lock or 52x) — auto-retried")
```
See Error Handling Documentation for details.
Advanced Configuration
Environment Variables
```python
# Reads GITLAB_URL and GITLAB_TOKEN from environment
gl = glabflow.Client.from_env()

# Or with custom variable names
gl = glabflow.Client.from_env(url_env="MY_GITLAB_URL", token_env="MY_TOKEN")
```
Custom User-Agent
```python
async with glabflow.Client(url, token, user_agent="my-tool/1.0") as gl:
    ...
```
Sudo / Impersonation
```python
# Act on behalf of another user (admin only)
async with glabflow.Client(url, token, sudo="target-username") as gl:
    ...
```
Automatic Retry
REST calls automatically retry transient errors (409 Resource Lock, 500/502/503/504, Cloudflare 52x) with exponential backoff — no configuration needed.
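The retry policy described above can be approximated in a few lines of plain asyncio. A hedged sketch of the general pattern only (glabflow's actual implementation builds on stamina and may differ; the function and parameter names here are hypothetical):

```python
import asyncio
import random

# Status codes treated as transient: 409 Resource Lock, 5xx, Cloudflare 52x
RETRYABLE = {409, 500, 502, 503, 504, 520, 521, 522, 523, 524}

async def with_retry(call, attempts=4, base=0.5):
    """Illustrative exponential backoff with jitter for transient errors.

    `call` is any coroutine function returning (status, body).
    """
    for attempt in range(attempts):
        status, body = await call()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            await asyncio.sleep(base * 2 ** attempt * (1 + random.random() * 0.1))
    return status, body

# Demo: a fake endpoint that fails twice with 503, then succeeds
calls = iter([(503, ""), (503, ""), (200, "ok")])

async def fake_call():
    return next(calls)

status, body = asyncio.run(with_retry(fake_call, base=0.001))
print(status, body)  # 200 ok
```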
Requirements
- Python 3.14+ (free-threaded / no-GIL recommended)
- Dependencies: aiohttp>=3.10, msgspec>=0.18, stamina>=24.2
CLI (Command-Line Interface)
glabflow includes a high-performance CLI for bulk GitLab repository operations — built for speed and scale with 5-10x better performance than gitlabber.
Installation
```shell
# With uv (recommended)
uv tool install glabflow[cli]

# With pip
pip install glabflow[cli]
```
Quick Start
```shell
# Store credentials once
gl auth login glpat-xxxxxxxxxxxxxxxxxxxx  # pragma: allowlist secret

# List repositories (no token needed — reads stored credentials)
gl list -g group --format tree

# Clone all repos from a group
gl git clone -g mygroup ./clones

# Clone with filtering
gl git clone -g group --include "*/backend/*" --exclude "*/archive/*" ./clones

# Pull changes in all cloned repos
gl git pull ./clones
```
Commands
| Command | Description |
|---|---|
| auth login | Store GitLab credentials (secure file, 0600 perms) |
| auth whoami | Verify token and show details |
| auth profiles | List stored credential profiles |
| list | List repositories, users, or groups |
| git clone | Clone repositories from a GitLab group |
| git pull | Pull changes in all cloned repositories |
| git fetch | Fetch changes (no merge) in all cloned repositories |
| audit | Repository audit and compliance dashboards |
| analytics | Team analytics and DORA metrics dashboards |
| spamcheck | Spam user detection and remediation |
| anomaly | Access anomaly detection and security audit |
Enterprise Compliance & Audit
Automate SOC2 Type II and ISO 27001:2022 compliance evidence collection — reduce audit preparation from weeks to hours.
```shell
# Generate SOC2 evidence packs for all controls
gl audit compliance -g myorg --soc2-evidence ./soc2_evidence/

# Generate ISO 27001 evidence packs
gl audit compliance -g myorg --iso27001-evidence ./iso27001_evidence/

# Generate both frameworks in one command
gl audit compliance -g myorg \
    --soc2-evidence ./soc2_evidence/ \
    --iso27001-evidence ./iso27001_evidence/

# Continuous compliance monitoring (crontab entry)
0 9 * * 1 gl audit compliance -g myorg --format json \
    --audit-log /var/log/glabflow/weekly.jsonl
```
Compliance Features:
- ✅ 9 SOC2 Type II controls mapped (CC6.1, CC6.2, CC6.3, CC6.6, CC7.1, CC7.2, CC7.3, CC8.1, A1.1)
- ✅ 9 ISO 27001:2022 controls mapped (A.5.8, A.5.10, A.5.15, A.5.23, A.8.9, A.8.16, A.8.20, A.8.28, A.8.35)
- ✅ 8 standard compliance rules enforced (branch protection, MR approvals, CI/CD, webhooks, etc.)
- ✅ 7 anomaly types detected (orphaned accounts, privilege escalation, stale admin, etc.)
- ✅ Structured JSONL audit logs with SHA-256 integrity and ISO 8601 timestamps
- ✅ Auditor-ready evidence packs — one JSON file per control with findings, violations, and audit trail
Performance: 50-100x faster than manual evidence collection (247 repos audited in 12 seconds)
See: COMPLIANCE.md for full control mappings and auditor guidance | examples/compliance/ for code examples
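The SHA-256 hashing used to keep secrets out of the audit trail can be illustrated with stdlib hashlib. A hedged sketch of the documented scheme (hash the token, keep only the first 16 hex characters, never log the plaintext); glabflow's exact internal encoding may differ:

```python
import hashlib

def token_fingerprint(token: str) -> str:
    """Log-safe fingerprint: first 16 hex chars of the token's SHA-256."""
    return hashlib.sha256(token.encode()).hexdigest()[:16]

fp = token_fingerprint("glpat-example-not-a-real-token")
print(len(fp))  # 16
```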
Authentication
All commands resolve credentials automatically — no need to pass --token every time.
```shell
# Store credentials (creates ~/.config/glabflow/config.json)
gl auth login glpat-xxxxxxxxxxxxxxxxxxxx  # pragma: allowlist secret

# For self-hosted instances
gl auth login glpat-xxx -u https://gitlab.example.com

# Named profiles
gl auth login glpat-xxx -p work --default
gl auth use work

# Verify stored credentials
gl auth whoami
```
Resolution priority: stored profiles → GITLAB_TOKEN env var.
CLI Options
list command:
```shell
gl list -g group --format json    # JSON output
gl list -g group --format tree    # Tree output
gl list -g group                  # Text output (default)
gl list -g group --quiet          # Machine-readable NDJSON (agent-friendly)
gl list -g group --audit-log /var/log/glabflow/audit.jsonl  # Enterprise audit
```
git clone command:
```shell
# --concurrency: Git operations; --api-concurrency: API requests
# --depth 1: shallow clones; --submodules: clone submodules
# --no-namespace: flat directory structure
# --include / --exclude: path filter patterns
# --dry-run: preview; --quiet: NDJSON progress; --audit-log: structured log
gl git clone -g group ./destination \
    --concurrency 10 \
    --api-concurrency 100 \
    --depth 1 \
    --submodules \
    --no-namespace \
    --include "*/backend/*" \
    --exclude "*/archive/*" \
    --dry-run \
    --quiet \
    --audit-log audit.jsonl
```
git pull / git fetch commands:
```shell
gl git pull ./clones --dry-run   # Preview without pulling
gl git fetch ./clones --quiet    # Machine-readable output
```
Trust & Safety Commands
Spam detection:
```shell
gl spamcheck users --min-score 7 --reasons
gl spamcheck users --format csv -o spam_scores.csv
gl spamcheck actions --min-score 7 --dry-run
gl spamcheck snippets -o spam_snippets.json
```
Access anomaly detection:
```shell
gl anomaly scan
gl anomaly scan --type orphaned_account --type stale_admin
gl anomaly scan -o anomaly_results.csv
gl anomaly actions --min-score 7 --dry-run
```
Enterprise Audit Mode
All CLI commands support --audit-log PATH for compliance-grade structured logging:
```shell
gl git clone -t TOKEN -g myorg ./clones --audit-log /var/log/glabflow/audit.jsonl
```
Audit entries are written as JSONL (one JSON object per line) with:
- Timestamps — ISO 8601 UTC for every action
- Token hashing — SHA-256 (first 16 chars), never logged in plaintext
- Operation details — discovered/cloned/failed counts, duration, failure reasons
- Session boundaries — audit_start and audit_end markers
- Dry-run tracking — preview actions are also logged for compliance review
Example audit entry:
```json
{"event":"clone_complete","discovered":247,"cloned":245,"failed":2,"total_time":142.5,"failures":[{"path":"myorg/backend/legacy","error":"access denied"}]}
```
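Post-processing such a log is straightforward since each line is standalone JSON. An illustrative sketch that extracts clone failures for review (field names follow the example entry; real logs may carry additional events and fields):

```python
import json

# Two sample JSONL lines modeled on the documented audit format
log_lines = [
    '{"event":"audit_start","ts":"2025-01-06T09:00:00Z"}',
    '{"event":"clone_complete","discovered":247,"cloned":245,"failed":2,'
    '"failures":[{"path":"myorg/backend/legacy","error":"access denied"}]}',
]

failures = []
for line in log_lines:
    entry = json.loads(line)
    if entry.get("event") == "clone_complete":
        failures.extend(entry.get("failures", []))

print([f["path"] for f in failures])  # ['myorg/backend/legacy']
```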
Agent-Friendly Mode
Use --quiet (-q) for machine-readable NDJSON output — no Rich formatting, no ANSI codes:
```shell
# Stream NDJSON events to stdout (ideal for piping to jq or log aggregators)
gl git clone -t TOKEN -g mygroup ./clones --quiet | jq 'select(.event == "fail")'
```
Events include: discovered, progress, ok, fail, complete, error.
Environment Variables
| Variable | Description |
|---|---|
| GITLAB_TOKEN | GitLab private token (fallback if no stored profile) |
| GITLAB_URL | GitLab instance URL (default: https://gitlab.com) |
Comparison with gitlabber
The CLI is flag-compatible with gitlabber while delivering significantly better performance:
| gitlabber | glabflow CLI | Notes |
|---|---|---|
| -t TOKEN | -t TOKEN | Same |
| -u URL | -u URL | Same |
| -g GROUP | -g GROUP | Same |
| -i PATTERN | --include PATTERN | Similar |
| -e PATTERN | --exclude PATTERN | Similar |
| -p tree | --format tree | Similar |
See CLI Benchmarks for detailed performance comparisons.