
Async-native Python library for bulk operations on self-hosted GitLab

glabflow

The canonical GitLab client for AI agent toolkits and high-performance automation.

Built for two use cases:

  1. AI Agents — First-party tool definitions for Claude, LangChain, smolagents
  2. High-Volume Automation — GraphQL-first access, bulk operations at scale

Primary Design Goals:

  • Agent-Native — Async generators (streaming), NDJSON output, TypedDicts
  • 100% GitLab API Coverage — 205 REST endpoints (CRUD), 143+ GraphQL queries, 75+ mutations
  • Maximum Speed — aiohttp for HTTP, msgspec for JSON, keyset pagination

AI Agent SDK — Agents, Meet GitLab

from glabflow.tools import get_tools

# Get Claude Tool Use compatible tools (`client` is an open glabflow.Client)
tools = await get_tools(client, framework="claude")

# Or LangChain, smolagents
tools = await get_tools(client, framework="langchain")
tools = await get_tools(client, framework="smolagents")

Why glabflow for Agents?

| Feature | Agent Benefit |
| --- | --- |
| Async generators | Stream results incrementally |
| NDJSON output | Parse one line at a time |
| TypedDicts | Schema-grounded, not stringly-typed |
| Tool definitions | Drop-in for agent frameworks |
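
The tool definitions are designed to drop straight into the host framework. A minimal sketch for the Claude path, assuming get_tools returns Anthropic tool-use schema dicts (the model name and prompt here are placeholders):

import anthropic
import glabflow
from glabflow.tools import get_tools

async def run_agent() -> None:
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Assumed: framework="claude" yields a list of Anthropic tool-use dicts.
        tools = await get_tools(gl, framework="claude")

        claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        response = claude.messages.create(
            model="claude-sonnet-4-5",  # placeholder; pick your model
            max_tokens=1024,
            tools=tools,
            messages=[{"role": "user", "content": "List open issues in group/project"}],
        )
        print(response.stop_reason)  # "tool_use" when Claude requests a tool call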

High-Performance Automation

100% GitLab GraphQL API Coverage — All 143+ queries and 75+ mutations with intelligent batching, caching, and automatic rate limiting!

Quick Start — GraphQL

import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Execute a pre-built query
        result = await gl.graphql.execute(
            gl.graphql.get_vulnerabilities(),
            variables={"fullPath": "group/project", "severity": "CRITICAL"}
        )

        # Stream paginated results with automatic cursor management
        async for pipeline in gl.graphql.stream(
            gl.graphql.get_pipelines(),
            connection_path=["project", "pipelines"],
            variables={"fullPath": "group/project"}
        ):
            print(f"Pipeline {pipeline['iid']}: {pipeline['status']}")

asyncio.run(main())

GraphQL Features

| Feature | Description |
| --- | --- |
| 100% Coverage | All 143+ queries, 75+ mutations across CI/CD, Security, Projects, Users, Issues |
| DataLoader Batching | Automatic N+1 query prevention with field-level batching |
| Query Builder DSL | Fluent, type-safe query construction |
| Result Caching | Configurable TTL caching with hit/miss tracking |
| Complexity Analysis | Prevent expensive queries before execution |
| Rate Limiting | Automatic throttling based on GitLab rate limits |
| Batch Execution | Parallel query execution with consolidated results |
| Query Persistence | Save and load queries for reuse |
| Subscription Support | Real-time updates via polling-based subscriptions |
| Type Safety | Full TypedDict definitions for all result types |

Advanced GraphQL Example

import asyncio
import glabflow
from glabflow.graphql import DataLoader

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Use the query builder DSL
        q = gl.graphql.query("GetProject") \
            .arg("fullPath", "ID!") \
            .field("project", args={"fullPath": "$fullPath"}) \
                .field("id") \
                .field("name") \
                .field("openIssuesCount") \
            .end()

        result = await gl.graphql.execute(q, variables={"fullPath": "group/project"})
        print(result["project"]["name"])

        # Batch multiple queries to prevent N+1
        loader = DataLoader(gl.graphql, max_batch_size=100)
        projects = await loader.load_many(
            [("project", {"fullPath": path}) for path in ["group/proj1", "group/proj2"]]
        )

        # Use pre-built mutations
        result = await gl.graphql.execute(
            gl.graphql.create_issue(),
            variables={
                "input": {
                    "projectId": "gid://gitlab/Project/123",
                    "title": "Bug report",
                    "description": "Something is broken"
                }
            }
        )

asyncio.run(main())

See GraphQL Quick Reference for complete usage guide.


Performance

glabflow delivers a 50-100x throughput improvement over traditional sequential GitLab clients for bulk operations:

Self-Hosted GitLab Benchmarks

| Mode | Users/sec | Throughput | Purpose |
| --- | --- | --- | --- |
| glabflow DEFAULT (GIL off) | 1207/s | 100-200x sequential client | MAXIMUM SPEED |
| glabflow DEFAULT (GIL on) | 713/s | 50-100x sequential client | HIGH SPEED |
| glabflow SAFE MODE | ~200-300/s | 40-80x sequential client | Production reliability |
| Sequential client (python-gitlab) | 60-80/s | baseline | Reference point |

Benchmark: Streaming 1000 users on code.swecha.org (GitLab 17.5.5) with free-threaded Python 3.14+

gitlab.com Benchmarks

| Mode | Users/sec | Concurrency | Rate Limit Impact | Purpose |
| --- | --- | --- | --- | --- |
| glabflow DEFAULT | ~180-220/s | 20 | ✅ Stays inside 600 req/min | Production gitlab.com |
| glabflow SAFE MODE | ~150-180/s | 20 | ✅ Auto-backoff on 429 | Reliable gitlab.com |
| Sequential client | 60-80/s | 1 | ⚠️ Hits limits fast | Reference point |

Benchmark: Streaming users on gitlab.com (GitLab SaaS) with Python 3.14+

Key insight: glabflow on gitlab.com is still 2-3x faster than sequential clients while respecting rate limits. The for_gitlab_com() factory method pre-configures conservative concurrency (20) to stay inside the ~600 req/min budget.

GraphQL Performance

| Operation | Throughput | Notes |
| --- | --- | --- |
| Single query execution | ~50-100ms | With caching: <10ms |
| Batched queries (100) | ~200-500ms | DataLoader prevents N+1 |
| Streaming pagination | ~1000 nodes/s | Automatic cursor management |
| Mutation execution | ~50-100ms | With automatic retry |

Key Optimizations

  1. Cached msgspec.Decoder - Reuse JSON decoders (+10-20%; see the sketch after this list)
  2. uvloop - Fast asyncio event loop (+15-25%)
  3. GIL Disabled - Free-threaded Python 3.14+ (+50-100%)
  4. DataLoader Batching - Prevents N+1 queries (5-10x fewer requests)
  5. Result Caching - Sub-millisecond cache hits
  6. Keyset pagination - Database index seeks (no OFFSET)
  7. Bounded fan-out - Parallel bulk operations
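
To illustrate the first optimization, here is the general decoder-reuse pattern in msgspec (an illustration of the technique, not glabflow's internal code):

import msgspec

class User(msgspec.Struct):
    id: int
    username: str

# Build the decoder once; reuse it for every response body.
_decoder = msgspec.json.Decoder(list[User])

def parse_users(body: bytes) -> list[User]:
    return _decoder.decode(body)

users = parse_users(b'[{"id": 1, "username": "alice"}]')
print(users[0].username)  # alice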

See: Performance Documentation | GraphQL Benchmarks

Two Modes: SPEED vs RELIABILITY

glabflow provides two modes for different needs:

  1. DEFAULT Mode - Zero overhead, maximum speed

    async with glabflow.Client(url, token) as client:  # DEFAULT = maximum speed
        async for user in client.users.stream():  # 3500+ users/s
            ...
    
    • Zero overhead - skips validation, rate limit tracking, error handling
    • Maximum speed - 50-100x sequential client throughput
    • Clean API - simpler than raw aiohttp
    • ⚠️ Use on reliable servers - self-hosted GitLab without rate limits, or gitlab.com with conservative concurrency
  2. SAFE MODE - Full validation, production reliability

    async with glabflow.Client(url, token, safe_mode=True) as client:
        async for user in client.users.stream():  # Typed objects, ~3000 users/s
            ...
    
    • Full error handling - automatic retry on failures
    • Rate limit handling - automatic backoff on 429
    • Type safety - typed objects with validation
    • ⚠️ ~15% slower - trade-off for reliability

Why only 2 modes? Because the goal is simple:

  • DEFAULT mode → Maximum speed
  • SAFE mode → Production reliability

Calculate your savings: Run uv run examples/roi_calculator.py to estimate time and cost savings for your instance.

Why So Much Faster?

| Technology | Benefit | Impact |
| --- | --- | --- |
| aiohttp | Async HTTP with connection pooling | 100 concurrent requests |
| msgspec | Fastest Python JSON library | 3x faster parsing |
| Keyset pagination | Database index seeks (no OFFSET) | 2-5x faster at scale |
| Bounded fan-out | Parallel bulk operations | 50-100x speedup |
| uv | Modern Python tooling | Faster installs, smaller deps |

gitlab.com-Specific Optimizations

| Feature | Benefit | Impact |
| --- | --- | --- |
| Rate limit budgeting (sketch below) | Tracks RateLimit-Remaining headers | Prevents 429 errors |
| Adaptive concurrency | Auto-tunes based on response times | Optimal throughput |
| Cloudflare 52x handling | Retries transient Cloudflare errors | Resilience on gitlab.com |
| Conservative defaults | 20 concurrent requests (vs 200 self-hosted) | Stays inside 600 req/min |
| Connection pre-warming | Reuses TCP connections across requests | Faster startup, less overhead |
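
To unpack the first row: GitLab reports the remaining request budget in RateLimit-Remaining response headers. The general pattern looks roughly like this (illustrative only; glabflow tracks this for you):

import asyncio
import time
import aiohttp

async def budgeted_get(session: aiohttp.ClientSession, url: str) -> dict:
    async with session.get(url) as resp:
        remaining = int(resp.headers.get("RateLimit-Remaining", "1"))
        reset_at = int(resp.headers.get("RateLimit-Reset", "0"))  # epoch seconds
        if remaining == 0:
            # Budget exhausted: wait until the server-advertised reset time.
            await asyncio.sleep(max(reset_at - time.time(), 0))
        return await resp.json()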

Installation

uv add glabflow

Or with pip: pip install glabflow

We recommend uv for Python project and dependency management.

Quick Start — gitlab.com

import asyncio
import glabflow

async def main():
    # Use the pre-configured gitlab.com client (conservative concurrency: 20)
    async with glabflow.Client.for_gitlab_com("your-token") as gl:
        # Stream all users with automatic rate limit handling
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())

Why for_gitlab_com()? gitlab.com enforces ~600 req/min rate limits. This factory method:

  • Sets concurrency to 20 (vs 200 for self-hosted)
  • Uses the correct API URL (https://gitlab.com/api/v4)
  • Keeps you well inside rate limits while maximizing throughput

For self-hosted instances, use glabflow.Client(url, token) with higher concurrency.
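
For example, a minimal sketch (the concurrency keyword also appears in the fan-out example later; treat 200 as a starting point matching the self-hosted default noted above):

import asyncio
import glabflow

async def main():
    async with glabflow.Client(
        "https://gitlab.example.com",
        "your-token",
        concurrency=200,  # tune to your instance's capacity
    ) as gl:
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())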

Quick Start — REST API (Bulk Operations)

import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Stream all active users
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())

Bulk Fan-out Example

Use fanout to run a coroutine over every item in a stream with bounded concurrency:

import asyncio
import glabflow
from glabflow import fanout

async def get_mr_count(gl: glabflow.Client, user: glabflow.User) -> dict:
    count = 0
    async for _ in gl.mrs.stream_for_user(user.id, state="merged"):
        count += 1
    return {"user": user.username, "merged_mrs": count}

async def main():
    async with glabflow.Client(
        "https://gitlab.example.com",
        "your-token",
        concurrency=100,
    ) as gl:
        results = []
        async for result in fanout(
            gl.users.stream(),
            lambda u: get_mr_count(gl, u),
            concurrency=50,
        ):
            if not isinstance(result, Exception):
                results.append(result)

    print(f"Processed {len(results)} users")

asyncio.run(main())

API Coverage

✅ 100% GitLab API Coverage!

glabflow covers all 205 GitLab API v4 endpoints (CRUD) across 28 API categories:

  • READ (GET): 174 endpoints
  • CREATE (POST): 15 endpoints
  • UPDATE (PUT/PATCH): 7 endpoints
  • DELETE (DELETE): 9 endpoints

Spanning: Users, Projects, Groups, Merge Requests, Issues, Pipelines, CI/CD, Security, and more.

See REST API Guide for complete endpoint list.

Extended APIs

glabflow.extended provides higher-level APIs built on top of the official GitLab API — composite operations, cross-resource queries, and analytics that would require multiple raw API calls.

Installation

uv add "glabflow[extended]"
# or
pip install "glabflow[extended]"

Available APIs

| Class | Description |
| --- | --- |
| UsersAPI | GraphQL-first user lookups, bulk association counts, user-centric issue/MR/project fetching |
| ProjectsAPI | Search, get-by-path, enriched project metadata |
| GroupsAPI | GraphQL group membership queries with per-user member counts |
| CommitsAPI | Concurrent multi-project commit fetching with author matching and time-slot analysis |
| AuditAPI | Repository inventory with branch protection, CI, webhook, and approval rule auditing |
| AnalyticsAPI | Cross-resource analytics and reporting |
| analyze_description | Compliance/quality analysis for MR and issue descriptions |

Quick Example

import asyncio
import glabflow
from glabflow.extended import UsersAPI, CommitsAPI

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        users = UsersAPI(gl)
        commits = CommitsAPI(gl)

        # GraphQL-first user lookup — resolves username → full profile
        user = await users.get_by_username("alice")

        # Bulk association counts across all users in one pass
        projects = []
        async for p in gl.projects.stream():
            projects.append(p)

        counts = await users.bulk_associations_count_async([user], projects)

        # Concurrent commit fetch across all projects with IST time-slot analysis
        all_commits, per_project, stats = await commits.get_user_commits(
            user, projects, since="2024-01-01T00:00:00Z"
        )
        print(f"{stats['total']} commits — {stats['morning_commits']} morning, {stats['afternoon_commits']} afternoon")

asyncio.run(main())

Audit Example

from glabflow.extended import AuditAPI

async with glabflow.Client(url, token) as gl:
    audit = AuditAPI(gl)

    # Full repo inventory with protection and CI status
    repos = await audit.repo_inventory(group_id=42)

    # Compliance report against custom rules
    report = await audit.compliance_report(group_id=42)
    for violation in report.violations:
        print(f"{violation.repo}: {violation.rule}{violation.detail}")

These APIs are not exported from the top-level glabflow namespace — import explicitly from glabflow.extended.


Error Handling

import glabflow

async with glabflow.Client("https://gitlab.example.com", token) as gl:
    try:
        user = await gl.users.get(999999)
    except glabflow.NotFoundError:
        print("User not found")
    except glabflow.RateLimitError:
        print("Rate limited — reduce concurrency")
    except glabflow.TransientError:
        print("Transient error (409 Resource Lock or 52x) — auto-retried")

See Error Handling Documentation for details.

Advanced Configuration

Environment Variables

# Reads GITLAB_URL and GITLAB_TOKEN from environment
gl = glabflow.Client.from_env()

# Or with custom variable names
gl = glabflow.Client.from_env(url_env="MY_GITLAB_URL", token_env="MY_TOKEN")

Custom User-Agent

async with glabflow.Client(url, token, user_agent="my-tool/1.0") as gl:
    ...

Sudo / Impersonation

# Act on behalf of another user (admin only)
async with glabflow.Client(url, token, sudo="target-username") as gl:
    ...

Automatic Retry

REST calls automatically retry transient errors (409 Resource Lock, 500/502/503/504, Cloudflare 52x) with exponential backoff — no configuration needed.
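
As a rough illustration of the retry pattern (glabflow lists stamina under Requirements below; this sketch uses stamina's public decorator API and is not glabflow's internal wiring):

import aiohttp
import stamina

@stamina.retry(on=aiohttp.ClientError, attempts=5)
async def fetch_json(session: aiohttp.ClientSession, url: str) -> dict:
    # Retried with exponential backoff (plus jitter) on transient client errors.
    async with session.get(url) as resp:
        resp.raise_for_status()  # raises aiohttp.ClientResponseError on 4xx/5xx
        return await resp.json()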

Requirements

  • Python 3.14+ (free-threaded / no-GIL recommended)
  • Dependencies: aiohttp>=3.10, msgspec>=0.18, stamina>=24.2

CLI (Command-Line Interface)

glabflow includes a high-performance CLI for bulk GitLab repository operations — built for speed and scale with 5-10x better performance than gitlabber.

Installation

# With uv (recommended)
uv tool install "glabflow[cli]"

# With pip
pip install "glabflow[cli]"

Quick Start

# Store credentials once
gl auth login glpat-xxxxxxxxxxxxxxxxxxxx

# List repositories (no token needed — reads stored credentials)
gl list -g group --format tree

# Clone all repos from a group
gl git clone -g mygroup ./clones

# Clone with filtering
gl git clone -g group --include "*/backend/*" --exclude "*/archive/*" ./clones

# Pull changes in all cloned repos
gl git pull ./clones

Commands

| Command | Description |
| --- | --- |
| auth login | Store GitLab credentials (secure file, 0600 perms) |
| auth whoami | Verify token and show details |
| auth profiles | List stored credential profiles |
| list | List repositories, users, or groups |
| git clone | Clone repositories from a GitLab group |
| git pull | Pull changes in all cloned repositories |
| git fetch | Fetch changes (no merge) in all cloned repositories |
| audit | Repository audit and compliance dashboards |
| analytics | Team analytics and DORA metrics dashboards |
| spamcheck | Spam user detection and remediation |
| anomaly | Access anomaly detection and security audit |

Enterprise Compliance & Audit

Automate SOC2 Type II and ISO 27001:2022 compliance evidence collection — reduce audit preparation from weeks to hours.

# Generate SOC2 evidence packs for all controls
gl audit compliance -g myorg --soc2-evidence ./soc2_evidence/

# Generate ISO 27001 evidence packs
gl audit compliance -g myorg --iso27001-evidence ./iso27001_evidence/

# Generate both frameworks in one command
gl audit compliance -g myorg \
    --soc2-evidence ./soc2_evidence/ \
    --iso27001-evidence ./iso27001_evidence/

# Continuous compliance monitoring (add to cron)
0 9 * * 1 gl audit compliance -g myorg --format json --audit-log /var/log/glabflow/weekly.jsonl

Compliance Features:

  • 9 SOC2 Type II controls mapped (CC6.1, CC6.2, CC6.3, CC6.6, CC7.1, CC7.2, CC7.3, CC8.1, A1.1)
  • 9 ISO 27001:2022 controls mapped (A.5.8, A.5.10, A.5.15, A.5.23, A.8.9, A.8.16, A.8.20, A.8.28, A.8.35)
  • 8 standard compliance rules enforced (branch protection, MR approvals, CI/CD, webhooks, etc.)
  • 7 anomaly types detected (orphaned accounts, privilege escalation, stale admin, etc.)
  • Structured JSONL audit logs with SHA-256 integrity and ISO 8601 timestamps
  • Auditor-ready evidence packs — one JSON file per control with findings, violations, and audit trail

Performance: 50-100x faster than manual evidence collection (247 repos audited in 12 seconds)

See: COMPLIANCE.md for full control mappings and auditor guidance | examples/compliance/ for code examples

Authentication

All commands resolve credentials automatically — no need to pass --token every time.

# Store credentials (creates ~/.config/glabflow/config.json)
gl auth login glpat-xxxxxxxxxxxxxxxxxxxx

# For self-hosted instances
gl auth login glpat-xxx -u https://gitlab.example.com

# Named profiles
gl auth login glpat-xxx -p work --default
gl auth use work

# Verify stored credentials
gl auth whoami

Resolution priority: stored profiles → GITLAB_TOKEN env var.

CLI Options

list command:

gl list -g group --format json   # JSON output
gl list -g group --format tree   # Tree output
gl list -g group                 # Text output (default)
gl list -g group --quiet         # Machine-readable NDJSON (agent-friendly)
gl list -g group --audit-log /var/log/glabflow/audit.jsonl  # Enterprise audit

git clone command:

gl git clone -g group ./destination \
    --concurrency 10 \        # Git operations concurrency
    --api-concurrency 100 \   # API requests concurrency
    --depth 1 \               # Shallow clone depth
    --submodules \            # Clone submodules
    --no-namespace \          # Flat directory structure
    --include "*/backend/*" \ # Include patterns
    --exclude "*/archive/*"   # Exclude patterns
    --dry-run \               # Preview without cloning
    --quiet \                 # NDJSON progress output
    --audit-log audit.jsonl   # Structured audit log

git pull / git fetch commands:

gl git pull ./clones --dry-run    # Preview without pulling
gl git fetch ./clones --quiet     # Machine-readable output

Trust & Safety Commands

Spam detection:

gl spamcheck users --min-score 7 --reasons
gl spamcheck users --format csv -o spam_scores.csv
gl spamcheck actions --min-score 7 --dry-run
gl spamcheck snippets -o spam_snippets.json

Access anomaly detection:

gl anomaly scan
gl anomaly scan --type orphaned_account --type stale_admin
gl anomaly scan -o anomaly_results.csv
gl anomaly actions --min-score 7 --dry-run

Enterprise Audit Mode

All CLI commands support --audit-log PATH for compliance-grade structured logging:

gl git clone -t TOKEN -g myorg ./clones --audit-log /var/log/glabflow/audit.jsonl

Audit entries are written as JSONL (one JSON object per line) with:

  • Timestamps — ISO 8601 UTC for every action
  • Token hashing — SHA-256 (first 16 chars), never logged in plaintext
  • Operation details — discovered/cloned/failed counts, duration, failure reasons
  • Session boundaries — audit_start and audit_end markers
  • Dry-run tracking — preview actions are also logged for compliance review

Example audit entry:

{"event":"clone_complete","discovered":247,"cloned":245,"failed":2,"total_time":142.5,"failures":[{"path":"myorg/backend/legacy","error":"access denied"}]}

Agent-Friendly Mode

Use --quiet (-q) for machine-readable NDJSON output — no Rich formatting, no ANSI codes:

# Stream NDJSON events to stdout (ideal for piping to jq or log aggregators)
gl git clone -t TOKEN -g mygroup ./clones --quiet | jq 'select(.event == "fail")'

Events include: discovered, progress, ok, fail, complete, error.
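
The same stream is easy to consume from Python. A minimal sketch (event names from the list above, standard library only; credentials resolved from a stored profile):

import json
import subprocess

proc = subprocess.Popen(
    ["gl", "git", "clone", "-g", "mygroup", "./clones", "--quiet"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    event = json.loads(line)
    if event.get("event") == "fail":
        print("clone failed:", event)
proc.wait()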

Environment Variables

| Variable | Description |
| --- | --- |
| GITLAB_TOKEN | GitLab private token (fallback if no stored profile) |
| GITLAB_URL | GitLab instance URL (default: https://gitlab.com) |

Comparison with gitlabber

The CLI provides functionality compatible with gitlabber, with significantly better performance:

| gitlabber | glabflow CLI | Notes |
| --- | --- | --- |
| -t TOKEN | -t TOKEN | Same |
| -u URL | -u URL | Same |
| -g GROUP | -g GROUP | Same |
| -i PATTERN | --include PATTERN | Similar |
| -e PATTERN | --exclude PATTERN | Similar |
| -p tree | --format tree | Similar |

See CLI Benchmarks for detailed performance comparisons.
