glabflow

GraphQL-first, async-native Python library for bulk operations on self-hosted GitLab instances.

Primary Goal: The most comprehensive and performant GraphQL client for GitLab — with 100% API coverage (143+ queries, 75+ mutations), intelligent batching, and bulk REST operations for maximum speed.

Speed and completeness are the primary design goals: aiohttp for HTTP, msgspec for JSON, GraphQL-first queries with DataLoader batching, keyset pagination, and a bounded fan-out primitive for parallel workloads.



GraphQL-First API

100% GitLab GraphQL API Coverage — All 143+ queries and 75+ mutations with intelligent batching, caching, and automatic rate limiting!

Quick Start — GraphQL

import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Execute a pre-built query
        result = await gl.graphql.execute(
            gl.graphql.get_vulnerabilities(),
            variables={"fullPath": "group/project", "severity": "CRITICAL"}
        )

        # Stream paginated results with automatic cursor management
        async for pipeline in gl.graphql.stream(
            gl.graphql.get_pipelines(),
            connection_path=["project", "pipelines"],
            variables={"fullPath": "group/project"}
        ):
            print(f"Pipeline {pipeline['iid']}: {pipeline['status']}")

asyncio.run(main())

GraphQL Features

| Feature | Description |
|---|---|
| 100% Coverage | All 143+ queries and 75+ mutations across CI/CD, Security, Projects, Users, Issues |
| DataLoader Batching | Automatic N+1 query prevention with field-level batching |
| Query Builder DSL | Fluent, type-safe query construction |
| Result Caching | Configurable TTL caching with hit/miss tracking |
| Complexity Analysis | Prevent expensive queries before execution |
| Rate Limiting | Automatic throttling based on GitLab rate limits |
| Batch Execution | Parallel query execution with consolidated results |
| Query Persistence | Save and load queries for reuse |
| Subscription Support | Real-time updates via polling-based subscriptions |
| Type Safety | Full TypedDict definitions for all result types |
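The DataLoader batching row above can be sketched in plain asyncio. This is an illustrative stand-in, not glabflow's actual implementation (its real `DataLoader` lives in `glabflow.graphql` and its internals may differ): individual `load(key)` calls made in the same event-loop tick are collected and dispatched as a single batched request, which is what prevents N+1 query patterns.

```python
import asyncio

# Minimal DataLoader-style batcher (illustrative only).
class MiniLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn      # async fn: list[key] -> list[value]
        self._queue = []              # pending (key, future) pairs
        self._scheduled = False

    def load(self, key):
        fut = asyncio.get_running_loop().create_future()
        self._queue.append((key, fut))
        if not self._scheduled:
            # Dispatch once, after the current tick, so sibling loads coalesce.
            self._scheduled = True
            asyncio.get_running_loop().call_soon(self._dispatch)
        return fut

    def _dispatch(self):
        pending, self._queue = self._queue, []
        self._scheduled = False
        asyncio.ensure_future(self._run(pending))

    async def _run(self, pending):
        keys = [k for k, _ in pending]
        values = await self.batch_fn(keys)   # one request for the whole batch
        for (_, fut), value in zip(pending, values):
            fut.set_result(value)

calls = []

async def fetch_projects(paths):
    calls.append(list(paths))                # record each batched request
    return [{"fullPath": p} for p in paths]

async def main():
    loader = MiniLoader(fetch_projects)
    return await asyncio.gather(
        loader.load("group/proj1"), loader.load("group/proj2")
    )

results = asyncio.run(main())
print(len(calls), results[0]["fullPath"])   # 1 group/proj1
```

Both loads resolve from a single `fetch_projects` call, which is the batching behavior the table describes.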

Advanced GraphQL Example

import asyncio
import glabflow
from glabflow.graphql import DataLoader

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Use the query builder DSL
        q = gl.graphql.query("GetProject") \
            .arg("fullPath", "ID!") \
            .field("project", args={"fullPath": "$fullPath"}) \
                .field("id") \
                .field("name") \
                .field("openIssuesCount") \
            .end()

        result = await gl.graphql.execute(q, variables={"fullPath": "group/project"})
        print(result["project"]["name"])

        # Batch multiple queries to prevent N+1
        loader = DataLoader(gl.graphql, max_batch_size=100)
        projects = await loader.load_many(
            [("project", {"fullPath": path}) for path in ["group/proj1", "group/proj2"]]
        )

        # Use pre-built mutations
        result = await gl.graphql.execute(
            gl.graphql.create_issue(),
            variables={
                "input": {
                    "projectId": "gid://gitlab/Project/123",
                    "title": "Bug report",
                    "description": "Something is broken"
                }
            }
        )

asyncio.run(main())

See GraphQL Quick Reference for complete usage guide.


Performance

glabflow achieves up to 3.36x speedup over the async wrapper pattern:

| Mode | Users/sec | vs python-gitlab | vs async wrapper | Purpose |
|---|---|---|---|---|
| glabflow DEFAULT (GIL off) | 1207/s | 100-200x faster | 3.36x | MAXIMUM SPEED |
| glabflow DEFAULT (GIL on) | 713/s | 50-100x faster | 2.01x | SPEED: still beats the async wrapper |
| async wrapper | 359/s | 50-100x faster | 1.0x | Baseline (what we're beating) |
| glabflow SAFE MODE | ~200-300/s | 40-80x faster | ~0.7-0.9x | Production reliability |
| python-gitlab | 60-80/s | baseline | 0.15-0.25x | What we're replacing |

Benchmark: streaming 1000 users on code.swecha.org (GitLab 17.5.5) with free-threaded Python 3.14+.

GraphQL Performance

| Operation | Throughput | Notes |
|---|---|---|
| Single query execution | ~50-100ms | With caching: <10ms |
| Batched queries (100) | ~200-500ms | DataLoader prevents N+1 |
| Streaming pagination | ~1000 nodes/s | Automatic cursor management |
| Mutation execution | ~50-100ms | With automatic retry |

Key Optimizations

  1. Cached msgspec.Decoder - Reuse JSON decoders (+10-20%)
  2. uvloop - Fast asyncio event loop (+15-25%)
  3. GIL Disabled - Freethreaded Python 3.14+ (+50-100%)
  4. DataLoader Batching - Prevents N+1 queries (5-10x fewer requests)
  5. Result Caching - Sub-millisecond cache hits
  6. Keyset pagination - Database index seeks (no OFFSET)
  7. Bounded fan-out - Parallel bulk operations
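The bounded fan-out optimization (item 7) can be sketched with a plain `asyncio.Semaphore`. glabflow ships its own `fanout` primitive that additionally consumes an async stream; the stdlib version below, with hypothetical names, only shows the core mechanism: run one coroutine per item, but never more than `concurrency` at once.

```python
import asyncio

async def bounded_fanout(items, worker, concurrency=50):
    sem = asyncio.Semaphore(concurrency)

    async def run(item):
        async with sem:            # blocks once `concurrency` slots are taken
            return await worker(item)

    # Exceptions are returned in-place rather than cancelling the whole batch,
    # mirroring how the fanout example later filters with isinstance(result, Exception).
    return await asyncio.gather(*(run(i) for i in items), return_exceptions=True)

async def double(n):
    await asyncio.sleep(0)         # stand-in for an API call
    return n * 2

results = asyncio.run(bounded_fanout(range(5), double, concurrency=2))
print(results)  # [0, 2, 4, 6, 8]
```

The semaphore keeps memory and server load flat no matter how many items are queued, which is why this pattern scales to bulk workloads.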

See: Performance Documentation | GraphQL Benchmarks

Two Modes: SPEED vs RELIABILITY

glabflow provides two modes for different needs:

  1. DEFAULT Mode - zero overhead, designed to beat the async wrapper (the default)

    async with glabflow.Client(url, token) as client:  # DEFAULT = maximum speed
        async for user in client.users.stream():  # 3500+ users/s - BEATS async wrapper!
            ...
    
    • Zero overhead - skips validation, rate limit tracking, error handling
    • Maximum speed - matches or exceeds async wrapper
    • Clean API - still cleaner than raw aiohttp
    • ⚠️ Use on reliable servers - self-hosted GitLab without rate limits
  2. SAFE MODE - Full validation, production reliability

    async with glabflow.Client(url, token, safe_mode=True) as client:
        async for user in client.users.stream():  # Typed objects, ~3000 users/s
            ...
    
    • Full error handling - automatic retry on failures
    • Rate limit handling - automatic backoff on 429
    • Type safety - typed objects with validation
    • ⚠️ ~15% slower - trade-off for reliability

Why only 2 modes? Because the goal is simple:

  • DEFAULT mode → Beat async wrapper (SPEED)
  • SAFE mode → Production reliability (RELIABILITY)

Calculate your savings: Run uv run examples/roi_calculator.py to estimate time and cost savings for your instance.

Why So Much Faster?

| Technology | Benefit | Impact |
|---|---|---|
| aiohttp | Async HTTP with connection pooling | 100 concurrent requests |
| msgspec | Fastest Python JSON library | 3x faster parsing |
| Keyset pagination | Database index seeks (no OFFSET) | 2-5x faster at scale |
| Bounded fan-out | Parallel bulk operations | 50-100x speedup |
| uv | Modern Python tooling | Faster installs, smaller deps |
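Keyset pagination deserves a concrete picture: instead of asking the server to skip N rows (OFFSET), each request passes the last-seen key, which the database can seek to via an index. The sketch below simulates the server with an in-memory list; `fetch_page` and `stream_all` are hypothetical stand-ins for the HTTP round trips (GitLab's REST API exposes this via the `pagination=keyset` query parameter and the `Link` response header).

```python
RECORDS = [{"id": i} for i in range(1, 8)]   # pretend server-side table

def fetch_page(after_id=0, per_page=3):
    # Index seek: "rows with id greater than the cursor", never OFFSET.
    page = [r for r in RECORDS if r["id"] > after_id][:per_page]
    # A short page means we reached the end; otherwise the last id is the cursor.
    next_cursor = page[-1]["id"] if len(page) == per_page else None
    return page, next_cursor

def stream_all():
    cursor, out = 0, []
    while cursor is not None:                # follow cursors until exhausted
        page, cursor = fetch_page(after_id=cursor)
        out.extend(page)
    return out

ids = [r["id"] for r in stream_all()]
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```

Because every page is an index seek, the cost per page stays constant as you go deeper, while OFFSET-based pagination gets slower with every page.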

Installation

uv add glabflow

Or with pip: pip install glabflow

We recommend uv for Python project and dependency management.

Quick Start — REST API (Bulk Operations)

import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        # Stream all active users
        async for user in gl.users.stream():
            print(user.username)

asyncio.run(main())

Bulk Fan-out Example

Use fanout to run a coroutine over every item in a stream with bounded concurrency:

import asyncio
import glabflow
from glabflow import fanout

async def get_mr_count(gl: glabflow.Client, user: glabflow.User) -> dict:
    count = 0
    async for _ in gl.mrs.stream_for_user(user.id, state="merged"):
        count += 1
    return {"user": user.username, "merged_mrs": count}

async def main():
    async with glabflow.Client(
        "https://gitlab.example.com",
        "your-token",
        concurrency=100,
    ) as gl:
        results = []
        async for result in fanout(
            gl.users.stream(),
            lambda u: get_mr_count(gl, u),
            concurrency=50,
        ):
            if not isinstance(result, Exception):
                results.append(result)

    print(f"Processed {len(results)} users")

asyncio.run(main())

API Coverage

✅ 100% Read-Only API Coverage!

glabflow covers all 173 read-only GitLab API v4 endpoints across 28 API categories, including Users, Projects, Groups, Merge Requests, Issues, Pipelines, CI/CD, Security, and more.

Note: glabflow focuses on read/bulk operations. For CRUD (create/update/delete), use python-gitlab alongside glabflow.

See REST API Guide for complete endpoint list.

Error Handling

import asyncio
import glabflow

async def main():
    async with glabflow.Client("https://gitlab.example.com", "your-token") as gl:
        try:
            user = await gl.users.get(999999)
            print(user.username)
        except glabflow.NotFoundError:
            print("User not found")
        except glabflow.RateLimitError:
            print("Rate limited — reduce concurrency")
        except glabflow.TransientError:
            print("Transient error (409 Resource Lock or 52x) — auto-retried")

asyncio.run(main())

See Error Handling Documentation for details.

Advanced Configuration

Environment Variables

# Reads GITLAB_URL and GITLAB_TOKEN from environment
gl = glabflow.Client.from_env()

# Or with custom variable names
gl = glabflow.Client.from_env(url_env="MY_GITLAB_URL", token_env="MY_TOKEN")

Custom User-Agent

async with glabflow.Client(url, token, user_agent="my-tool/1.0") as gl:
    ...

Sudo / Impersonation

# Act on behalf of another user (admin only)
async with glabflow.Client(url, token, sudo="target-username") as gl:
    ...

Automatic Retry

REST calls automatically retry transient errors (409 Resource Lock, 500/502/503/504, Cloudflare 52x) with exponential backoff — no configuration needed.
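As a rough sketch of what such a retry loop does (glabflow lists `stamina` among its dependencies for this; the stdlib-only version below is illustrative, and the names are hypothetical):

```python
import time

def retry_transient(call, attempts=4, base_delay=0.01,
                    transient=(409, 500, 502, 503, 504)):
    delays = []
    for attempt in range(attempts):
        status = call()
        if status not in transient:
            return status, delays
        delay = base_delay * (2 ** attempt)   # exponential backoff: x1, x2, x4, ...
        delays.append(delay)
        time.sleep(delay)
    raise RuntimeError("still failing after retries")

# Simulate a server that returns 503 twice, then succeeds.
responses = iter([503, 503, 200])
status, delays = retry_transient(lambda: next(responses))
print(status, delays)  # 200 [0.01, 0.02]
```

Real implementations also add jitter and honor `Retry-After` headers; the point here is only the doubling delay between transient failures.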

Requirements

  • Python 3.14+ (free-threaded / no-GIL recommended)
  • Dependencies: aiohttp>=3.10, msgspec>=0.18, stamina>=24.2

CLI (Command-Line Interface)

glabflow includes a high-performance CLI for bulk GitLab repository operations — a drop-in replacement for gitlabber with 5-10x better performance.

Installation

# With uv (recommended)
uv tool install glabflow[cli]

# With pip
pip install glabflow[cli]

Quick Start

# Clone all repos from a group
glabflow clone -t TOKEN -u URL -g mygroup ./clones

# Clone with filtering
glabflow clone -t TOKEN -g group --include "*/backend/*" --exclude "*/archive/*" ./clones

# List repos without cloning
glabflow list -t TOKEN -g group --format tree

# Pull changes in all cloned repos
glabflow pull -t TOKEN ./clones

Commands

| Command | Description |
|---|---|
| clone | Clone repositories from a GitLab group |
| pull | Pull changes in all cloned repositories |
| fetch | Fetch changes (no merge) in all cloned repositories |
| list | List repositories in text, JSON, or tree format |

Performance Comparison

| Operation | gitlabber | glabflow CLI | Speedup |
|---|---|---|---|
| API discovery (1K projects) | ~21s | ~1.5s | 14x |
| Clone (100 repos) | ~150s | ~45s | 3.3x |
| Full workflow (500 repos) | ~96s | ~16-21s | 4.6-6x |

CLI Options

clone command:

# Flag meanings:
#   --concurrency        Git operations concurrency
#   --api-concurrency    API requests concurrency
#   --depth              shallow clone depth
#   --submodules         clone submodules
#   --no-namespace       flat directory structure
#   --include/--exclude  path patterns
glabflow clone -t TOKEN -u URL -g group ./destination \
    --concurrency 10 \
    --api-concurrency 100 \
    --depth 1 \
    --submodules \
    --no-namespace \
    --include "*/backend/*" \
    --exclude "*/archive/*"

list command:

glabflow list -t TOKEN -g group --format json   # JSON output
glabflow list -t TOKEN -g group --format tree   # Tree output
glabflow list -t TOKEN -g group                 # Text output (default)

Environment Variables

| Variable | Description |
|---|---|
| GITLAB_TOKEN | GitLab private token |
| GITLAB_URL | GitLab instance URL (default: https://gitlab.com) |

Migration from gitlabber

The CLI is a drop-in replacement for gitlabber with the same options:

| gitlabber | glabflow CLI | Notes |
|---|---|---|
| -t TOKEN | -t TOKEN | Same |
| -u URL | -u URL | Same |
| -g GROUP | -g GROUP | Same |
| -i PATTERN | --include PATTERN | Similar |
| -e PATTERN | --exclude PATTERN | Similar |
| -p tree | --format tree | Similar |

See CLI Benchmarks for detailed performance comparisons.
