Open Source Development Team Operation Analytics

dev-health-ops

Operational insight into how development teams and developers work should be available to everyone.

This project's goal is to provide practical tools and quick-win implementations that integrate with the most popular development tooling.

Why this exists

Developer health tooling drifted into expensive, opaque “scoring” systems that are easy to misuse. This project is intentionally different.

Principles

  • Accessibility over extraction: metrics are derived from data teams already own; the tooling should be cheap to run and never gated behind per-seat pricing.
  • Learning, not judgment: metrics are signals about system behavior (WIP, churn, cycle time, blocked work), not performance rankings.
  • Trends > absolutes: compare change over time and distributions, not “who’s best”.
  • Inspectable by default: open schemas, explicit definitions, and reproducible computation.

Non-goals

  • Individual leaderboards and “scores”
  • HR/performance-management tooling
  • Executive theater dashboards that hide context

Installation

If you are not developing on this project and just want to use the tools, you can install the package directly:

pip install dev-health-ops

This provides the dev-hops command in your terminal.

dev-hops --help

Note: In the documentation below, you can replace python cli.py with dev-hops if you have installed the package.

Private Repository Support ✅

Both the GitHub and GitLab connectors fully support private repositories. Given a token with the appropriate permissions, you can access and sync data from private repositories just as easily as public ones.

  • GitHub: Requires repo scope on your personal access token
  • GitLab: Requires read_api and read_repository scopes on your private token
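
If you are unsure what a classic GitHub personal access token can do, GitHub reports its granted scopes in the X-OAuth-Scopes response header. A minimal check (a sketch using the requests library; fine-grained tokens do not expose this header):

import requests

resp = requests.get(
    "https://api.github.com/user",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
)
# Classic PATs list their granted scopes here, e.g. "repo, read:org"
print(resp.headers.get("X-OAuth-Scopes"))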

See PRIVATE_REPO_TESTING.md for detailed instructions on setting up and testing private repository access, or VERIFICATION_SUMMARY.md for a comprehensive overview.

Batch Repository Processing ✅

The GitHub connector supports batch processing of repositories with:

  • Pattern matching - Filter repositories using fnmatch-style patterns (e.g., chrisgeo/*, */api-*)
  • Configurable batch size - Process repositories in batches to manage memory and API usage
  • Rate limiting - Delay between batches plus shared backoff across workers, avoiding stampedes and honoring the server's reset/Retry-After hints when available (see the sketch after this list)
  • Async processing - Process multiple repositories concurrently for better performance
  • Callbacks - Get notified as each repository is processed
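
The shared backoff mentioned above can be pictured with a small asyncio sketch (illustrative only, not the connector's internal code):

import asyncio
import time

class SharedBackoff:
    # One deadline shared by every worker: a single rate-limit response
    # pauses the whole pool instead of triggering a retry stampede.
    def __init__(self) -> None:
        self._resume_at = 0.0

    def trip(self, retry_after: float) -> None:
        # Honor the server's Retry-After / rate-limit reset hint
        self._resume_at = max(self._resume_at, time.monotonic() + retry_after)

    async def wait(self) -> None:
        delay = self._resume_at - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)

Each worker awaits wait() before a request and calls trip(seconds) when it sees a rate-limit response, so all workers pause together.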

Example Usage

from connectors import GitHubConnector

connector = GitHubConnector(token="your_token")

# List repos with pattern matching (integrated into list_repositories)
repos = connector.list_repositories(
    org_name="myorg",
    pattern="myorg/api-*",      # Filter repos matching this pattern
    max_repos=50,
)

# Get all repos matching a pattern with stats
results = connector.get_repos_with_stats(
    org_name="myorg",
    pattern="myorg/api-*",      # Filter repos matching this pattern
    batch_size=10,              # Process 10 repos at a time
    max_concurrent=4,           # Use 4 concurrent workers
    rate_limit_delay=1.0,       # Wait 1 second between batches
    max_commits_per_repo=100,   # Limit commits analyzed per repo
    max_repos=50,               # Maximum repos to process
)

for result in results:
    if result.success:
        print(f"{result.repository.full_name}: {result.stats.total_commits} commits")

Async Processing

For even better performance, use the async version:

import asyncio
from connectors import GitHubConnector

async def main():
    connector = GitHubConnector(token="your_token")
    
    results = await connector.get_repos_with_stats_async(
        org_name="myorg",
        pattern="myorg/*",
        batch_size=10,
        max_concurrent=4,
    )
    
    for result in results:
        if result.success:
            print(f"{result.repository.full_name}: {result.stats.total_commits} commits")

asyncio.run(main())

Pattern Matching Examples

  Pattern        Matches
  chrisgeo/m*    chrisgeo/my-app, chrisgeo/metrics-api
  */api-*        anyorg/api-service, myuser/api-gateway
  org/repo       Exactly org/repo
  chrisgeo/*     All repositories owned by chrisgeo
  *sync*         Any repository with sync in the name
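
The patterns follow Python's stdlib fnmatch semantics, so you can sanity-check one locally before running a sync:

from fnmatch import fnmatch

print(fnmatch("chrisgeo/my-app", "chrisgeo/m*"))     # True
print(fnmatch("anyorg/api-service", "*/api-*"))      # True
print(fnmatch("chrisgeo/dev-health-ops", "*sync*"))  # False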

Developer Health Metrics (Work + Git) + Grafana ✅

This repo can compute daily “developer health” metrics and provision Grafana dashboards on top of:

  • Git + PR/MR facts (from GitHub/GitLab/local syncs)
  • Work tracking items (Jira issues, GitHub issues/Projects, GitLab issues)

Jira is not a replacement for pull request data — it’s used to track associated project work (throughput, WIP, work-item cycle/lead times). PR metrics still come from the Git provider data (e.g., GitHub PRs / GitLab MRs) synced by the CLI (python cli.py sync <target> --provider ...).

Docs

  • Metrics definitions + tables: docs/metrics.md
  • Implementation plans, metrics inventory, requirements/roadmap: docs/project.md, docs/metrics-inventory.md, docs/roadmap.md
  • Task tracker configuration (Jira/GitHub/GitLab, status mapping, teams): docs/task_trackers.md
  • Grafana dashboards + provisioning: docs/grafana.md

Quickstart (ClickHouse + Grafana)

  1. Start ClickHouse + Grafana:

python cli.py grafana up

  2. Sync Git data into ClickHouse (choose one):

# Local repo (commits + stats)
python cli.py sync git --provider local --db "clickhouse://localhost:8123/default" --repo-path .

# GitHub repo (commits + stats)
python cli.py sync git --provider github --db "clickhouse://localhost:8123/default" --owner <owner> --repo <repo>

# GitLab project (commits + stats)
python cli.py sync git --provider gitlab --db "clickhouse://localhost:8123/default" --project-id <id>

  3. Compute derived metrics (Git + Work Items):

# (Optional) Sync work items from provider APIs (recommended)
python cli.py sync work-items --provider all --date 2025-02-01 --backfill 30 --db "clickhouse://localhost:8123/default"

# One day (derived Git metrics; enriches IC metrics from already-synced work items when available)
python cli.py metrics daily --date 2025-02-01 --db "clickhouse://localhost:8123/default"

# Backfill last 30 days ending at date
python cli.py metrics daily --date 2025-02-01 --backfill 30 --db "clickhouse://localhost:8123/default"

  4. Open Grafana:
  • http://localhost:3000 (default admin / admin)
  • Dashboards are provisioned under the “Developer Health” folder.

“Download” work tracking data (Jira/GitHub/GitLab)

Work items are fetched from provider APIs via a dedicated sync command. This is separate from PR ingestion:

  • Configure credentials + mapping (see docs/task_trackers.md)
  • Sync work items: python cli.py sync work-items --provider jira|github|gitlab|all ... (use -s to filter repos; --auth for GitHub/GitLab token override)
  • metrics daily does not need --provider unless you want backward-compatible "sync-then-compute" behavior in one step.

cli.py automatically loads a local .env file from the repo root (without overriding already-set environment variables). Disable with DISABLE_DOTENV=1.
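
Conceptually this matches python-dotenv's default behavior; a minimal sketch, assuming python-dotenv (the CLI's actual implementation may differ):

import os
from dotenv import load_dotenv

if os.environ.get("DISABLE_DOTENV") != "1":
    # override=False keeps variables that are already set in the environment
    load_dotenv(override=False)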

Sync Teams

You can sync team definitions into the database from multiple sources. This allows dashboards to group data by teams.

# Sync from a local YAML config (default)
python cli.py sync teams --db "sqlite+aiosqlite:///mergestat.db" --path config/teams.yaml

# Sync from Jira Projects (uses JIRA_* env vars)
python cli.py sync teams --db "sqlite+aiosqlite:///mergestat.db" --provider jira

# Generate synthetic teams for testing
python cli.py sync teams --db "sqlite+aiosqlite:///mergestat.db" --provider synthetic

Database Configuration

This project supports PostgreSQL, MongoDB, SQLite, and ClickHouse as storage backends.

Environment Variables

  • DB_CONN_STRING / DATABASE_URL (optional): Default DB URI for python cli.py metrics daily --db ... and for Alembic migrations.
  • DB_ECHO (optional): Enable SQL query logging for PostgreSQL and SQLite. Set to true, 1, or yes (case-insensitive) to enable. Any other value (including false, 0, no, or unset) disables it. Default: false. Note: Enabling this in production can expose sensitive data and impact performance.
  • MONGO_DB_NAME (optional): The name of the MongoDB database to use. If not specified, the script will use the database specified in the connection string, or default to mergestat.
  • REPO_UUID (optional): UUID for the repository. If not provided, a deterministic UUID is derived from the git repository's remote URL (or the repository path if no remote exists), so the same repository always gets the same UUID across runs (see the sketch after this list).
  • MAX_WORKERS (optional): Number of parallel workers for processing git blame data. Higher values can speed up processing but use more CPU and memory. Default: 4
  • LOG_LEVEL (optional): Logging level (e.g. INFO, DEBUG). Default: INFO
  • DISABLE_DOTENV (optional): Set to 1 to disable .env loading from the repo root.
  • GITHUB_TOKEN (optional): Default GitHub token when --auth is not provided.
  • GITLAB_TOKEN (optional): Default GitLab token when --auth is not provided.
  • GITLAB_URL (optional): Default GitLab base URL when --gitlab-url is not provided (default: https://gitlab.com).
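
The deterministic REPO_UUID derivation described above can be pictured with a name-based UUID; a minimal sketch (the exact namespace and normalization used by the project may differ):

import uuid

def repo_uuid(remote_url_or_path: str) -> uuid.UUID:
    # uuid5 is deterministic: the same input always yields the same UUID
    return uuid.uuid5(uuid.NAMESPACE_URL, remote_url_or_path)

print(repo_uuid("https://github.com/chrisgeo/dev-health-ops.git"))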

Command-Line Arguments

You can also configure the database using command-line arguments, which will override environment variables:

Core Arguments

  • --db: Database connection string (required for sync; optional for metrics daily if DB_CONN_STRING/DATABASE_URL is set)
  • --db-type: Database backend override (postgres, mongo, sqlite, or clickhouse) - optional if URL scheme is clear
  • --provider: Source provider for sync targets (local, github, gitlab, synthetic)
  • --auth: Authentication token (GitHub/GitLab)
  • --repo-path: Path to the git repository (for --provider local)
  • --since: Lower-bound date/time filter for sync targets. Uses ISO formats (e.g., 2024-01-01 or 2024-01-01T00:00:00).

Connector-Specific Arguments

  • --owner: GitHub repository owner/organization
  • --repo: GitHub repository name
  • --gitlab-url: GitLab instance URL (default: https://gitlab.com)
  • --project-id: GitLab project ID (numeric)

Batch Processing Options

These unified options work with both GitHub and GitLab connectors:

  • -s, --search: fnmatch-style pattern to filter repositories/projects (e.g., owner/repo*, group/p*)
  • --batch-size: Number of repositories/projects to process in each batch (default: 10)
  • --group: Organization/group name to fetch repositories/projects from
  • --max-concurrent: Maximum concurrent workers for batch processing (default: 4)
  • --rate-limit-delay: Delay in seconds between batches for rate limiting (default: 1.0)
  • --max-commits-per-repo: Maximum commits to analyze per repository/project
  • --max-repos: Maximum number of repositories/projects to process
  • --use-async: Use async processing for better performance

Example usage:

# Using PostgreSQL (auto-detected from URL)
python cli.py sync git --provider local --db "postgresql+asyncpg://user:pass@localhost:5432/mergestat"

# Using MongoDB (auto-detected from URL)
python cli.py sync git --provider local --db "mongodb://localhost:27017"

# Local repo filtered to recent activity
python cli.py sync git --provider local \
  --db "sqlite+aiosqlite:///mergestat.db" \
  --repo-path /path/to/repo \
  --since 2024-01-01
# Commits and stats are limited to changes on/after this date.

# Using SQLite (file-based, auto-detected)
python cli.py sync git --provider local --db "sqlite+aiosqlite:///mergestat.db"

# Using SQLite (in-memory)
python cli.py sync git --provider local --db "sqlite+aiosqlite:///:memory:"

# GitHub repository with unified auth
python cli.py sync git --provider github \
  --db "postgresql+asyncpg://user:pass@localhost:5432/mergestat" \
  --auth "$GITHUB_TOKEN" \
  --owner torvalds \
  --repo linux

# GitLab project with unified auth
python cli.py sync git --provider gitlab \
  --db "mongodb://localhost:27017" \
  --auth "$GITLAB_TOKEN" \
  --project-id 278964

# Batch process repositories matching a pattern (GitHub)
python cli.py sync git --provider github \
  --db "sqlite+aiosqlite:///mergestat.db" \
  --auth "$GITHUB_TOKEN" \
  -s "chrisgeo/dev-health-*" \
  --group "chrisgeo" \
  --batch-size 5 \
  --max-concurrent 2 \
  --max-repos 10 \
  --use-async

# Batch process projects matching a pattern (GitLab)
python cli.py sync git --provider gitlab \
  --db "sqlite+aiosqlite:///mergestat.db" \
  --auth "$GITLAB_TOKEN" \
  --gitlab-url "https://gitlab.com" \
  --group "mygroup" \
  -s "mygroup/api-*" \
  --batch-size 5 \
  --max-concurrent 2 \
  --max-repos 10 \
  --use-async

MongoDB Connection String Format

MongoDB connection strings follow the standard MongoDB URI format:

  • Basic: mongodb://host:port
  • With authentication: mongodb://username:password@host:port
  • With database: mongodb://username:password@host:port/database_name
  • With options: mongodb://host:port/?authSource=admin&retryWrites=true

You can also set the database name separately using the MONGO_DB_NAME environment variable instead of including it in the connection string.
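
That resolution order can be reproduced with pymongo (an illustrative sketch, not the sync code itself): MONGO_DB_NAME wins, then the database named in the URI, then the mergestat default.

import os
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
name = os.environ.get("MONGO_DB_NAME")
# Fall back to the database named in the URI, then to "mergestat"
db = client[name] if name else client.get_default_database(default="mergestat")
print(db.name)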

SQLite Connection String Format

SQLite connection strings use the following format:

  • File-based: sqlite+aiosqlite:///path/to/database.db (relative path) or sqlite+aiosqlite:////absolute/path/to/database.db (absolute path - note the four slashes)
  • In-memory: sqlite+aiosqlite:///:memory: (data is lost when the process exits)

SQLite is ideal for:

  • Local development and testing
  • Single-user scenarios
  • Small to medium-sized repositories
  • Environments where running a database server is not practical

Note: SQLite does not use connection pooling since it is a file-based database.

Performance Tuning

The script includes several configuration options to optimize performance:

  • MAX_WORKERS: Controls parallel processing of git blame data. Set this based on your CPU cores (e.g., 2-8). Higher values speed up processing but use more CPU and memory.

  • Connection Pooling: PostgreSQL automatically uses connection pooling with these defaults:

    • Pool size: 20 connections
    • Max overflow: 30 additional connections
    • Connections are recycled every hour
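
These defaults correspond roughly to the following SQLAlchemy engine configuration (a sketch for reference; the project applies this automatically):

from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost:5432/mergestat",
    pool_size=20,       # base pool of 20 connections
    max_overflow=30,    # up to 30 extra connections under load
    pool_recycle=3600,  # recycle connections every hour
)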

Example for large repositories:

export MAX_WORKERS=8
python cli.py sync git --provider local --db "sqlite+aiosqlite:///mergestat.db" --repo-path .

Example for resource-constrained environments:

export MAX_WORKERS=2
python cli.py sync git --provider local --db "sqlite+aiosqlite:///mergestat.db" --repo-path .

Performance Optimizations

This project includes several key performance optimizations to speed up git data processing:

1. Increased Batch Size (10x improvement)

  • Batching: inserts are written to the database in large batches rather than row-by-row
  • Impact: far fewer database round-trips, significantly faster insertion

2. Parallel Git Blame Processing (4-8x improvement)

  • Implementation: Uses asyncio with configurable worker pool
  • Default: 4 parallel workers processing files concurrently
  • Impact: Multi-core CPU utilization, dramatically faster blame processing
  • Configuration: Set MAX_WORKERS=8 for more powerful machines

3. Database Connection Pooling (PostgreSQL)

  • Pool size: 20 connections (up from default 5)
  • Max overflow: 30 additional connections (up from default 10)
  • Impact: Better handling of concurrent operations, reduced connection overhead
  • Auto-configured: No manual setup required

4. Optimized Bulk Operations

  • All database insertions use bulk operations
  • MongoDB operations use ordered=False for better performance
  • SQLAlchemy uses add_all() for efficient batch inserts
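
In SQLAlchemy and pymongo terms, the two bulk paths look roughly like this (an illustrative sketch):

from pymongo.collection import Collection
from sqlalchemy.ext.asyncio import AsyncSession

async def bulk_insert_sql(session: AsyncSession, rows: list) -> None:
    # add_all stages every object; one commit flushes them as a batch
    session.add_all(rows)
    await session.commit()

def bulk_insert_mongo(collection: Collection, docs: list) -> None:
    # ordered=False lets the server keep going past individual
    # document errors and apply the batch more efficiently
    collection.insert_many(docs, ordered=False)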

5. Smart File Filtering

  • Skips binary files (images, videos, archives, etc.)
  • Skips files larger than 1MB for content reading
  • Reduces unnecessary I/O and processing time
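
A sketch of this kind of filter (the extension set and size cap here are illustrative):

from pathlib import Path

BINARY_EXTS = {".png", ".jpg", ".gif", ".mp4", ".zip", ".gz", ".pdf"}
MAX_CONTENT_BYTES = 1_000_000  # skip content reads for files over ~1 MB

def should_read_content(path: Path) -> bool:
    if path.suffix.lower() in BINARY_EXTS:
        return False
    return path.stat().st_size <= MAX_CONTENT_BYTES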

Expected Performance Improvements

For a typical repository with 1000 files and 10,000 commits:

  Operation      Before    After       Improvement
  Git Blame      50 min    6-12 min    4-8x faster
  Commits        -         1-2 min     New feature
  Commit Stats   -         2-4 min     New feature
  Files          -         30-60 sec   New feature
  Total          50+ min   10-20 min   ~3-5x faster

Actual performance depends on hardware, repository size, and configuration.

PostgreSQL vs MongoDB vs SQLite: Setup and Migration Considerations

Using PostgreSQL

  • Requires running database migrations with Alembic before first use

  • Provides strong relational data structure

  • Best for complex queries and joins

  • Example setup:

    # Start PostgreSQL with Docker Compose
    docker compose up postgres -d
    
    # Run migrations (Alembic reads DB_CONN_STRING)
    export DB_CONN_STRING="postgresql+asyncpg://postgres:postgres@localhost:5333/postgres"
    alembic upgrade head
    
    # Sync a local repo
    python cli.py sync git --provider local --db "$DB_CONN_STRING" --repo-path .
    

Using MongoDB

  • No migrations required - collections are created automatically

  • Schema-less design allows for flexible data structures

  • Best for quick setup and document-based storage

  • Example setup:

    # Start MongoDB with Docker Compose
    docker compose up mongo -d
    
    export MONGO_DB_NAME="mergestat" # optional if not in URI
    python cli.py sync git --provider local --db "mongodb://localhost:27017" --repo-path .
    

Using SQLite

  • No migrations required - tables are created automatically using SQLAlchemy

  • Simple file-based or in-memory database

  • No external database server required

  • Best for local development, testing, and single-user scenarios

  • Example setup:

    python cli.py sync git --provider local --db "sqlite+aiosqlite:///mergestat.db" --repo-path .
    

    Or for in-memory database (data lost when process exits):

    python cli.py sync git --provider local --db "sqlite+aiosqlite:///:memory:" --repo-path .
    

Using ClickHouse

  • No migrations required - tables are created automatically using ReplacingMergeTree

  • Best for analytics and large datasets

  • Example setup:

    python cli.py sync git --provider local --db "clickhouse://default:@localhost:8123/default" --repo-path .
    

Switching Between Databases

  • The different backends use different storage mechanisms and are not directly compatible
  • Data is not automatically migrated when switching between PostgreSQL, MongoDB, SQLite, and ClickHouse
  • If you need to switch backends, you'll need to re-run the analysis to populate the new database
  • PostgreSQL and MongoDB can run simultaneously on the same machine using different ports (see compose.yml)

Local Repository Pull Request Handling Warning

Important: When processing local repositories, pull request records are inferred from merge commit messages and local refs. These inferences are heuristic and can be unreliable:

  • Dates (created_at, merged_at) may be inaccurate due to limited information in local repositories
  • PR states (open/closed/merged) are estimated from commit history
  • Some PRs may be missed entirely if they don't match expected patterns
  • The accuracy depends heavily on repository history and commit message conventions

This behavior differs from the GitHub/GitLab connectors, which get accurate PR data directly from the provider API.
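
For intuition, this kind of inference typically keys off conventional merge-commit subjects; a simplified sketch (the actual heuristics are more involved):

import re

# GitHub-style merge subject: "Merge pull request #123 from owner/branch"
PR_RE = re.compile(r"Merge pull request #(\d+) from (\S+)")

def infer_pr(commit_message: str):
    m = PR_RE.search(commit_message)
    if m:
        return {"number": int(m.group(1)), "source": m.group(2), "state": "merged"}
    return None

print(infer_pr("Merge pull request #42 from chrisgeo/fix-sync"))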
