
Calculate metrics and statistics for source-code repositories.


Repo Statistics

Calculate collaboration, code, and social metrics and statistics for a source-code repository.

Usage

Single Repository Processing

import json

from repo_statistics import analyze_repository

# repo_path can be a local path or a remote URL
repo_metrics = analyze_repository(
    repo_path="https://github.com/bioio-devs/bioio",
)

with open("example-repo-metrics.json", "w") as f:
    json.dump(repo_metrics, f, indent=4)

# It is recommended to provide a GitHub API token
# unless you disable "platform" metrics
repo_metrics = analyze_repository(
    repo_path="https://github.com/bioio-devs/bioio",
    # Provide a token
    # github_token="ABC",
    # Or disable platform metrics gathering
    compute_platform_metrics=False,
)

# Nearly every group of metrics can be enabled or disabled independently
repo_metrics = analyze_repository(
    repo_path="https://github.com/bioio-devs/bioio",
    compute_timeseries_metrics=True,
    compute_contributor_stability_metrics=False,
    compute_contributor_absence_factor=True,
    compute_contributor_distribution_metrics=False,
    compute_repo_linter_metrics=False,
    compute_tag_metrics=True,
    compute_platform_metrics=False,
)

# By default, all time periods are considered
# However, you can also provide a "start_datetime" and/or "end_datetime"
repo_metrics = analyze_repository(
    repo_path="https://github.com/bioio-devs/bioio",
    start_datetime="2025-01-01",
    end_datetime="2026-01-01",
    compute_platform_metrics=False,
)

# By default, bot changes are also ignored, detected by looking for
# dependabot / github / [bot] account naming in commit information
# This can be disabled or customized
repo_metrics = analyze_repository(
    repo_path="https://github.com/bioio-devs/bioio",
    # Disable name-based bot detection
    bot_names=None,
    # Disable email-based bot detection (with both disabled, all bot commits are kept)
    bot_email_indicators=None,
    compute_platform_metrics=False,
)
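
The bot exclusion described in the comments above can be pictured as a substring match on commit author names and emails. The following is a minimal sketch, not the library's implementation: the `Commit` record, the `is_bot` and `filter_bots` helpers, and the default lists are hypothetical, modeled only on the "dependabot / github / [bot]" description and on the `bot_names` / `bot_email_indicators` parameters shown above.

```python
from dataclasses import dataclass


# Hypothetical commit record; repo_statistics' internal representation may differ.
@dataclass
class Commit:
    author_name: str
    author_email: str


# Assumed defaults based on the README's description; the library's real defaults may differ.
DEFAULT_BOT_NAMES = ("dependabot", "github")
DEFAULT_BOT_EMAIL_INDICATORS = ("[bot]",)


def is_bot(commit, bot_names=DEFAULT_BOT_NAMES, bot_email_indicators=DEFAULT_BOT_EMAIL_INDICATORS):
    """Return True if the commit author looks like a bot account."""
    name = commit.author_name.lower()
    email = commit.author_email.lower()
    # Passing None disables a check, mirroring bot_names=None / bot_email_indicators=None above.
    if bot_names is not None and any(bot in name for bot in bot_names):
        return True
    if bot_email_indicators is not None and any(ind in email for ind in bot_email_indicators):
        return True
    return False


def filter_bots(commits, **kwargs):
    """Keep only commits that are not attributed to bots."""
    return [c for c in commits if not is_bot(c, **kwargs)]


commits = [
    Commit("Jane Doe", "jane@example.com"),
    Commit("dependabot[bot]", "49699333+dependabot[bot]@users.noreply.github.com"),
    Commit("GitHub", "noreply@github.com"),
]
print([c.author_name for c in filter_bots(commits)])  # ['Jane Doe']
```

Disabling only one check still lets the other exclude bots: with `bot_names=None`, the dependabot commit is still dropped by its `[bot]` email, which is why both parameters are set to `None` above to keep every bot commit.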

Multiple Repository Processing

from repo_statistics import analyze_repositories, DEFAULT_COILED_KWARGS

analyze_repos_results = analyze_repositories(
    repo_paths=[
        "https://github.com/bioio-devs/bioio",
        "https://github.com/bioio-devs/bioio-ome-zarr",
        "https://github.com/evamaxfield/aws-grobid",
        "https://github.com/evamaxfield/rs-graph",
        "https://github.com/evamaxfield/repo-statistics",
    ],

    # Built-in batching and caching avoid re-processing repositories
    cache_results_path="repo-metrics-results.parquet",
    cache_errors_path="repo-metrics-errors.parquet",
    batch_size=4,
    # Or as a proportion of the total number of repositories
    # batch_size=0.1,
    # By default, cached results are used before re-processing:
    # repositories already in the cache are skipped and only new ones are processed
    # To re-process all repositories:
    # ignore_cached_results=True,

    # Provide multiple tokens as strings in a list
    # github_tokens=["ghp_exampletoken1", "ghp_exampletoken2"],
    # Or provide the path to a GitHub tokens file
    # github_tokens=".github-tokens.yml",

    # By default, repositories are processed one at a time
    # To enable multithreading, use the following options
    use_multithreading=True,
    n_threads=4,
    # Or use Coiled for distributed processing
    # use_coiled=True,
    # coiled_kwargs=DEFAULT_COILED_KWARGS,
    
    # All other keyword arguments are passed to analyze_repository
    # For example, to skip computing repo linter metrics
    # compute_repo_linter_metrics=False,
)

# Returns an object holding the metrics and errors DataFrames
analyze_repos_results.metrics_df
analyze_repos_results.errors_df
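
The batching and caching options above combine two ideas: repositories already in the cache are skipped (unless `ignore_cached_results=True`), and `batch_size` is either an absolute count or a proportion of the total number of repositories. The sketch below illustrates that planning step under those assumptions only; `resolve_batch_size` and `plan_batches` are hypothetical helpers, not part of the repo_statistics API, and the library may round or order batches differently.

```python
import math


def resolve_batch_size(batch_size, n_repos):
    """Treat an int as an absolute batch size and a float in (0, 1] as a proportion."""
    if isinstance(batch_size, float):
        if not 0 < batch_size <= 1:
            raise ValueError("proportional batch_size must be in (0, 1]")
        # Round up so a small proportion still yields at least one repository per batch.
        return max(1, math.ceil(batch_size * n_repos))
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return batch_size


def plan_batches(repo_paths, cached_paths, batch_size, ignore_cached_results=False):
    """Drop already-cached repositories (unless ignored) and split the rest into batches."""
    if ignore_cached_results:
        todo = list(repo_paths)
    else:
        cached = set(cached_paths)
        todo = [p for p in repo_paths if p not in cached]
    # The proportion is taken relative to the total number of repositories requested.
    size = resolve_batch_size(batch_size, len(repo_paths))
    return [todo[i : i + size] for i in range(0, len(todo), size)]


repos = [f"repo-{i}" for i in range(10)]
print([len(b) for b in plan_batches(repos, ["repo-0", "repo-1"], batch_size=4)])  # [4, 4]
print([len(b) for b in plan_batches(repos, [], batch_size=0.25)])  # [3, 3, 3, 1]
```

Processing in batches means each finished batch can be written to the cache files before the next starts, so an interrupted run can resume without redoing completed work.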


Download files


Source Distribution

repo_statistics-0.2.3.tar.gz (68.6 kB)

Uploaded Source

Built Distribution


repo_statistics-0.2.3-py3-none-any.whl (70.2 kB)

Uploaded Python 3

File details

Details for the file repo_statistics-0.2.3.tar.gz.

File metadata

  • Download URL: repo_statistics-0.2.3.tar.gz
  • Size: 68.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for repo_statistics-0.2.3.tar.gz
Algorithm Hash digest
SHA256 8f4ed2cdf561e32552eec4855eace45f3e50f00509e16f7df75fb4d732bfcac7
MD5 0657d366cab95a2a2a3bee7c898b4229
BLAKE2b-256 256b3f5afa9a61854e3cbda14670be48428765e2d48b4a935c11144a9d5124ff
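
The SHA256 digest above can be checked against a downloaded copy of the source distribution with Python's standard `hashlib` module. This is a minimal sketch: it assumes the file was saved under its original name in the current directory, and the helper names `sha256_of_stream` and `verify_file` are illustrative only.

```python
import hashlib
import io
import os

# SHA256 digest listed above for repo_statistics-0.2.3.tar.gz.
EXPECTED_SDIST_SHA256 = "8f4ed2cdf561e32552eec4855eace45f3e50f00509e16f7df75fb4d732bfcac7"


def sha256_of_stream(stream, chunk_size=1 << 16):
    """Hash a binary stream in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_sha256=EXPECTED_SDIST_SHA256):
    """Return True if the file's SHA256 hex digest matches the expected one."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected_sha256.strip().lower()


# Sanity check against the standard test vector SHA256("abc").
assert sha256_of_stream(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# Assumes the sdist was downloaded into the current directory.
sdist = "repo_statistics-0.2.3.tar.gz"
if os.path.exists(sdist):
    print("hash OK" if verify_file(sdist) else "hash MISMATCH")
```

The same check can be done for the wheel by passing its SHA256 digest as `expected_sha256`. pip's hash-checking mode (`--require-hashes`) performs an equivalent comparison automatically at install time.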


Provenance

The following attestation bundles were made for repo_statistics-0.2.3.tar.gz:

Publisher: ci.yml on evamaxfield/repo-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file repo_statistics-0.2.3-py3-none-any.whl.


File hashes

Hashes for repo_statistics-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6aa3c5e24a4e2676e631eb31e4c9ddd8a5d853f2b1e4f62c80b697bef02e4914
MD5 faf2662a85d359a7305fd4f48285ac3f
BLAKE2b-256 4fcf6e0a004d55413ab8eed56d006d021e3d917108ff6dc40247517118f3be3e


Provenance

The following attestation bundles were made for repo_statistics-0.2.3-py3-none-any.whl:

Publisher: ci.yml on evamaxfield/repo-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
