Calculate metrics and statistics for source-code repositories.
Project description
Repo Statistics
Calculate collaboration, code, and social metrics and statistics for a source-code repository.
Metrics Available
See full table for additional information
| Metric Family | Metrics | Description |
|---|---|---|
| Development Activity Pattern Metrics | commit_entropy, commit_variation, commit_frac, lines_changed_entropy, lines_changed_variation |
Metrics measuring regularity and consistency of development effort over time, calculated at both weekly and monthly intervals to identify sustained engagement patterns versus bursty development |
| Development Episode Characteristics | median_commit_span, mean_commit_span, std_commit_span, median_no_commit_span, mean_no_commit_span, std_no_commit_span |
Metrics describing the temporal structure of active and inactive development periods, characterizing sustained work episodes and dormancy gaps |
| Contributor Engagement Patterns | stable_contributors_count, transient_contributors_count, median_contribution_span_days, mean_contribution_span_days, normalized_median_span, normalized_mean_span |
Metrics characterizing contributor stability and engagement duration, distinguishing between sustained community members and episodic contributors |
| Contributor Distribution Metrics | unique_contributors_count, contributor_absence_factor_code, contributor_absence_factor_all, contributor_specialization, specialists_contributor_count, generalists_contributor_count, contributor_change_count, contributor_same_count |
Metrics examining how development effort and knowledge are distributed among contributors, including bus factor analysis and specialist/generalist patterns |
| Repository Timeline Metrics | initial_commit_datetime, most_recent_commit_datetime, most_recent_substantial_commit_datetime, to_most_recent_commit_duration_days, to_most_recent_substantial_commit_duration_days |
Basic temporal metadata for development history analysis, tracking project age, activity status, and lifetime of meaningful development |
| Development Activity Volume | commits_count, non_bot_commits_count, coding_commits_count, source_lines_of_code, source_lines_of_comments |
Metrics quantifying overall development activity and effort, distinguishing between human and automated contributions and measuring codebase size |
| Community Engagement Metrics | stars_count, forks_count, watchers_count, open_issues_count |
Metrics reflecting community interest and participation through GitHub features, indicating broader impact and active engagement |
| Release Management Metrics | semver_tags_count, non_semver_tags_count, total_tags_count |
Metrics related to versioning and release practices, measuring adoption of formal release management conventions |
| Repository Classification Metadata | repo_primary_language, repo_classification, file_extensions_set |
Descriptive metadata for filtering and comparative analysis, characterizing project type and technical composition |
| Documentation and Best Practices | repo_linter_license_file_exists, repo_linter_readme_file_exists, repo_linter_readme_references_license, repo_linter_changelog_file_exists, repo_linter_contributing_file_exists, repo_linter_code_of_conduct_file_exists, repo_linter_code_of_conduct_file_contains_email, repo_linter_security_file_exists, repo_linter_support_file_exists, repo_linter_test_directory_exists, repo_linter_integrates_with_ci, repo_linter_github_issue_template_exists, repo_linter_github_pull_request_template_exists, repo_linter_binaries_not_present |
Binary indicators of documentation files and development practices supporting sustainability, including core documentation, community guidelines, and development infrastructure |
| Gini Coefficients (experimental) | commit_gini_coefficient, lines_changed_gini_coefficient, contributor_commit_gini, contributor_lines_gini, commit_size_gini, time_between_commits_gini |
Alternative inequality measures using Gini coefficients to complement existing sustainability indicators, measuring distribution equality across temporal and contributor dimensions |
| Commit Pattern Metrics | commit_size_entropy, commit_size_variation, time_between_commits_entropy, time_between_commits_variation |
Metrics analyzing commit sizing and timing patterns using entropy and variation measures to characterize development rhythm and consistency |
| Advanced Sustainability Indicators | documentation_to_code_ratio, contributor_retention_rate, releases_per_year, knowledge_concentration_risk, simple_code_churn_rate |
Higher-level metrics for comprehensive sustainability assessment combining multiple dimensions including documentation quality, contributor retention, release cadence, knowledge distribution, and code volatility |
Usage
Single Repository Processing
import json
from repo_statistics import analyze_repository
# Repo Path can be a local path or remote
repo_metrics = analyze_repository(
repo_path="https://github.com/bioio-devs/bioio",
)
with open("example-repo-metrics.json", "w") as f:
json.dump(repo_metrics, f, indent=4)
# It is recommended to provide a GitHub API token
# unless you disable "platform" metrics
repo_metrics = analyze_repository(
repo_path="https://github.com/bioio-devs/bioio",
# Provide a token
# github_token="ABC",
# Or disable platform metrics gathering
compute_platform_metrics=False,
)
# Nearly every portion of metrics can be disable independent from one another
repo_metrics = analyze_repository(
repo_path="https://github.com/bioio-devs/bioio",
compute_timeseries_metrics=True,
compute_contributor_stability_metrics=False,
compute_contributor_absence_factor=True,
compute_contributor_distribution_metrics=False,
compute_repo_linter_metrics=False,
compute_tag_metrics=True,
compute_platform_metrics=False,
)
# By default, all time-periods are considered
# However, you can provide also provide a "start_datetime" and/or "end_datetime"
# TODO: Temporarily disabled
# repo_metrics = analyze_repository(
# repo_path="https://github.com/bioio-devs/bioio",
# start_datetime="2025-01-01",
# end_datetime="2026-01-01",
# compute_platform_metrics=False,
# )
# We also ignore bot changes by default by looking for
# "[bot]" account naming in commit information
# This can be disabled, or, changed as well
repo_metrics = analyze_repository(
repo_path="https://github.com/bioio-devs/bioio",
# Keep all bots by ignoring name checks
bot_name_indicates=None,
# Keep all bots by ignoring email checks
bot_email_indicators=None,
compute_platform_metrics=False,
)
Multiple Repository Processing
from repo_statistics import analyze_repositories, DEFAULT_COILED_KWARGS
analyze_repos_results = analyze_repositories(
repo_paths=[
"https://github.com/bioio-devs/bioio",
"https://github.com/bioio-devs/bioio-ome-zarr",
"https://github.com/evamaxfield/aws-grobid",
"https://github.com/evamaxfield/rs-graph",
"https://github.com/evamaxfield/repo-statistics",
],
# Has built in batching and caching to avoid re-processing repositories
cache_results_path="repo-metrics-results.parquet",
cache_errors_path="repo-metrics-errors.parquet",
batch_size=4,
# Or as a proportion of the total number of repositories
# batch_size=0.1,
# By default, we will use cached results before re-processing
# This will drop repositories already in the cache and only process new ones
# To re-process all repositories
# ignore_cached_results=True,
# Provide multiple tokens as strings in a list
# github_tokens=["ghp_exampletoken1", "ghp_exampletoken2"],
# Or can provide a gh-tokens file path
# github_tokens=".github-tokens.yml",
# By default, will process repositories one at a time
# Can enable multithreading with the following options
use_multithreading=True,
n_threads=4,
# Or, can use Coiled for distributed processing
# use_coiled=True,
# coiled_kwargs=DEFAULT_COILED_KWARGS,
# All other keyword arguments are passed to analyze_repository
# For example, to skip computing repo linter metrics
# compute_repo_linter_metrics=False,
)
# Provides back an object with results and errors DataFrames
analyze_repos_results.metrics_df
analyze_repos_results.errors_df
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file repo_statistics-0.5.0.tar.gz.
File metadata
- Download URL: repo_statistics-0.5.0.tar.gz
- Upload date:
- Size: 85.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
afd605b813c7f472280ab5b46136c211142a70b33e676f065dc8f4c828f3deae
|
|
| MD5 |
7161d89d7cc9113d4c811bdeb5d21e97
|
|
| BLAKE2b-256 |
d4bb1277a3c307fced761b37d1b2b64066c9fe430724da2f6f1794adba529edf
|
Provenance
The following attestation bundles were made for repo_statistics-0.5.0.tar.gz:
Publisher:
ci.yml on evamaxfield/repo-statistics
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
repo_statistics-0.5.0.tar.gz -
Subject digest:
afd605b813c7f472280ab5b46136c211142a70b33e676f065dc8f4c828f3deae - Sigstore transparency entry: 1248344199
- Sigstore integration time:
-
Permalink:
evamaxfield/repo-statistics@e02e14606d104b7118a1faf411cfae1c18373480 -
Branch / Tag:
refs/tags/v0.5.0 - Owner: https://github.com/evamaxfield
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@e02e14606d104b7118a1faf411cfae1c18373480 -
Trigger Event:
push
-
Statement type:
File details
Details for the file repo_statistics-0.5.0-py3-none-any.whl.
File metadata
- Download URL: repo_statistics-0.5.0-py3-none-any.whl
- Upload date:
- Size: 79.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
04f8f3d614f288ab81e443df8db7b825ae54d7a1cbe98c34ee84dc0eb93d67f0
|
|
| MD5 |
65cee1c9fd7e32fe694c5f2699b8a08b
|
|
| BLAKE2b-256 |
834a01ce4a6fce2bb2c328ec231a8c2b5abad8970280e99f6ef44110aefd3136
|
Provenance
The following attestation bundles were made for repo_statistics-0.5.0-py3-none-any.whl:
Publisher:
ci.yml on evamaxfield/repo-statistics
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
repo_statistics-0.5.0-py3-none-any.whl -
Subject digest:
04f8f3d614f288ab81e443df8db7b825ae54d7a1cbe98c34ee84dc0eb93d67f0 - Sigstore transparency entry: 1248344302
- Sigstore integration time:
-
Permalink:
evamaxfield/repo-statistics@e02e14606d104b7118a1faf411cfae1c18373480 -
Branch / Tag:
refs/tags/v0.5.0 - Owner: https://github.com/evamaxfield
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@e02e14606d104b7118a1faf411cfae1c18373480 -
Trigger Event:
push
-
Statement type: