Repository Participation Observer. A tool for investigating Git repository contribution data and patterns.

Project description

RPO: Repository Participation Observer

A command line tool and Python library to help you analyze and visualize Git repositories. Ever wondered who has the most contributions? How participation has changed over time? Which hotspots in your code change frequently? Who has the highest bus factor? rpo can help.

NOTE: This is alpha software under active development. There will be breaking changes.

Usage

CLI

Usage: rpo [OPTIONS] COMMAND [ARGS]...

Options:
  -r, --repository PATH
  -b, --branch TEXT
  --allow-dirty              Proceed with analysis even if repository has
                             uncommitted changes
  File selection:            Give you control over which files should be
                             included in your analysis
    -g, --glob TEXT          File path glob patterns to INCLUDE. If specified,
                             matching paths will be the only files included in
                             aggregation. If neither --glob nor
                             --xglob are specified, all files will be included
                             in aggregation. Paths are relative to root of
                             repository.
    -xg, --xglob TEXT        File path glob patterns to EXCLUDE. If specified,
                             matching paths will be filtered before
                             aggregation. If neither --glob nor
                             --xglob are specified, all files will be included
                             in aggregation. Paths are relative to root of
                             repository.
    --exclude-generated      If set, exclude common generated files like
                             package-manager generated lock files from
                             analysis
  Data selection:            Control over how repository data is aggregated
                             and sorted
    -A, --aggregate-by TEXT  Controls the field used to aggregate data
    -I, --identify-by TEXT   Controls the field used to identify authors.
    -S, --sort-by TEXT       Controls the field used to sort output
  Plot options:              Control plot output, if available
    -p, --plot PATH          The directory where plot output visualization
                             will live. Either a filename ending with '.png'
                             or a directory.
  Output options:            Control how data is displayed or saved
    --save-as FILE           Save the report data to the path provided; format
                             is determined by the filename extension,
                             which must be one of (.json|.csv). If no save-as
                             path is provided, the report will be printed to
                             stdout
  -c, --config FILE          The location of the json formatted config file to
                             use. Defaults to a hidden config.json file in the
                             current working directory. If it exists, then
                             options in the config file take precedence over
                             command line flags.
  --help                     Show this message and exit.

Commands:
  activity-report   Produces file or author report of activity at a...
  cumulative-blame  Computes the cumulative blame of the repository over...
  punchcard         Computes commits for a given user by datetime
  repo-blame        Computes the per user blame for all files at a given...
  revisions         List all revisions in the repository
  summary           Generate very high level summary for the repository
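
The include and exclude behaviour described for --glob and --xglob can be approximated in a few lines of Python. This is a minimal sketch built on the standard library's fnmatch, not rpo's actual implementation: the function name select_paths is hypothetical, and the rule for combining both flags is an assumption.

```python
from fnmatch import fnmatch


def select_paths(paths, include=(), exclude=()):
    """Filter repository-relative paths with include/exclude glob patterns.

    Assumption: when both are given, a path must match at least one include
    pattern and must not match any exclude pattern.
    """
    selected = []
    for path in paths:
        # With include patterns present, only matching paths are kept.
        if include and not any(fnmatch(path, pat) for pat in include):
            continue
        # Any exclude match drops the path.
        if any(fnmatch(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


paths = ["src/app.py", "tests/test_app.py", "README.md"]
only_tests = select_paths(paths, include=["tests/*"])
no_tests = select_paths(paths, exclude=["tests/*"])
everything = select_paths(paths)  # neither pattern given: all files kept
```

With neither pattern supplied every path is kept, which mirrors the "all files will be included" wording in the help text.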

Library

$ pip install rpo

from rpo import RepositoryAnalyzer

ra = RepositoryAnalyzer("./path/to_git_repo")

Examples

NOTE: depending on your shell, you may or may not need to escape the splat character in the glob patterns used below.
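
If you assemble rpo commands from a script, one way to sidestep the escaping question is to quote the pattern programmatically. The sketch below uses Python's shlex module (shlex.join needs Python 3.8 or later); the repository path and pattern are placeholders, and the quoting shown applies to POSIX shells only.

```python
import shlex

pattern = "tests/*"
args = ["rpo", "-r", "../my-local-repo", "-g", pattern, "activity-report"]

# shlex.join quotes each argument that needs it, so a POSIX shell passes
# the glob through to rpo instead of expanding it.
command = shlex.join(args)
print(command)
```

Passing the args list directly to subprocess.run without shell=True avoids shell expansion altogether.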

See test_cli.sh for more examples.

Git Blame for all Files in a Repo at a Given Revision, Identify Users by Email

$ rpo -r ../my-local-repo -R HEAD -I email repo-blame

Cumulative Git Blame for all Files in a Repo at a Given Revision, Identify Users by Name

$ rpo -r ../my-local-repo cumulative-blame

Author Activity Report, Including Only Files that Match a Pattern

$ rpo -r ../my-local-repo -g tests/\* activity-report

Author Activity Report, Excluding Files that Match a Pattern

$ rpo -r ../my-local-repo -xg tests/\* activity-report

File Activity Report, Excluding Files that Match a Pattern

$ rpo -r ../my-local-repo -xg tests/\* activity-report -t files

Features

  • Automatically generate aliases that refer to the same person
  • Support analyzing by glob
  • Support excluding by glob
  • Produce blame charts
  • Optionally ignore merge commits
  • Optionally ignore whitespace
  • Identify major refactorings
  • Fast execution, even on giant repositories
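
To illustrate what alias generation involves, identities that share an email address can be grouped into one alias set. This is a hypothetical sketch of one simple heuristic, not necessarily the approach rpo takes; the function name and the sample identities are invented for the example.

```python
from collections import defaultdict


def group_aliases(identities):
    """Group (name, email) pairs that share a case-insensitive email."""
    by_email = defaultdict(set)
    for name, email in identities:
        by_email[email.strip().lower()].add(name)
    # Sort the names so the result is deterministic.
    return {email: sorted(names) for email, names in by_email.items()}


# Invented sample data for illustration only.
identities = [
    ("Ada Lovelace", "ada@example.com"),
    ("ada", "Ada@Example.com"),
    ("Grace Hopper", "grace@example.com"),
]
aliases = group_aliases(identities)
```

A heuristic like this misses contributors who commit under several email addresses, so a real implementation would likely need additional matching rules.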

Performance

The goal is for the library to work even on the largest repositories. In general, run time grows with the number of authors, commits, and files being considered in the aggregations.

The authors regularly test using the cpython repository, which contains over 1,000,000 objects. That takes a while.

TODO: Performance graphs

Similar Projects and Inspiration

References

Git Commands

These are useful for validating results reported here. The git man pages for the various commands are helpful reading.

All the files edited in a revision

git diff-tree --no-commit-id --name-only HEAD~1 -r

All the files present at a particular revision

git ls-tree -rlt HEAD
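
When cross-checking file counts or sizes, the long-format listing can be parsed in Python. A minimal sketch, assuming the layout documented for git ls-tree -l (mode, type, object name, and size, then a tab and the path); the function name and the sample lines are made up for illustration.

```python
def parse_ls_tree(output):
    """Parse `git ls-tree -l` output into (mode, type, sha, size, path) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        # Metadata and path are separated by a single tab.
        meta, path = line.split("\t", 1)
        mode, obj_type, sha, size = meta.split()
        # Trees and submodule commits report "-" instead of a byte size.
        entries.append((mode, obj_type, sha, None if size == "-" else int(size), path))
    return entries


# Made-up sample in the `<mode> <type> <object> <size>\t<path>` layout.
sample = (
    "040000 tree 1111111111111111111111111111111111111111       -\tsrc\n"
    "100644 blob 2222222222222222222222222222222222222222     120\tsrc/app.py\n"
    "100644 blob 3333333333333333333333333333333333333333      42\tREADME.md\n"
)
entries = parse_ls_tree(sample)
total_bytes = sum(size for _, kind, _, size, _ in entries if kind == "blob")
```

Git quotes paths containing unusual characters unless -z is used, so this sketch only suits ordinary file names.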

All commits reachable from a revision

git rev-list HEAD

Count all commits reachable from a revision

git rev-list HEAD --count

All commits that touch a particular object (tree in this case)

git rev-list HEAD -- img

All files at each commit

git rev-list HEAD | xargs -r -I % git ls-tree -rt --name-only %
git cat-file --batch-all-objects --batch-check --unordered
git rev-list --all --objects --filter=object:type=blob HEAD | git cat-file --batch-check="%(objectname) %(objecttype) %(rest) %(deltabase)"
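
The output of git cat-file --batch-all-objects --batch-check can be summarized by object type, which gives a quick sense of repository size. A minimal sketch, assuming the default batch-check line format of object name, type, and size; the function name and the sample input are invented for the example.

```python
from collections import Counter


def count_object_types(output):
    """Tally object types from default `git cat-file --batch-check` output."""
    counts = Counter()
    for line in output.splitlines():
        parts = line.split()
        # Expected shape: "<object name> <type> <size>"; skip anything else,
        # such as "<object name> missing" lines.
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


# Invented sample input for illustration only.
sample = (
    "1111111111111111111111111111111111111111 commit 245\n"
    "2222222222222222222222222222222222222222 tree 68\n"
    "3333333333333333333333333333333333333333 blob 120\n"
    "4444444444444444444444444444444444444444 blob 42\n"
    "5555555555555555555555555555555555555555 missing\n"
)
counts = count_object_types(sample)
```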

Download files

Source Distribution

rpo-0.1.0a10.tar.gz (17.6 kB)

Uploaded Source

Built Distribution

rpo-0.1.0a10-py3-none-any.whl (17.3 kB)

Uploaded Python 3

File details

Details for the file rpo-0.1.0a10.tar.gz.

File metadata

  • Download URL: rpo-0.1.0a10.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for rpo-0.1.0a10.tar.gz
Algorithm Hash digest
SHA256 a7152f48f3401c0a6dba38643a6f7363ddaef2f365029274590df05089597017
MD5 a74289065b482fc8b09f9da0a2609ef5
BLAKE2b-256 22a32f2c65eecd8b310c6131fb86922e50b81c445c83bca0afa06a780cdc1a3b

Provenance

The following attestation bundles were made for rpo-0.1.0a10.tar.gz:

Publisher: python-publish.yml on crlane/rpo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rpo-0.1.0a10-py3-none-any.whl.

File metadata

  • Download URL: rpo-0.1.0a10-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for rpo-0.1.0a10-py3-none-any.whl
Algorithm Hash digest
SHA256 642bd284b888d574e89b795de7398fcefd851c0efbe68ee8b7fe4148ff4b4a69
MD5 6b780182f4fff9a940c3affaf292246c
BLAKE2b-256 4bc062615f049fdb356c87bb1ce18da01e2090e7a56fda064f8f47310416fc0f

Provenance

The following attestation bundles were made for rpo-0.1.0a10-py3-none-any.whl:

Publisher: python-publish.yml on crlane/rpo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
