
A tool to collect statistics about a git repository over time


📈 Repotracer: Watch your code changing over time

Repotracer gives you insight into the change going on in your codebase.

It loops through every day in the history of a git repository and collects any stat you ask it to, e.g.:

  • Typescript migration: count the number of lines of JS vs TS
  • Count number of deprecated function calls
  • Measure adoption of new APIs

It compiles the results for each day into a CSV, and also immediately gives you a plot of the data as a PNG. It supports incremental computation: re-run it every so often without starting from scratch.
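The incremental behaviour can be pictured roughly like this (a minimal sketch, not repotracer's actual internals; the function name and CSV layout are illustrative):

```python
import csv
from datetime import date, timedelta

def days_to_process(csv_path, start, end):
    """Days still needing collection: resume from the day after the
    last row of an existing CSV, or from `start` on a fresh run."""
    resume = start
    try:
        with open(csv_path, newline="") as f:
            rows = list(csv.reader(f))
        if len(rows) > 1:  # header plus at least one data row
            resume = date.fromisoformat(rows[-1][0]) + timedelta(days=1)
    except FileNotFoundError:
        pass  # no previous run: start from scratch
    days = []
    day = resume
    while day <= end:
        days.append(day)
        day += timedelta(days=1)
    return days
```

On a repeat run, only the tail of the date range gets recomputed, which is what keeps nightly runs cheap.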

Use it to watch:

  • Percent of commits that touched at least one test, and count of authors writing tests
  • Count number of authors who have used a new library

These are only the beginning. You can write your own stats and plug them into repotracer. If you can write a script to calculate a property of your code, then repotracer can graph it for you over time. For example, you could run your build toolchain and count occurrences of a particular warning, or use a specialized tool.
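A custom stat can be as simple as a command whose stdout is a number. A sketch of the idea (this illustrates the planned `script` stat type mentioned below, not a shipped API):

```python
import subprocess

def run_script_stat(command, cwd="."):
    """Run a user-supplied shell command in the checked-out repo and
    parse its stdout as the stat's value for that day."""
    result = subprocess.run(
        command, shell=True, cwd=cwd,
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

Anything that prints an integer works, e.g. a grep pipeline counting deprecation warnings in your build log.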

Repotracer aims to be a Swiss Army knife for running analytics queries on your source code.

Installation

Install with pip install repotracer.

To run the regex_count, file_count and loc_count stats, you'll need ripgrep, fd and tokei installed, respectively. On macOS you can install these with:

brew install ripgrep fd tokei

Repotracer will look for a config file in either $PWD/.repotracer/config.json or $HOME/.repotracer/config.json. If neither exists, it will create one in the latter location.
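That lookup order can be sketched as follows (the paths match the description above; the default-config contents are illustrative):

```python
import json
from pathlib import Path

def load_or_create_config():
    """Prefer $PWD/.repotracer/config.json, fall back to
    $HOME/.repotracer/config.json, creating the latter if neither exists."""
    local = Path.cwd() / ".repotracer" / "config.json"
    home = Path.home() / ".repotracer" / "config.json"
    for path in (local, home):
        if path.exists():
            return json.loads(path.read_text()), path
    home.parent.mkdir(parents=True, exist_ok=True)
    home.write_text(json.dumps({"repos": {}}, indent=2))
    return {"repos": {}}, home
```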

repotracer add-stat

add-stat will guide you through the process of configuring a repo, and adding a new stat.

Usage

A collection of commands for onboarding will come soon. In the meantime:

  • repotracer run reponame statname will compute a single stat. The data will show up in ./stats/repo/stat_name.csv, and a plot will be written to ./stats/repo/stat_name.png.

  • repotracer run reponame will run all the stats for that repo. For now this makes a separate pass through the history for each stat; later it might compute several stats on the same checkout at a time.

  • repotracer run will update all stats in the config file.
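Conceptually, a run walks one day at a time through the stat's date range and records one measurement per day. A rough sketch (the `measure` callable stands in for whatever command the stat type runs against that day's checkout):

```python
from datetime import date, timedelta

def collect(start, end, measure):
    """Yield one (day, value) pair per day.  In repotracer the value
    comes from running the stat against the repo checked out at that
    day's last commit; here `measure` is any callable taking a date."""
    day = start
    while day <= end:
        yield day, measure(day)
        day += timedelta(days=1)

rows = list(collect(date(2024, 1, 1), date(2024, 1, 3), lambda d: d.day * 10))
# rows -> [(date(2024,1,1), 10), (date(2024,1,2), 20), (date(2024,1,3), 30)]
```

The resulting rows are what end up in ./stats/repo/stat_name.csv.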

Stat types

More documentation about the configuration options will come soon.

  • regex_count runs ripgrep and sums the number of matches in the whole repo. Additional arguments can be passed to ripgrep by adding ripgrep_args in the params object.
  • file_count runs fd and counts the number of files found.
  • loc_count runs tokei and counts the number of lines of code per language.
  • The next stat type will be script, which will run any bash script the user provides, allowing for maximum customization.
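To make the regex_count semantics concrete, here is a pure-Python equivalent of what it computes, i.e. the total ripgrep would report with --count-matches (a sketch for illustration only: the real stat shells out to ripgrep, which is much faster and respects .gitignore):

```python
import re
from pathlib import Path

def regex_count(root, pattern):
    """Sum regex matches across every readable file under `root`."""
    rx = re.compile(pattern)
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                total += len(rx.findall(path.read_text()))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return total
```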

Stat options

The config format is JSON5, but currently comments are lost when a command updates the config file. I'm planning to move to TOML to fix that, because the Python TOML library preserves comments.

```json5
"repos": {
    "svelte": {
      "url": "", // the url to clone the repo from
      "stats": {
        "count-ts-ignore": { // the name of the stat; used in filenames
          "description": "The number of ts-ignores in the repo.", // Optional. A short description of the stat.
          "type": "regex_count", // the measurement type to run
          "start": "2020-01-01", // Optional. When to start the collection for this stat. If not specified, starts at the beginning of the repo's history.
          "path_in_repo": "src", // Optional. Will cd into this path to run the stat
          "params": { // any parameters that depend on the measurement type
            "pattern": "ts-ignore",  // the pattern to pass to ripgrep
            "ripgrep_args": "-g '!**/tests/*'" // any extra arguments to pass to ripgrep
          }
        },
      }
    }
}
```

Features

  • Stats: Regex count

  • Stats: File count

  • Stats: LOC count (broken down by language)

  • Stats: Custom Script

  • Stats: option to measure at monthly cadence instead of daily

  • Stats: Turn Betterer count files into stats

  • Runner: Incremental runs

  • Runner: Interleaved runs, only stream through repo once when collecting multiple stats on repo

  • Runner: Parallel execution by running on many copies of the repo

  • Runner: when path_in_repo is set, only git checkout that portion of the fs

  • Fix logging to not be so all or nothing

  • Deploy to PyPI on MR merges

Design Goals

Repotracer is meant to achieve:

  • Reliably collecting stats, in a reasonable amount of time. The idea is that a nightly job in CI will be running stat collection, so as long as it takes < 30 mins to collect all stats that should be ok. However we don't want it to be dog slow, as the occasional "interactive" use or end-user running it directly will also be supported.

  • Flexibility for common use cases like counting regex matches, files, LOC. But not a huge config surface; if you want to tweak a command too much, just use a script type for your stat.

  • Out of the box simple graphing support, but not too much. I don't plan on adding many plotting options to the configs.

  • Nice starting DX. It should be easy to download the app, install it in a repo and get a nice plot of something you care about.

Dev Notes

This is a side project with no reliability guarantees. It also is optimized for the author's productivity/engagement. It uses heavyweight "dev-friendly" libraries, and doesn't focus too much on code cleanliness. The main priority is to get value now (the nice data/graphs), rather than build a timeless masterpiece.

That doesn't mean it's meant as a filthy mess. Here are the main concepts:

Repotracer manages a collection of Stat objects. These are specified by a StatConfig, and they mostly define some params + the Measurement (i.e. the actual command to run, like tokei, ripgrep, a custom script, etc).

A Stat can run itself, and that will update the csv for that stat. From the POV of a user, they care about running many Stats on a single repo, so we aggregate those into a RepoConfig. This mainly defines where to download the repo. There is the idea that the repo storage could be pluggable, e.g. if the repo is stored on NFS or somewhere different.

The overall config object is GlobalConfig, and it composes a couple of basic parameters plus a list of RepoConfigs.

In theory a bunch of things can be made pluggable, but we'll wait until we need to swap anything out to define the interfaces.

We use pandas to store and interface with the data, for ease of use. Pandas gives us day-aggregation functions and dataframe powers.
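For instance, collapsing per-commit measurements down to one value per day falls out of resample (a sketch; the column name and the choice to keep each day's last commit and forward-fill gap days are illustrative):

```python
import pandas as pd

# One measurement per commit, timestamped by commit time.
df = pd.DataFrame(
    {"value": [3, 5, 4, 7]},
    index=pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:00",
        "2024-01-02 12:00", "2024-01-04 08:00",
    ]),
)

# One row per day: take the last commit's value within each day,
# and carry values forward over days with no commits (Jan 3 here).
daily = df.resample("D").last().ffill()
```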
