A predictive codebase cartographer using Git history

Prehistorian

Prehistorian is a predictive codebase cartographer. It analyzes Git history to uncover hidden, undocumented behavioral dependencies between files and warns you when a change is likely missing a related update.

It is CLI-only, LLM-free, and CPU-only by design.


Why Prehistorian

Static imports and direct references show only explicit dependencies. Real-world teams rely on patterns that rarely show up in the import graph: files that consistently change together across many commits. Prehistorian surfaces those patterns and turns them into practical, low-noise warnings.


At a Glance

  • Signals: Hidden co-change dependencies discovered from Git history.
  • Safety: Pre-commit warnings that never block your commit.
  • Performance: Sparse matrices keep memory use low on large repos.
  • Transparency: Simple, explainable $P(B|A)$ confidence score.

How It Works (Pipeline)

  1. Ingestion: git log --all --name-only --pretty=format:"COMMIT_START:%H"
  2. Filtering:
    • Drop commits touching more than 15 files.
    • Remove noise files (lockfiles, images).
  3. Matrix Creation: Build a sparse commit-file matrix.
  4. Math Engine:
    • mlxtend.fpgrowth finds frequent co-change pairs.
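    • Markov co-change confidence: $P(B|A) = \frac{\mathrm{Count}(A \cap B)}{\mathrm{Count}(A)}$, where $\mathrm{Count}(A)$ is the number of retained commits that touch file A and $\mathrm{Count}(A \cap B)$ is the number that touch both A and B.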
  5. Caching: Save the model to .prehistorian/model.joblib.
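
The ingestion, filtering, and confidence steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Prehistorian's actual code: the function names, the noise lists, and the order of the two filters are hypothetical, and plain counters stand in for the sparse matrix and mlxtend.fpgrowth that the tool itself uses.

```python
from collections import Counter
from itertools import combinations

MAX_FILES_PER_COMMIT = 15  # threshold taken from the Filtering step
# Assumed noise lists for illustration; the tool's real lists may differ.
NOISE_SUFFIXES = (".lock", ".png", ".jpg", ".jpeg", ".gif", ".svg")
NOISE_NAMES = {"package-lock.json"}


def parse_git_log(text):
    """Split COMMIT_START-delimited `git log --name-only` output into file sets."""
    commits = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("COMMIT_START:"):
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)
    return commits


def is_noise(path):
    """Return True for lockfiles and images (assumed definition of noise)."""
    name = path.rsplit("/", 1)[-1]
    return name in NOISE_NAMES or path.lower().endswith(NOISE_SUFFIXES)


def filter_commits(commits):
    """Drop oversized commits, then strip noise files from the rest."""
    kept = []
    for files in commits:
        if len(files) > MAX_FILES_PER_COMMIT:
            continue
        files = {f for f in files if not is_noise(f)}
        if files:
            kept.append(files)
    return kept


def cochange_confidence(commits):
    """Map (A, B) to P(B|A) = Count(A and B) / Count(A)."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        file_counts.update(files)
        for a, b in combinations(sorted(files), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / file_counts[a] for (a, b), n in pair_counts.items()}
```

Note that the score is directional: if B changes in every commit that touches A, but A also changes often on its own, then P(A|B) is high while P(B|A) is lower.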

Features

  • CLI-only workflow, no UI or server required.
  • Sparse data pipeline for large repositories.
  • Fast co-change discovery via fpgrowth.
  • Actionable warnings without blocking commits.

Installation

pip install .

Development dependencies:

pip install ".[dev]"

Quick Start

prehistorian scan
prehistorian query path/to/file.py
prehistorian hook-install

If the prehistorian command is not on PATH:

python -m prehistorian scan

Guided Usage

1) Build the model

Command:

prehistorian scan

Expected output:

Prehistorian analyzed 842 commits. Found 126 behavioral dependencies. Model saved.

2) Query co-change dependencies

Command:

prehistorian query path/to/file.py

Expected output:

Co-change Dependencies for 'path/to/file.py'
File Path                          Co-change Confidence (%)
src/core/scheduler.py             88.24%
src/core/config.py                75.61%

If there is no model yet:

Model not found. Run `prehistorian scan` first.
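
As a rough illustration of how a query like this could rank results, the sketch below assumes the model is a dictionary mapping (A, B) pairs to P(B|A). That model shape, the function names, and the default limit are hypothetical; the on-disk format of .prehistorian/model.joblib is not described here.

```python
def top_cochanges(model, path, limit=5):
    """Return (file, confidence) rows for `path`, highest confidence first."""
    rows = [(b, conf) for (a, b), conf in model.items() if a == path]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:limit]


def render(path, rows):
    """Format rows in the two-column layout shown in the expected output."""
    lines = [
        f"Co-change Dependencies for '{path}'",
        f"{'File Path':<34}Co-change Confidence (%)",
    ]
    lines += [f"{b:<34}{conf:.2%}" for b, conf in rows]
    return "\n".join(lines)
```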

3) Install the pre-commit hook

Command:

prehistorian hook-install

Expected output:

Successfully installed pre-commit hook at .git/hooks/pre-commit

4) Pre-commit check (runs automatically)

Manual command (optional):

prehistorian pre-commit-check

Expected warning (commit never blocked):

[PREHISTORIAN WARNING] You are committing 'A', but historically you also change 'B' 85% of the time. Did you forget to stage it?

Commands

Command                          Description
prehistorian scan                Build and cache the co-change model.
prehistorian query <file_path>   Show top co-changing files and confidence.
prehistorian hook-install        Install a git pre-commit hook.
prehistorian pre-commit-check    Warn about missing co-changed files.

Pre-commit Warnings

During git commit, Prehistorian looks up the top 2 co-changed files for each staged file. If one of them has a co-change confidence of at least 75% and is not staged, it prints a warning. The commit is never blocked.
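
The rule above can be sketched as follows. This is an illustration only: the model is assumed to be a dictionary mapping (A, B) pairs to P(B|A), and the function names are hypothetical rather than Prehistorian's internals. The top-2 limit and 75% threshold come from the description above.

```python
def missing_cochanges(staged, model, top_n=2, threshold=0.75):
    """Return (staged_file, missing_file, confidence) for each likely omission."""
    staged = set(staged)
    warnings = []
    for a in sorted(staged):
        # Rank every file that co-changes with `a` and keep only the top few.
        candidates = sorted(
            ((conf, b) for (src, b), conf in model.items() if src == a),
            reverse=True,
        )[:top_n]
        for conf, b in candidates:
            if conf >= threshold and b not in staged:
                warnings.append((a, b, conf))
    return warnings


def format_warning(a, b, conf):
    """Render a warning in the style of the message shown under Guided Usage."""
    return (
        f"[PREHISTORIAN WARNING] You are committing '{a}', but historically "
        f"you also change '{b}' {conf:.0%} of the time. Did you forget to stage it?"
    )
```

A hook built on this would print the warnings and then exit with status 0, since Git aborts a commit only when the pre-commit hook exits non-zero.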


Testing and CI

python -m pytest

GitHub Actions runs the tests on pushes and pull requests.


Release (PyPI + GitHub)

The release workflow publishes to PyPI and creates a GitHub Release when a version tag is pushed.

For this release, use tag 1.1.2:

git tag 1.1.2
git push origin 1.1.2

Notes and Limitations

  • Must be run inside a Git repository.
  • Commits that touch more than 15 files are skipped.
  • Noise files (lockfiles and images) are filtered out.
  • Delete .prehistorian/ at any time to rebuild the model.

License

MIT License. See LICENSE.
