A predictive codebase cartographer using Git history
Prehistorian
Prehistorian is a predictive codebase cartographer. It analyzes Git history to uncover hidden, undocumented behavioral dependencies between files and warns you when a change is likely missing a related update.
It is CLI-only, LLM-free, and CPU-only by design.
Why Prehistorian
Static imports and direct references show only explicit dependencies. Real-world teams rely on patterns that rarely show up in the import graph: files that consistently change together across many commits. Prehistorian surfaces those patterns and turns them into practical, low-noise warnings.
At a Glance
- Signals: Hidden co-change dependencies discovered from Git history.
- Safety: Pre-commit warnings that never block your commit.
- Performance: Sparse matrices keep memory use low on large repos.
- Transparency: Simple, explainable $P(B|A)$ confidence score.
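As a worked illustration of the score, take made-up numbers (not measurements from any real repository): file A changed in 40 commits, file B changed in 200 commits, and 34 commits touched both.

$$
P(B|A) = \frac{Count(A \cap B)}{Count(A)} = \frac{34}{40} = 0.85,
\qquad
P(A|B) = \frac{Count(A \cap B)}{Count(B)} = \frac{34}{200} = 0.17
$$

The score is directional: in this example a change to A strongly suggests a change to B, while a change to B says little about A.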
How It Works (Pipeline)
- Ingestion: `git log --all --name-only --pretty=format:"COMMIT_START:%H"`
- Filtering:
  - Drop commits touching more than 15 files.
  - Remove noise files (lockfiles, images).
- Matrix Creation: Build a sparse commit-file matrix.
- Math Engine:
  - `mlxtend.fpgrowth` finds frequent co-change pairs.
  - Markov co-change confidence: $P(B|A) = \frac{Count(A \cap B)}{Count(A)}$.
- Caching: Save the model to `.prehistorian/model.joblib`.
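The ingestion, filtering, and confidence steps above can be sketched in plain Python. This is a minimal illustration, not Prehistorian's actual code: the function names, the noise-file lists, and the order of the two filters are assumptions, and simple pair counting stands in for `fpgrowth` so the example needs only the standard library.

```python
from collections import Counter
from itertools import combinations

MAX_FILES_PER_COMMIT = 15  # threshold stated in the pipeline description
# Assumed noise patterns; the real tool's lists are not documented here.
NOISE_SUFFIXES = (".lock", ".png", ".jpg", ".jpeg", ".gif", ".svg")
NOISE_NAMES = {"package-lock.json", "poetry.lock", "yarn.lock"}


def parse_git_log(text):
    """Split `git log --name-only` output into one set of paths per commit."""
    commits, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("COMMIT_START:"):
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)
    return commits


def is_noise(path):
    name = path.rsplit("/", 1)[-1]
    return name in NOISE_NAMES or name.lower().endswith(NOISE_SUFFIXES)


def filter_commits(commits):
    """Drop oversized commits, then strip noise files (order is an assumption)."""
    kept = []
    for files in commits:
        if len(files) > MAX_FILES_PER_COMMIT:
            continue
        files = {f for f in files if not is_noise(f)}
        if files:
            kept.append(files)
    return kept


def cochange_confidence(commits):
    """Return {(a, b): P(b | a)} for every ordered pair that co-changed."""
    file_count, pair_count = Counter(), Counter()
    for files in commits:
        file_count.update(files)
        for a, b in combinations(sorted(files), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    return {(a, b): n / file_count[a] for (a, b), n in pair_count.items()}


# Hand-written sample in the shape produced by the ingestion command.
SAMPLE_LOG = """COMMIT_START:aaa
src/api.py
src/schema.py

COMMIT_START:bbb
src/api.py
src/schema.py
poetry.lock

COMMIT_START:ccc
src/api.py
README.md
"""

confidence = cochange_confidence(filter_commits(parse_git_log(SAMPLE_LOG)))
print(round(confidence[("src/api.py", "src/schema.py")], 2))  # 0.67 (2 of 3 commits)
print(confidence[("src/schema.py", "src/api.py")])            # 1.0  (2 of 2 commits)
```

The two printed values differ because the confidence is conditional on the first file of the pair.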
Features
- CLI-only workflow, no UI or server required.
- Sparse data pipeline for large repositories.
- Fast co-change discovery via `fpgrowth`.
- Actionable warnings without blocking commits.
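To show why a sparse commit-file matrix keeps memory low, here is a small sketch using SciPy. It assumes `numpy` and `scipy` are installed; the variable names and the sample commits are illustrative, and nothing here is taken from Prehistorian's internals. Only the non-zero cells (one per file touched in a commit) are stored, and a single sparse product yields every co-change count.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative commits: each set holds the files changed together.
commits = [
    {"src/api.py", "src/schema.py"},
    {"src/api.py", "src/schema.py"},
    {"src/api.py", "README.md"},
]

files = sorted(set().union(*commits))
col = {path: j for j, path in enumerate(files)}

# One (row, column) coordinate per file touched in a commit.
rows, cols = [], []
for i, changed in enumerate(commits):
    for path in changed:
        rows.append(i)
        cols.append(col[path])

data = np.ones(len(rows), dtype=np.int32)
X = csr_matrix((data, (rows, cols)), shape=(len(commits), len(files)))

# (X^T X)[a, b] counts commits touching both a and b;
# the diagonal counts commits touching each file on its own.
co = (X.T @ X).tocsr()
changes = co.diagonal()

a, b = col["src/api.py"], col["src/schema.py"]
p_b_given_a = co[a, b] / changes[a]
print(X.nnz, "stored cells out of", X.shape[0] * X.shape[1])
print(round(float(p_b_given_a), 2))
```

On a real repository most files are untouched by most commits, so the number of stored cells grows with the number of changes rather than with commits times files.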
Installation
pip install .
Development dependencies:
pip install .[dev]
Quick Start
prehistorian scan
prehistorian query path/to/file.py
prehistorian hook-install
If the prehistorian command is not on PATH:
python -m prehistorian scan
Guided Usage
1) Build the model
Command:
prehistorian scan
Expected output:
Prehistorian analyzed 842 commits. Found 126 behavioral dependencies. Model saved.
2) Query co-change dependencies
Command:
prehistorian query path/to/file.py
Expected output:
Co-change Dependencies for 'path/to/file.py'
File Path Co-change Confidence (%)
src/core/scheduler.py 88.24%
src/core/config.py 75.61%
If there is no model yet:
Model not found. Run `prehistorian scan` first.
3) Install the pre-commit hook
Command:
prehistorian hook-install
Expected output:
Successfully installed pre-commit hook at .git/hooks/pre-commit
4) Pre-commit check (runs automatically)
Manual command (optional):
prehistorian pre-commit-check
Expected warning (commit never blocked):
[PREHISTORIAN WARNING] You are committing 'A', but historically you also change 'B' 85% of the time. Did you forget to stage it?
Commands
| Command | Description |
|---|---|
| `prehistorian scan` | Build and cache the co-change model. |
| `prehistorian query <file_path>` | Show top co-changing files and confidence. |
| `prehistorian hook-install` | Install a git pre-commit hook. |
| `prehistorian pre-commit-check` | Warn about missing co-changed files. |
Pre-commit Warnings
During git commit, Prehistorian checks the top 2 co-changed files for every staged file. If one of them has a co-change confidence of at least 75% and is not staged, it prints a warning but never blocks the commit.
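The check described above can be sketched as follows. This is a hedged illustration rather than the tool's implementation: the model is assumed to be a plain dict mapping each file to `{other_file: confidence}`, and `missing_cochanges` and `main` are hypothetical names. In a real hook the staged list could come from `git diff --cached --name-only`.

```python
TOP_N = 2         # "top 2 co-changed files" from the description
THRESHOLD = 0.75  # warn at 75% confidence or higher


def missing_cochanges(staged, model, top_n=TOP_N, threshold=THRESHOLD):
    """Return (staged_file, missing_file, confidence) tuples worth a warning."""
    staged = set(staged)
    found = []
    for path in sorted(staged):
        # Rank this file's co-change partners, highest confidence first.
        ranked = sorted(model.get(path, {}).items(),
                        key=lambda item: item[1], reverse=True)
        for other, conf in ranked[:top_n]:
            if conf >= threshold and other not in staged:
                found.append((path, other, conf))
    return found


def main(staged, model):
    for path, other, conf in missing_cochanges(staged, model):
        print(f"[PREHISTORIAN WARNING] You are committing '{path}', but "
              f"historically you also change '{other}' {conf:.0%} of the time. "
              f"Did you forget to stage it?")
    return 0  # always report success so the commit is never blocked


# Hypothetical model contents, for illustration only.
model = {
    "src/core/worker.py": {
        "src/core/scheduler.py": 0.8824,
        "src/core/config.py": 0.7561,
        "docs/worker.md": 0.40,
    }
}
staged = ["src/core/worker.py", "src/core/config.py"]
found = missing_cochanges(staged, model)
exit_code = main(staged, model)
```

Here only `src/core/scheduler.py` triggers a warning: `src/core/config.py` is already staged, and `docs/worker.md` falls outside the top 2. Returning 0 unconditionally is what keeps the hook advisory.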
Testing and CI
python -m pytest
GitHub Actions runs the tests on pushes and pull requests.
Release (PyPI + GitHub)
The release workflow publishes to PyPI and creates a GitHub Release when a version tag is pushed.
For this release, use tag 1.1.2:
git tag 1.1.2
git push origin 1.1.2
Notes and Limitations
- Must be run inside a Git repository.
- Commits that touch more than 15 files are skipped.
- Noise files (lockfiles and images) are filtered out.
- Delete `.prehistorian/` at any time to rebuild the model.
License
MIT License. See LICENSE.