A utility for interacting with data from Git repositories as pandas DataFrames
Git-Pandas
Git-Pandas is a Python library that turns Git repository data into pandas DataFrames, making it easy to analyze and visualize a codebase's history, contributors, and development patterns. Built on top of GitPython, it provides a simple interface for extracting meaningful insights from your Git repositories.
Why Git-Pandas?
- Easy to Use: Simple API that converts Git data into familiar pandas DataFrames
- Comprehensive Analysis: From basic commit history to complex metrics like bus factor
- Flexible: Works with single repositories or entire project directories
- Visualization Ready: Built-in plotting utilities for common Git analytics
- Performance Optimized: Optional caching support for memory-intensive operations
Core Components
Repository
The Repository class provides a wrapper around a single Git repository, offering methods to:
- Extract commit history with filtering by extension and directory
- Analyze file changes and blame information
- Track branch and tag information
- Generate cumulative blame statistics
- Calculate file ownership and contribution patterns
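Filtering commit history "by extension and directory" boils down to matching each changed file path against patterns. The stdlib sketch below shows that idea with glob patterns; the function `filter_paths` and its `include_globs`/`ignore_globs` parameters are hypothetical names chosen for illustration, not a guaranteed git-pandas signature, so check the documentation for the real filtering arguments.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Sequence


def filter_paths(
    paths: Iterable[str],
    include_globs: Optional[Sequence[str]] = None,
    ignore_globs: Optional[Sequence[str]] = None,
) -> List[str]:
    """Keep paths that match any include glob and no ignore glob (hypothetical helper)."""
    kept = []
    for path in paths:
        # With include globs given, a path must match at least one of them.
        if include_globs and not any(fnmatch(path, g) for g in include_globs):
            continue
        # Any matching ignore glob drops the path (note: fnmatch's '*' also matches '/').
        if ignore_globs and any(fnmatch(path, g) for g in ignore_globs):
            continue
        kept.append(path)
    return kept


changed = ["src/app.py", "src/util.py", "docs/index.rst", "tests/test_app.py", "setup.cfg"]
print(filter_paths(changed, include_globs=["*.py"], ignore_globs=["tests/*"]))
```

Applying the same predicate to the files touched by each commit is one plausible way to restrict a commit history to a subset of the tree.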
ProjectDirectory
The ProjectDirectory class enables analysis across multiple repositories:
- Automatically discovers and analyzes nested Git repositories
- Aggregates metrics across multiple repositories
- Provides project-level insights and statistics
- Calculates cross-repository metrics like total development time
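Discovering nested repositories can be done by walking a directory tree and recording every folder that contains a `.git` entry. The sketch below is a stdlib illustration of that approach under the assumption that a `.git` directory or file marks a repository; `find_git_repos` is a hypothetical helper, not ProjectDirectory's actual code.

```python
import os
from typing import List


def find_git_repos(root: str) -> List[str]:
    """Return sorted paths under root that contain a .git directory or file (hypothetical helper)."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A .git file (not directory) appears in worktrees and submodules.
        if ".git" in dirnames or ".git" in filenames:
            repos.append(dirpath)
        # Prune .git itself so we never walk Git's internal object store;
        # other subdirectories are still visited, so nested repos are found.
        if ".git" in dirnames:
            dirnames.remove(".git")
    return sorted(repos)


print(find_git_repos("."))
```

Pruning `dirnames` in place works because `os.walk` is top-down by default, so it consults the modified list before descending.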
Key Features
Repository Analysis
- Commit History: Track changes with extension and directory filtering
- File Analysis: Monitor edited files and blame information
- Branch & Tag Management: Access repository structure information
- Cumulative Blame: Generate time-series data of code ownership
- File Ownership: Approximate file ownership and contribution patterns
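File ownership can be approximated from blame data by crediting each file to the author responsible for the most surviving lines. The sketch below assumes blame output has been reduced to `(path, author, lines)` tuples; the tuple shape and the `file_owners` name are illustrative assumptions, and git-pandas may compute ownership differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def file_owners(blame_rows: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[str, float]]:
    """Map each file to (top author, that author's share of blamed lines)."""
    per_file: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for path, author, loc in blame_rows:
        per_file[path][author] += loc
    owners = {}
    for path, counts in per_file.items():
        total = sum(counts.values())
        if total == 0:
            continue  # nothing blamed, so no meaningful owner
        # Highest line count wins; ties fall back to alphabetical order for determinism.
        author = min(counts, key=lambda a: (-counts[a], a))
        owners[path] = (author, counts[author] / total)
    return owners


rows = [("a.py", "alice", 30), ("a.py", "bob", 10), ("b.py", "bob", 5), ("b.py", "carol", 5)]
print(file_owners(rows))
```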
Project Insights
- Bus Factor: Calculate project sustainability metrics
- Development Time: Estimate hours spent per project or author
- Contributor Analysis: Track individual and team contributions
- Project Health: Generate comprehensive project information tables
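One common definition of the bus factor is the smallest number of contributors who together account for at least half of the work. The sketch below applies that definition to a list of commit authors; the `bus_factor` name and the 50% threshold are assumptions for illustration, and git-pandas' own calculation may weigh contributions differently (for example by lines of code rather than commit counts).

```python
from collections import Counter
from typing import Iterable


def bus_factor(commit_authors: Iterable[str], threshold: float = 0.5) -> int:
    """Smallest number of authors whose commits together reach `threshold` of all commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Take the most active authors first until the threshold is reached.
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= threshold * total:
            return n
    return len(counts)


print(bus_factor(["alice"] * 6 + ["bob"] * 3 + ["carol"]))
```

A bus factor of 1 means a single person accounts for at least half of the history, which is usually read as a sustainability risk.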
GitHub Integration
- Profile Analysis: Analyze GitHub.com profiles via the GitHubProfile object
- Repository Metrics: Extract repository-specific insights
- Contributor Insights: Track external contributions and collaborations
Visualization Tools
- Plotting Helpers: Built-in utilities for common Git analytics
- Punchcard Analysis: Generate and visualize commit patterns
- Blame Visualization: Create cumulative blame charts
- Time Series Analysis: Track changes and patterns over time
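A punchcard is a count of commits bucketed by day of week and hour of day. The stdlib sketch below shows the aggregation behind such a chart; `punchcard` is a hypothetical helper rather than the library's plotting API, and it assumes the timestamps are already in the timezone you want to report in.

```python
from datetime import datetime
from typing import Iterable, List


def punchcard(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Return a 7x24 grid of commit counts: rows are weekdays (0 = Monday), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


commits = [datetime(2024, 1, 1, 9, 15), datetime(2024, 1, 1, 9, 45), datetime(2024, 1, 6, 22, 0)]
grid = punchcard(commits)
print(grid[0][9], grid[5][22])
```

Because commit times carry the author's local offset, normalizing them to one timezone first avoids splitting the same working hour across buckets.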
Installation
Git-Pandas supports Python 2.7+ and 3.3+. Install using pip:
```shell
pip install git-pandas
```
Quick Start
```python
from gitpandas import ProjectDirectory, Repository

# Analyze a single repository
repo = Repository('/path/to/repo')
commits_df = repo.commit_history()
blame_df = repo.blame()

# Analyze multiple repositories
project = ProjectDirectory('/path/to/project')
project_info = project.general_information()
```
Documentation
Comprehensive documentation is available at http://wdm0006.github.io/git-pandas/
Performance Optimization
For memory-intensive operations, Git-Pandas supports:
- Memory-based caching
- Redis-based caching
- Configurable cache durations
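The idea behind a cache with a configurable duration is that a result is reused until it is older than a time-to-live (TTL), then recomputed. The sketch below is a generic in-memory illustration of that pattern, not git-pandas' cache implementation; `TTLCache` and `get_or_compute` are hypothetical names, and the injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: reuse the stored value
        value = compute()  # miss or expired entry: recompute and store
        self._store[key] = (now, value)
        return value


cache = TTLCache(ttl=300)
result = cache.get_or_compute(("blame", "/path/to/repo"), lambda: "expensive result")
```

A Redis-backed variant would follow the same get-or-compute flow but keep entries in an external store so they survive process restarts.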
Projects Using Git-Pandas
- GitNOC: Network of Code analysis tool
- Commit Opener: Commit analysis and visualization tool
Contributing
We welcome contributions! Please review our Contributing Guidelines for details on:
- Code of Conduct
- Development Setup
- Pull Request Process
- Starter Issues
License
This project is BSD licensed (see LICENSE.md)
Source Distribution
File details
Details for the file git_pandas-2.3.0.tar.gz.
File metadata
- Download URL: git_pandas-2.3.0.tar.gz
- Upload date:
- Size: 472.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef79610a9214b513e70db9bc4527e10ba78f4e5c8d00be47c73cf0fda6b19568
MD5 | f8e97e3ae8238fcd98a56f3ebb32eb51
BLAKE2b-256 | 610682abc6d85095af209237087aa4cf1622e31967e585bb1d6b11e9be395259
Provenance
The following attestation bundles were made for git_pandas-2.3.0.tar.gz:
Publisher: pypi-publish.yml on wdm0006/git-pandas
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: git_pandas-2.3.0.tar.gz
- Subject digest: ef79610a9214b513e70db9bc4527e10ba78f4e5c8d00be47c73cf0fda6b19568
- Sigstore transparency entry: 200227602
- Sigstore integration time:
- Permalink: wdm0006/git-pandas@f0c090c189b60397c523db6d6f915b2c99bb4244
- Branch / Tag: refs/tags/v2.3.0-1
- Owner: https://github.com/wdm0006
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@f0c090c189b60397c523db6d6f915b2c99bb4244
- Trigger Event: release
- Statement type: