A utility for interacting with data from git repositories as Pandas dataframes

Git-Pandas
==========

v0.0.1

A simple set of wrappers around GitPython for building pandas dataframes from git data. The project is centered around
two primary objects:

* Repository()
* ProjectDirectory()

A Repository object wraps a single git repo and is used to interact with it. A ProjectDirectory references a directory
on your filesystem that may contain multiple git repositories. Its subdirectories are walked to find any child repos,
and every analysis is aggregated across all of them into a single output (a pandas dataframe).
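
The directory walk described above can be sketched with the standard library alone. This is only an illustration of the idea, not the code ProjectDirectory actually runs: the helper name `find_git_repos`, the rule "a directory is a repo if it contains a `.git` entry", and the choice not to descend into a repo once found are all assumptions.

```python
import os
import tempfile


def find_git_repos(root):
    """Return sorted paths of directories under root that contain a .git entry."""
    repos = []
    for current, subdirs, files in os.walk(root):
        # .git is normally a directory, but can be a plain file (e.g. worktrees)
        if '.git' in subdirs or '.git' in files:
            repos.append(current)
            # assumption: stop descending once a repository has been found
            subdirs[:] = []
    return sorted(repos)


# Build a throwaway tree with two fake repos and one plain directory.
with tempfile.TemporaryDirectory() as root:
    for name in ('alpha', 'beta'):
        os.makedirs(os.path.join(root, name, '.git'))
    os.makedirs(os.path.join(root, 'notes'))
    found = [os.path.relpath(p, root) for p in find_git_repos(root)]
    print(found)  # ['alpha', 'beta']
```

Each path found this way could then be handed to its own Repository, with the per-repo dataframes combined into one.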

Current functionality includes:

* Commit history with extension and directory filtering
* Blame with extension and directory filtering

Please see examples for more detailed usage.

Installation
------------

To install use:

    pip install git-pandas


Examples / Usage
----------------

A Repository wraps a single git repo:

    from pandas import set_option
    from gitpandas.repository import Repository

    # widen the pandas display so the printed frames are not truncated
    set_option('display.max_rows', 500)
    set_option('display.max_columns', 500)
    set_option('display.width', 1000)

    # build an example repository object and try some things out
    repo_dir = ''  # path to a local git repository
    ignore_dirs = [
        'docs',
        'tests',
        'Data'
    ]
    r = Repository(repo_dir)

    # is it bare?
    print('\nRepo bare?')
    print(r.is_bare())
    print('\n')

    # get the commit history of the 'develop' branch
    ch = r.commit_history('develop', limit=None, extensions=['py'], ignore_dir=ignore_dirs)
    print(ch.head(5))

    # get the list of committers
    print('\nCommitters:')
    print(''.join([str(x) + '\n' for x in set(ch['committer'].values)]))
    print('\n')

    # print out everyone's contributions
    attr = ch.reindex(columns=['committer', 'lines', 'insertions', 'deletions']).groupby(['committer'])
    attr = attr.agg({
        'lines': 'sum',
        'insertions': 'sum',
        'deletions': 'sum'
    })
    print(attr)

    # get the file change history
    fh = r.file_change_history('develop', limit=None, ignore_dir=ignore_dirs)
    fh['ext'] = fh['filename'].map(lambda x: x.split('.')[-1])
    print(fh.head(50))

    # print out unique extensions
    print('\nExtensions Found:')
    print(''.join([str(x) + '\n' for x in set(fh['ext'].values)]))
    print('\n')

    # agg by extension
    etns = fh.reindex(columns=['ext', 'insertions', 'deletions']).groupby(['ext'])
    etns = etns.agg({
        'insertions': 'sum',
        'deletions': 'sum'
    })
    print(etns)
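
One caveat about the extension column built above: `x.split('.')[-1]` returns the whole name for files that contain no dot, and can pick up part of a directory name when only the directory contains one. A possible alternative, sketched here with a hypothetical helper `file_extension`, is to use `os.path.splitext`, which only inspects the last path component. The sample filenames are made up.

```python
import os


def file_extension(filename):
    """Return the extension without the dot, or '' when there is none."""
    return os.path.splitext(filename)[1].lstrip('.')


# Compare the naive split with the splitext-based helper.
for name in ['setup.py', 'Makefile', 'docs/v1.2/README', 'archive.tar.gz']:
    print(name, '->', repr(name.split('.')[-1]), 'vs', repr(file_extension(name)))
```

With the helper in place, the column could be built as `fh['ext'] = fh['filename'].map(file_extension)`, so that extensionless files group under an empty string instead of under their own names.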

A ProjectDirectory is a collection of repos:

    from pandas import set_option
    # assumed import path, by analogy with gitpandas.repository; adjust if your version differs
    from gitpandas.project import ProjectDirectory

    # widen the pandas display so the printed frames are not truncated
    set_option('display.max_rows', 500)
    set_option('display.max_columns', 500)
    set_option('display.width', 1000)

    p = ProjectDirectory(working_dir='/foo/bar/')

    # get the commit history of the 'develop' branch across all repos
    ch = p.commit_history('develop', limit=None)
    print(ch.head(5))

    # get the list of committers
    print('\nCommitters:')
    print(''.join([str(x) + '\n' for x in set(ch['committer'].values)]))
    print('\n')

    # print out everyone's contributions
    attr = ch.reindex(columns=['committer', 'lines', 'insertions', 'deletions']).groupby(['committer'])
    attr = attr.agg({
        'lines': 'sum',
        'insertions': 'sum',
        'deletions': 'sum'
    })
    print(attr)
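
The per-committer aggregation can be tried without any repository on disk by building a small frame by hand. In this sketch the committer names and numbers are invented; only the column names (`committer`, `lines`, `insertions`, `deletions`) come from the examples above, and the real commit history frame may carry further columns.

```python
import pandas as pd

# Hand-made rows standing in for a commit history dataframe.
ch = pd.DataFrame({
    'committer': ['alice', 'bob', 'alice'],
    'lines': [12, 5, 8],
    'insertions': [10, 3, 6],
    'deletions': [2, 2, 2],
})

# Sum each numeric column per committer.
totals = ch.groupby('committer')[['lines', 'insertions', 'deletions']].sum()
print(totals)
```

Selecting the numeric columns before calling `sum()` gives the same result as the `agg` dictionary used above when every column gets the same reduction.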

Contributing
------------

If you'd like to contribute, let me know, or just submit a pull request. We have no specific long-term goals or guidelines
at this stage.

License
-------

This project is BSD licensed (see License.md).
