
github2pandas supports the aggregation of project activities in a GitHub repository and makes them available in pandas dataframes


Transform GitHub Activities to Pandas Dataframes

General information

This package is being developed by the participating partners (TU Bergakademie Freiberg, OVGU Magdeburg and HU Berlin) as part of the DiP-iT project.

The package implements Python functions for

  • aggregating and preprocessing GitHub activities (Commits, Actions, Issues, Pull-Requests) and
  • generating project progress summaries according to different metrics (e.g. ratio of changed lines, ratio of aggregated Levenshtein distances).
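To make the Levenshtein-based metric more concrete, the following is a minimal sketch of the edit distance itself and of one possible way to turn per-author distances into ratios. Both function names (`levenshtein`, `levenshtein_share`) are hypothetical and not part of github2pandas; how the package actually aggregates distances is not described here, so the ratio step is an assumption.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def levenshtein_share(distances_by_author: Dict[str, int]) -> Dict[str, float]:
    """Assumed aggregation: each author's share of the summed edit distance."""
    total = sum(distances_by_author.values())
    if total == 0:
        return {author: 0.0 for author in distances_by_author}
    return {author: d / total for author, d in distances_by_author.items()}
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3, and `levenshtein_share({"alice": 3, "bob": 1})` gives alice a share of 0.75.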

github2pandas stores the collected information in a collection of pandas DataFrames below a user-defined root folder. The structure beyond that (file names, folder names) is defined in member variables of the corresponding classes and can be overridden. The default configuration results in the following file structure.

|-- My_Github_Repository_0               <- Repository name
|   |- Repo.json                         <- Json file containing user and repo name
|   |- Repository
|   |   |- Repository.p  
|   |- Issues
|   |   |- pdIssuesComments.p
|   |   |- pdIssuesEvents.p
|   |   |- pdIssues.p
|   |   |- pdIssuesReactions.p
|   |- PullRequests
|   |   |- pdPullRequestsComments.p
|   |   |- pdPullRequestsCommits.p
|   |   |- pdPullRequestsEvents.p
|   |   |- pdPullRequests.p
|   |   |- pdPullRequestsReactions.p
|   |   |- pdPullRequestsReviews.p
|   |- Users.p
|   |- Versions
|   |   |- pdCommits.p
|   |   |- pdEdits.p
|   |   |- pdBranches.p
|   |   |- pVersions.db
|   |   |- repo                         <- Repository clone
|   |   |   |- ..
|   |- Workflows
|       |- pdWorkflows.p
|-- My_Github_Repository_1
...

The internal structure and relations of the data frames are included in the project's wiki.
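The default layout above can be navigated with plain `pathlib`. The sketch below only checks which of a few table files exist for one repository folder; `DEFAULT_FILES` and `locate_tables` are hypothetical names introduced for illustration, and the relative paths are copied from the tree shown above rather than read from the package.

```python
from pathlib import Path
from typing import Dict

# Relative locations copied from the default layout shown above.
DEFAULT_FILES: Dict[str, Path] = {
    "issues": Path("Issues", "pdIssues.p"),
    "issue_comments": Path("Issues", "pdIssuesComments.p"),
    "pull_requests": Path("PullRequests", "pdPullRequests.p"),
    "commits": Path("Versions", "pdCommits.p"),
    "edits": Path("Versions", "pdEdits.p"),
    "workflows": Path("Workflows", "pdWorkflows.p"),
    "users": Path("Users.p"),
}


def locate_tables(root: Path, repo_name: str) -> Dict[str, Path]:
    """Return the known table files of one repository folder that exist on disk."""
    repo_dir = Path(root) / repo_name
    return {
        name: repo_dir / relative
        for name, relative in DEFAULT_FILES.items()
        if (repo_dir / relative).is_file()
    }
```

A call such as `locate_tables(Path("data"), "My_Github_Repository_0")` returns only the files that have already been generated. Assuming the `.p` files are pickled DataFrames, each returned path could then be passed to `pandas.read_pickle`; the getter functions of the package remain the safer route.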

Installation

github2pandas is available on PyPI. Use pip to install the package.

Global installation

On Linux:

sudo pip3 install github2pandas 
sudo pip install github2pandas

On Windows as admin or for one user:

pip install github2pandas
pip install --user github2pandas 

In a virtual environment:

pipenv install github2pandas

Usage

A GitHub token is required for authentication. The GitHub website describes how to generate one for your account. Customise the username and project name and explore any public or private repository you have access to with your account!

Define the access token in a .env or .py (.ipynb) file. In VS Code, the default value of the python.envFile setting is ${workspaceFolder}/.env.

TOKEN="example_token"
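Outside an editor that reads the .env file automatically, the token has to be placed in the environment some other way. The following is a minimal sketch of a loader using only the standard library; `load_env_file` is a hypothetical helper, not part of github2pandas, and it handles only simple `KEY="value"` lines like the one above.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> None:
    """Copy KEY="value" pairs from a .env file into os.environ."""
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        # Values already set in the environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))
```

Calling `load_env_file(Path(".env"))` before reading `os.environ['TOKEN']` makes the token available to the script. Keep the .env file out of version control so the token is not published.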

A short example of a Python script:

import os

from github2pandas.issues import Issues
from github2pandas.utility import Utility
from pathlib import Path

git_repo_name = "github2pandas"
git_repo_owner = "TUBAF-IFI-DiPiT"
    
default_data_folder = Path("data", git_repo_name)
github_token = os.environ['TOKEN']

repo = Utility.get_repo(git_repo_owner, git_repo_name, github_token, default_data_folder)
Issues.generate_issue_pandas_tables(repo, default_data_folder)
issues = Issues.get_issues(default_data_folder,Issues.ISSUES)

# Show the first 14 issue entries
issues.head(14)

Notebook examples

The corresponding github2pandas_notebooks repository illustrates the usage with exemplary investigations.

The documentation of the module is available at https://github2pandas.readthedocs.io/.

Working with pipenv

Process                                       Command
Installation                                  pipenv install --dev
Run specific script                           pipenv run python file.py
Run all tests                                 pipenv run python -m unittest
Run all tests in a specific folder            pipenv run python -m unittest discover -s 'tests'
Run all tests with specific filename          pipenv run python -m unittest discover -p 'test_*.py'
Start Jupyter server in virtual environment   pipenv run jupyter notebook
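As an illustration of what the discovery commands above pick up, here is a minimal sketch of a test module. The file name `tests/test_example.py` and the helper `default_data_folder` are hypothetical; the helper only mirrors the `Path("data", git_repo_name)` expression from the usage script so that the example does not need a GitHub token.

```python
import unittest
from pathlib import Path


def default_data_folder(repo_name: str) -> Path:
    """Hypothetical helper mirroring Path("data", git_repo_name) from the usage script."""
    return Path("data", repo_name)


class TestDefaultDataFolder(unittest.TestCase):
    def test_folder_ends_with_repo_name(self):
        # The data folder should be "data/<repository name>".
        folder = default_data_folder("github2pandas")
        self.assertEqual(folder.parts, ("data", "github2pandas"))
```

Because the file name matches the pattern `test_*.py` and the class derives from `unittest.TestCase`, `pipenv run python -m unittest discover -s 'tests'` would collect and run it.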

For Contributors

Naming conventions: https://namingconvention.org/python/

