github2pandas supports the aggregation of project activities in a GitHub repository and makes them available in pandas dataframes

Project description

Transform GitHub Activities to Pandas Dataframes

General information

This package is being developed by the participating partners (TU Bergakademie Freiberg, OVGU Magdeburg and HU Berlin) as part of the DiP-iT project.

The package implements Python functions for

  • aggregating and preprocessing GitHub activities (Commits, Actions, Issues, Pull-Requests) and
  • generating project progress summaries according to different metrics (e.g. ratio of changed lines, ratio of aggregated Levenshtein distances).
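
As an illustration of the second kind of metric, the following is a minimal sketch of how a ratio of aggregated Levenshtein distances could be computed per author. It does not use the github2pandas API: the function names `levenshtein` and `distance_share`, and the `(author, old_text, new_text)` input format, are assumptions made only for this example.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit the row length.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def distance_share(edits):
    """Map each author to their share of the summed edit distances.

    edits: iterable of (author, old_text, new_text) tuples (hypothetical format).
    """
    totals = defaultdict(int)
    for author, old_text, new_text in edits:
        totals[author] += levenshtein(old_text, new_text)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {author: 0.0 for author in totals}
    return {author: total / grand_total for author, total in totals.items()}
```

Calling `distance_share` with one tuple per edited line gives a rough picture of how the total amount of textual change is distributed among contributors.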

github2pandas stores the collected information as pandas DataFrames below a user-defined root folder. The structure beyond that (file names, folder names) is defined by member variables of the corresponding classes and can be overridden. The default configuration results in the following file structure.

data                                     <- Root directory given as parameter
├── My_Github_Repository_0               <- Repository name
│   ├── Repo.json                        <- Json file containing user and repo name
│   ├── Issues
│   │   ├── pdIssuesComments.p
│   │   ├── pdIssuesEvents.p
│   │   ├── pdIssues.p
│   │   └── pdIssuesReactions.p
│   ├── PullRequests
│   │   ├── pdPullRequestsComments.p
│   │   ├── pdPullRequestsEvents.p
│   │   ├── pdPullRequests.p
│   │   ├── pdPullRequestsReactions.p
│   │   └── pdPullRequestsReviews.p
│   ├── Users.p
│   ├── Versions
│   │   ├── pdCommits.p
│   │   ├── pdEdits.p
│   │   ├── pdBranches.p
│   │   ├── repo                         <- Repository clone
│   │   │   ├── ..
│   │   │   └── ..
│   │   └── Versions.db
│   └── Workflows
│       └── pdWorkflows.p
├── My_Github_Repository_1
...

The internal structure and relations of the data frames are included in the project's wiki.
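
The stored files can also be inspected without the package itself. The sketch below assumes that the `.p` files in the default layout are pickled DataFrames readable with `pandas.read_pickle`. The helper names `find_dataframe_files` and `load_dataframe` are hypothetical and not part of github2pandas.

```python
from pathlib import Path


def find_dataframe_files(root, repo_name):
    """Return {name without suffix: path} for every .p file of one repository.

    Assumes the default layout: <root>/<repo_name>/<subfolder>/<file>.p
    """
    repo_dir = Path(root) / repo_name
    found = {}
    for path in sorted(repo_dir.rglob("*.p")):
        relative = path.relative_to(repo_dir)
        # Skip files that belong to the repository clone in Versions/repo.
        if relative.parts[:2] == ("Versions", "repo"):
            continue
        found[relative.with_suffix("").as_posix()] = path
    return found


def load_dataframe(path):
    """Load one stored DataFrame (assumption: the file is a pandas pickle)."""
    import pandas as pd  # imported lazily so the path lookup works without pandas

    return pd.read_pickle(path)
```

For example, `find_dataframe_files("data", "My_Github_Repository_0")` would map a key such as `Issues/pdIssues` to the corresponding file, which can then be passed to `load_dataframe`. Only unpickle files that you created yourself, since loading a pickle can execute arbitrary code.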

Installation

Due to the early stage of development, the github2pandas package is not yet available as a pip package. Install it from a local clone as follows:

  1. Generate a local clone of the package
    git clone https://github.com/TUBAF-IFI-DiPiT/github2pandas.git

  2. Set up the Python environment from inside the cloned folder
    pipenv install --dev

Application examples

A GitHub token is required for authentication. The GitHub documentation describes how to generate one for your account. Customise the username and project name to explore any public or private repository that your account can access.
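
To keep the token out of notebooks and source files, it can be read from an environment variable. The following sketch is independent of github2pandas: the variable name `GITHUB_TOKEN` and the helper `github_auth_headers` are assumptions for illustration, and the returned headers follow the token scheme accepted by the GitHub REST API.

```python
import os


def github_auth_headers(env_var="GITHUB_TOKEN"):
    """Build HTTP headers for the GitHub REST API from a token in the environment.

    env_var is a hypothetical variable name; choose whatever fits your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to a GitHub token")
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
```

The same `os.environ` lookup can be used to obtain the token string before passing it to the example notebooks.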

Aspect               Example                      Executable notebook
Overview Example     Overview_Example.ipynb       Binder
Commits & Edits      Version_Example.ipynb
Workflows / Actions  Workflow_Example.ipynb
Issues               Issue_Example.ipynb
Pull-Requests        Pull_Requests_Example.ipynb

The documentation of the module is available at XXX.

For Developers

Naming conventions: https://namingconvention.org/python/

Working with pipenv

Process                                      Command
Installation                                 pipenv install --dev
Run specific script                          pipenv run python file.py
Run all tests                                pipenv run python -m unittest
Run all tests in a specific folder           pipenv run python -m unittest discover -s 'tests'
Run all tests with specific filename         pipenv run python -m unittest discover -p 'test_*.py'
Start Jupyter server in virtual environment  pipenv run jupyter notebook
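
The `discover` commands above pick up any file whose name matches `test_*.py`. A minimal sketch of such a file is shown below. The file name `tests/test_example.py` and the helper `normalize_repo_name` are hypothetical and exist only to give the test case something to check.

```python
# tests/test_example.py (hypothetical file name matching the 'test_*.py' pattern)
import unittest


def normalize_repo_name(name):
    """Hypothetical helper: trim whitespace and replace spaces with underscores."""
    return name.strip().replace(" ", "_")


class TestNormalizeRepoName(unittest.TestCase):
    def test_spaces_become_underscores(self):
        self.assertEqual(
            normalize_repo_name(" My Github Repository "),
            "My_Github_Repository",
        )

    def test_clean_name_is_unchanged(self):
        self.assertEqual(
            normalize_repo_name("My_Github_Repository_0"),
            "My_Github_Repository_0",
        )
```

Because unittest discovery imports the module and collects every `unittest.TestCase` subclass, the file needs no `unittest.main()` call when run through the commands in the table.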

Generating documentation

  1. Run the following command in the main folder
    pipenv run sphinx-apidoc -o ./docu/source/ ./github2pandas
  2. Generate the HTML documentation
    cd docu
    make html

Download files

Source Distribution

github2pandas-1.0.1.tar.gz (17.5 kB)

Uploaded Source

Built Distribution

github2pandas-1.0.1-py3-none-any.whl (18.7 kB)

Uploaded Python 3

File details

Details for the file github2pandas-1.0.1.tar.gz.

File metadata

  • Download URL: github2pandas-1.0.1.tar.gz
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for github2pandas-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       246368c8a5d808aac2581deace472306fa53bad5aaa2f88a53d924a3203d99c9
MD5          25d978f00ddc64d1b31ef47195770856
BLAKE2b-256  b2b11445e27ad4a9c6447e2a0be25a4fd897c3647d7ce92f7357743d56f6684b

File details

Details for the file github2pandas-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: github2pandas-1.0.1-py3-none-any.whl
  • Size: 18.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8

File hashes

Hashes for github2pandas-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       35982e9e7a0a3fd4d8ba87c49b29a74ec2aaba5593b675d9cafc8053ba292f65
MD5          01da7fa60c23e82e3b7fc4e9ea7698a4
BLAKE2b-256  5b08f1290f42e9c933d0a0f3fd107489f0845c884023702204e5471eed29a43f
