A package to extract GitHub repository insights

Introduction

A Python package to extract GitHub repository insights including commit history, pull request analysis, contributor trends, and overall repository health. Designed to simplify engineering reporting and performance tracking.


Requirements



Installation

pip install github-data-extractor


Usage and Documentation

This example shows how to use the github_data_extractor package.

from github_data_extractor import dataExtraction
from dotenv import load_dotenv
import os

load_dotenv()
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')

def main():
    repo_name = ['translate_lib']
    repo_owners = ['aadityayadav']
    repo_tokens = [GITHUB_TOKEN]

    extraction = dataExtraction(repo_name, repo_owners, repo_tokens)

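    # 1) High-level snapshot: branches, linked vs unlinked issues, PR file data
    extraction.extract_general_overview()
    # 2) Aggregated project-health statistics: commits, file changes, PR volume and quality
    extraction.extract_aggregate_metrics()
    # 3) Contributor and commit behavior
    extraction.extract_data_commit_contributor()
    # 4) Detailed pull request analytics
    extraction.extract_data_pr()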

if __name__ == "__main__":
    main()
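
The example above reads GITHUB_TOKEN from the environment, and os.getenv returns None when the variable is unset. A minimal sketch of a guard that fails early in that case is shown below; require_token is a hypothetical helper for illustration, not part of github_data_extractor.

```python
import os


def require_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    Hypothetical helper: not part of the github_data_extractor package.
    """
    token = os.getenv(env_var)
    if token is None or not token.strip():
        raise RuntimeError(
            f"{env_var} is not set; add it to your environment or .env file"
        )
    # Strip stray whitespace that often sneaks into .env files.
    return token.strip()
```

Calling require_token() after load_dotenv() and passing the result in repo_tokens would surface a missing token before any extraction starts.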

None of the four extraction methods takes arguments.
Instead, pass repo_name, repo_owners, and repo_tokens as lists when constructing dataExtraction, which lets you extract data from multiple repositories at once.
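
When several repositories are involved, it can help to keep each repository's name, owner, and token together and derive the three lists from them. The sketch below assumes the lists are matched by position (the first name goes with the first owner and the first token), which the usage example suggests but this page does not state outright. RepoSpec and to_parallel_lists are hypothetical helpers, not part of the package.

```python
from typing import List, NamedTuple, Sequence, Tuple


class RepoSpec(NamedTuple):
    """One repository's settings (hypothetical helper type)."""
    name: str
    owner: str
    token: str


def to_parallel_lists(
    specs: Sequence[RepoSpec],
) -> Tuple[List[str], List[str], List[str]]:
    """Split repo specs into the three same-length lists used by the constructor.

    Assumption: dataExtraction pairs list entries by index.
    """
    names = [spec.name for spec in specs]
    owners = [spec.owner for spec in specs]
    tokens = [spec.token for spec in specs]
    return names, owners, tokens


# Usage (requires the package to be installed):
#   names, owners, tokens = to_parallel_lists(specs)
#   extraction = dataExtraction(names, owners, tokens)
```

Building the lists from one source keeps them the same length and in the same order.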


1) extract_general_overview()

Fetches a high-level snapshot of the repository:

  • Branch information (total branches, last updated)
  • Linked vs unlinked issues
  • File data associated with each pull request

2) extract_aggregate_metrics()

Provides an overview of project health using aggregated statistics:

  • Commit activity over time
  • File modification frequency
  • Pull request volume and lifecycle
  • Pull request quality: reviews, size, and merge times

3) extract_data_commit_contributor()

Gathers contributor and commit behavior:

  • Commit counts by contributor
  • Time-based commit activity
  • New vs returning contributor patterns

4) extract_data_pr()

Detailed pull request analytics:

  • PR open/merge/close timestamps
  • Review histories and discussions
  • Issue linkages, milestone tagging, and contributor-level PR trends
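
Open and merge timestamps such as those listed above are commonly turned into a time-to-merge figure. The sketch below assumes ISO 8601 UTC strings in the form GitHub's REST API returns (for example 2024-01-05T12:00:00Z). The format the package writes to its CSV is not documented here, so the format string and the hours_to_merge helper are assumptions for illustration.

```python
from datetime import datetime

# Assumed timestamp layout, matching GitHub REST API responses.
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def hours_to_merge(created_at: str, merged_at: str) -> float:
    """Return the hours elapsed between a PR being opened and merged."""
    opened = datetime.strptime(created_at, GITHUB_TIME_FORMAT)
    merged = datetime.strptime(merged_at, GITHUB_TIME_FORMAT)
    return (merged - opened).total_seconds() / 3600
```

For instance, hours_to_merge("2024-01-01T00:00:00Z", "2024-01-02T12:00:00Z") gives 36.0.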


Returns:

  • Automatically saves a CSV file containing the repository metrics in a folder named ExtractedData.
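
To check what was written, the saved files can be inspected with the standard library. This is a minimal sketch: the file names and column layout are not documented on this page, so it simply globs for *.csv and assumes the first row of each file is a header. summarize_csvs is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple


def summarize_csvs(folder: str = "ExtractedData") -> Dict[str, Tuple[List[str], int]]:
    """Map each CSV file name in folder to (header row, number of data rows).

    Assumption: the first row of every file is a header.
    """
    summary: Dict[str, Tuple[List[str], int]] = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        # Everything after the header counts as data; empty files count as 0.
        summary[path.name] = (header, max(len(rows) - 1, 0))
    return summary
```

Running summarize_csvs() after an extraction shows which files exist and how many rows each one holds; it returns an empty dict if the folder has no CSV files.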


Building the Package and Installing Locally

Clone the repository, then build and install the package using:

pip3 install wheel
python3 setup.py bdist_wheel sdist
pip3 install .

Download files

Source Distribution

github_data_extractor-0.0.11.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

github_data_extractor-0.0.11-py3-none-any.whl (9.7 kB)

Uploaded Python 3

File details

Details for the file github_data_extractor-0.0.11.tar.gz.

File metadata

  • Download URL: github_data_extractor-0.0.11.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for github_data_extractor-0.0.11.tar.gz
Algorithm    Hash digest
SHA256       03642b586091d8468cd69feebbf3f86b06d8b99622902162d1533c09b0db9611
MD5          f32e824ab15213f609c903886e63754f
BLAKE2b-256  c030951839f8dcbaa448127db003623936308f1c744de903c1e172a13a41ff2c

File details

Details for the file github_data_extractor-0.0.11-py3-none-any.whl.

File metadata

File hashes

Hashes for github_data_extractor-0.0.11-py3-none-any.whl
Algorithm    Hash digest
SHA256       7950994afa1c5da97e61305cf553347e7a7596769d4fb7c96e45553e37b95692
MD5          cbdbd94be9a288e8f3d12dcaddc620bd
BLAKE2b-256  0dcaa7b8d1022e0c650151e8c94b3c8c9a9654524734d643a0e692fb025193e7
