Project description

Overview

The MI project collects data from GitHub repositories. The collected data can be stored either locally or in an S3-compatible object store such as Ceph. For personal usage, check out the Usage section.

Together with mi-scheduler, we provide an automated data extraction pipeline for mining the requested repositories and organizations. This pipeline can be scheduled to run at custom intervals, e.g. daily or weekly.

Data extraction request

To request data extraction for a repository or an organization, create a Data Extraction Issue in the MI-Scheduler repository. Use this link TODO

Data extraction Pipeline (diagram)

The MI pipeline is simple to understand; see the diagram below.

                  +---------+
                  |ConfigMap|
                  +----+----+
                       |
            +--+-------+--------+--+
            |  |                |  |
            |  |  mi-scheduler  |  |
            |  |                |  |
            +------+---+---+-------+
                |   |   |   |    |
                |   |   |   |    |
                |   |   |   |    |
                | Argo Workflows |
                |   |   |   |    |
                |   |   |   |    |
+---------------v---v---v---v----v------------------+                                          +--------------------        +--------------------+
|                                                   |                                          |   Visualization   |        |   Recommendation   |
|  +---------+  +---------+            +---------+  |                                          +-------------------+        +--------------------+
|  |thoth/   |  |  AICoE  |            | your    |  |                                          |   Project Health  |        |   thoth            |
|  |  station|  |         |            |     org |  |                                          |    (dashboard)    |        |                    |
|  +---------+  +---------+            +---------+  |                                          |                   |        |                    |
|  |solver   |  |...      |            |your     |  |                                          +---------+---------+        +----------+---------+
|  |         |  |         |            |   repos |  |           thoth-station/mi                         ^                             ^
|  |amun     |  |...      | X X X X X  |         |  |     (Meta-information Indicators)                  |                             |
|  |         |  |         |            |         |  |                                                    +-------------+---------------+
|  |adviser  |  |...      |            |         |  |                                                                  |
|  |         |  |         |            |         |  |                                                                  |
|  |....     |  |...      |            |         |  |                                                +-----------------+-------------------+
|  |         |  |         |            |         |  |                                                |                                     |
|  +---------+  +---------+            +---------+  |                                                |       Knowledge Processsing         |
|                                                   |                                                |                                     |
+-----------------------+---------------------------+                                                +-----------------+-------------------+
GitHub repositories   |                                                                                              ^
                        |                 +--------------------------------------------------------+                   |
                        |                 |                                                        |                   |
                        |                 |      Entities Analysis   +------->      Knowledge      |                   |
                        +---------------->-+                                                      +--------------------+
                                          +---------+----------------+----------+------------------+
                                          |  Issues |  Pull Requests |  Readmes |  etc...........  |
                                          |         |                |          |                  |
                                          +---------+----------------+----------+------------------+

What can MI extract from GitHub?

MI analyses the entities specified on the srcopsmetrics/entities page. An entity is essentially repository metadata that is being inspected (e.g. an Issue or a Pull Request), from which specified features are extracted and stored in a DataFrame.

MI is essentially a wrapper around the PyGithub module that provides effortless data extraction with API rate limit handling and data updating.
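
To illustrate what that wrapping saves you from, here is a hedged sketch of the plain PyGithub usage that MI automates; this is not MI's internal code, and the token is a placeholder you would replace with your own:

from github import Github  # provided by the PyGithub package

gh = Github("<your GitHub token>")      # placeholder personal access token
repo = gh.get_repo("thoth-station/mi")  # any repository slug works here

# With plain PyGithub you have to watch the API quota yourself;
# MI takes care of rate limit handling for you.
print(gh.get_rate_limit().core.remaining)

for pr in repo.get_pulls(state="closed"):
    # MI extracts features like these and stores them as a DataFrame
    print(pr.number, pr.title)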

Install

pip

MI is available through PyPI, so you can do

pip install srcopsmetrics

git

Alternatively, you can install srcopsmetrics by cloning the repository

git clone https://github.com/thoth-station/mi.git

cd mi

pipenv install --dev

Usage

Setup

To store data locally, use -l when calling the CLI or set is_local=True when using MI as a module.

By default, MI will try to store the data on Ceph. In order to store on Ceph, you need to provide the following environment variables:

  • S3_ENDPOINT_URL Ceph Host name

  • CEPH_BUCKET Ceph Bucket name

  • CEPH_BUCKET_PREFIX Ceph Prefix

  • CEPH_KEY_ID Ceph Key ID

  • CEPH_SECRET_KEY Ceph Secret Key

For more information about storing on Ceph, look here
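
When MI is used as a module, the same configuration can be supplied programmatically before the extraction starts. A minimal sketch follows; every value is a placeholder for your own Ceph/S3 credentials:

import os

os.environ["S3_ENDPOINT_URL"] = "https://s3.example.com"  # Ceph host name (placeholder)
os.environ["CEPH_BUCKET"] = "my-mi-bucket"                # Ceph bucket name (placeholder)
os.environ["CEPH_BUCKET_PREFIX"] = "mi-data"              # Ceph prefix (placeholder)
os.environ["CEPH_KEY_ID"] = "<key id>"                    # Ceph key ID (placeholder)
os.environ["CEPH_SECRET_KEY"] = "<secret key>"            # Ceph secret key (placeholder)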

CLI

See --help for all available options

See some of the examples below

Get repository PullRequest data locally

srcopsmetrics --create --is-local --repository foo_repo --entities PullRequest

which is equivalent to

srcopsmetrics -clr foo_repo -e PullRequest

Get organization PR data locally

srcopsmetrics -clo foo_org -e PullRequest

Get multiple repository PR data locally

srcopsmetrics -clr foo_repo,bar_repo -e PullRequest

Get multiple entity data locally

srcopsmetrics -clr foo_repo -e PullRequest,Issue,Commit

Meta-Information Entities Data

To know more about indicators that are extracted from data, check out Meta-Information Indicators.

Data loading using modules

>>> from srcopsmetrics.entities.pull_request import PullRequest

>>> full_repo_slug = "thoth-station/mi"
>>> pr = PullRequest(full_repo_slug)

>>> # for local data in default mi data path
>>> data = pr.load_previous_knowledge(is_local=True)
>>> data.head()
                                       title                                               body size  ...   changed_files     first_review_at    first_approve_at
id                                                                                                        ...
97                   ⬆️ Bump rsa from 4.0 to 4.7  Bumps [rsa](https://github.com/sybrenstuvel/py...    L  ...  [Pipfile.lock]                 NaT                 NaT
96              ⬆️ Bump pyyaml from 5.3.1 to 5.4  Bumps [pyyaml](https://github.com/yaml/pyyaml)...    L  ...  [Pipfile.lock]                 NaT                 NaT
95                   ⬆️ Bump rsa from 4.0 to 4.1  Bumps [rsa](https://github.com/sybrenstuvel/py...    L  ...  [Pipfile.lock]                 NaT                 NaT
94  Automatic update of dependencies by Kebechet  Kebechet has updated the depedencies to the la...    L  ...  [Pipfile.lock] 2021-03-22 08:00:14 2021-03-22 08:00:14
93  Automatic update of dependencies by Kebechet  Kebechet has updated the depedencies to the la...    L  ...  [Pipfile.lock] 202

Any other entity is loaded in a similar way. If you intend to load remote data from Ceph, all of the Ceph environment variables need to be set (see the Setup section).
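
For example, assuming the Issue entity lives in srcopsmetrics.entities.issue and exposes the same interface as PullRequest above (treat the import path as an assumption and adjust it to your installed version), loading local Issue knowledge would look like:

>>> from srcopsmetrics.entities.issue import Issue

>>> issues = Issue("thoth-station/mi")
>>> data = issues.load_previous_knowledge(is_local=True)
>>> data.head()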

How to contribute

Always feel free to open new Issues or engage in already existing ones!

Custom Entities & Metrics

If you want to contribute by adding a new entity or metric to be analysed from GitHub repositories, feel free to open an Issue and describe why you think the new entity should be analysed and how it benefits the goals of the thoth-station/mi project.

After creating the Issue, wait for a response from the thoth-station devs. Do not forget to reference the Issue in your Pull Request.

Implementation

Look at the Template entity to get an idea of the requirements that need to be satisfied when implementing a custom entity.
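
Purely as a hypothetical sketch of what such an implementation might contain (the actual base class, attribute, and method names are dictated by the Template entity in the repository, so every name below is a placeholder):

# Hypothetical skeleton only -- the real required interface is defined by
# the Template entity in thoth-station/mi, not by this sketch.
class WorkflowRun:
    """Example custom entity: metadata about GitHub Actions workflow runs."""

    entity_name = "WorkflowRun"  # placeholder identifier for the new entity

    def analyse(self, repository):
        """Extract the features of interest from the given repository."""
        raise NotImplementedError

    def store(self, dataframe):
        """Persist the extracted features locally or on Ceph."""
        raise NotImplementedError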



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

srcopsmetrics-2.9.0.tar.gz (36.0 kB)

Uploaded Source

Built Distribution

srcopsmetrics-2.9.0-py3-none-any.whl (65.0 kB)

Uploaded Python 3

File details

Details for the file srcopsmetrics-2.9.0.tar.gz.

File metadata

  • Download URL: srcopsmetrics-2.9.0.tar.gz
  • Upload date:
  • Size: 36.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for srcopsmetrics-2.9.0.tar.gz

  • SHA256: 83d763561bcd728ed54519d0a768a921944e8bba0d4f333f900ee608143e5447
  • MD5: 3f47e431d6c0b48099e8c112e0d72a76
  • BLAKE2b-256: eaf980c53c1cb1571a6fea0965db9588ea98ac3099da070d539d12e30bc59c34


File details

Details for the file srcopsmetrics-2.9.0-py3-none-any.whl.

File metadata

  • Download URL: srcopsmetrics-2.9.0-py3-none-any.whl
  • Upload date:
  • Size: 65.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for srcopsmetrics-2.9.0-py3-none-any.whl

  • SHA256: cd3fdff7b7c11942629772922794da8db0ce8b94b4b7ba5f13cf5af786f7fa5a
  • MD5: bd17c255252a81ec43fd2e3e483ec775
  • BLAKE2b-256: 59ac225a701482815b74b204d6dc2133e63cda02a5add39eed0d700cca068361

