Overview
The MI project collects data from GitHub repositories. The collected data can be stored either locally or in S3-compatible cloud storage. For personal use, check out the Usage section.
Together with mi-scheduler, we provide an automated data extraction pipeline for mining data from requested repositories and organizations. The pipeline can be scheduled at custom intervals, e.g. daily or weekly.
Data extraction request
To request data extraction for a repository or an organization, create a Data Extraction Issue in the MI-Scheduler repository. Use this link TODO
Data Extraction Pipeline
The MI pipeline is simple to understand; see the overview below.
ConfigMap
  -> mi-scheduler
  -> Argo Workflows
  -> GitHub repositories, e.g. thoth-station (solver, amun, adviser, ...), AICoE (...), or your org (your repos)
  -> entities extracted from each repository: Issues, Pull Requests, Readmes, etc.
  -> Entities Analysis -> Knowledge
  -> Knowledge Processing
  -> thoth-station/mi (Meta-information Indicators)
  -> Visualization: Project Health (dashboard)
     Recommendation: thoth
What can MI extract from GitHub?
MI analyses the entities listed on the srcopsmetrics/entities page. An entity is essentially a kind of repository metadata being inspected (e.g. an Issue or a Pull Request). Specified features are extracted from each entity and stored in a dataframe.
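The following minimal sketch makes the entity idea concrete. It is hypothetical and is not the srcopsmetrics entity interface. The function names, the chosen features and the sample records are illustrative assumptions. The input dicts are shaped loosely like GitHub REST API pull request objects. The sketch picks a few features from each record and collects them column by column, which is the layout a dataframe uses.

```python
# Hypothetical illustration of an "entity"; this is NOT the srcopsmetrics API.
from typing import Any, Dict, List


def extract_pull_request_features(raw_pr: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the features of interest from one raw pull request record."""
    return {
        "id": raw_pr["number"],
        "title": raw_pr["title"],
        "created_at": raw_pr["created_at"],
        # A pull request counts as merged when it has a merge timestamp.
        "merged": raw_pr.get("merged_at") is not None,
        "changed_files": raw_pr.get("changed_files", 0),
    }


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row dicts into per-feature column lists (dataframe-like layout)."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for feature, value in row.items():
            columns.setdefault(feature, []).append(value)
    return columns


# Made-up sample records, for illustration only.
raw_prs = [
    {"number": 1, "title": "Fix typo", "created_at": "2021-01-04",
     "merged_at": "2021-01-05", "changed_files": 1},
    {"number": 2, "title": "Add entity", "created_at": "2021-02-10",
     "merged_at": None},
]
table = to_columns([extract_pull_request_features(pr) for pr in raw_prs])
print(table["merged"])  # [True, False]
```

With pandas installed, `pandas.DataFrame(table)` would turn the same columns into an actual dataframe.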
MI is essentially a wrapper around the PyGithub module. It provides hassle-free data extraction, handles the API rate limit and updates previously collected data.
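Two behaviours mentioned above can be sketched in plain Python: waiting out the API rate limit, and updating existing data instead of downloading it again. This is a generic illustration with assumed names (`seconds_to_wait`, `ids_to_fetch`), not MI's actual implementation. The inputs mirror GitHub's rate-limit information, which gives the reset time as UTC epoch seconds.

```python
import time
from typing import Iterable, List, Optional


def seconds_to_wait(remaining: int, reset_epoch: float,
                    now: Optional[float] = None, margin: float = 1.0) -> float:
    """Return how long to sleep before the next API call.

    While requests remain in the current window no wait is needed; once the
    quota is exhausted, wait until the reset time plus a small safety margin.
    """
    if remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, reset_epoch - current) + margin


def ids_to_fetch(available_ids: Iterable[int], stored_ids: Iterable[int]) -> List[int]:
    """Incremental update: keep only ids that are not stored yet, in order."""
    known = set(stored_ids)
    return [item_id for item_id in available_ids if item_id not in known]


print(seconds_to_wait(remaining=0, reset_epoch=1_000.0, now=900.0))  # 101.0
print(ids_to_fetch([1, 2, 3, 4], stored_ids=[2, 4]))  # [1, 3]
```

In a real extraction loop, a caller would pass the returned value to `time.sleep()` before issuing the next request.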
Install
pip
MI is available on PyPI, so you can install it with
pip install srcopsmetrics
git
Alternatively, you can install srcopsmetrics by cloning the repository:
git clone https://github.com/thoth-station/mi.git
cd mi
pipenv install --dev
Usage
Setup
Connect to GitHub
To extract data from GitHub, you must configure an access token. To generate one, see GitHub's documentation on creating a personal access token.
To use the token with MI, set the GITHUB_ACESS_TOKEN environment variable to the token value, for example:
export GITHUB_ACESS_TOKEN=<token_string>
or
GITHUB_ACESS_TOKEN=<token_string> python -m srcopsmetrics.cli ...
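When MI is driven from a Python script, a small pre-flight check can fail early with a clear message if the token is missing. The script can then pass the token to the CLI in the child process environment, which is the Python counterpart of the inline form above. The helper names `require_token` and `run_cli` are hypothetical. The variable name is spelled exactly as in this README.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping

TOKEN_VARIABLE = "GITHUB_ACESS_TOKEN"  # spelled as in this README


def require_token(env: Mapping[str, str]) -> str:
    """Return the GitHub token from env, or raise if it is unset or blank."""
    token = env.get(TOKEN_VARIABLE, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_VARIABLE} is not set; export it first")
    return token


def run_cli(cli_args: List[str], token: str) -> subprocess.CompletedProcess:
    """Run `python -m srcopsmetrics.cli ...` with the token in its environment."""
    child_env: Dict[str, str] = {**os.environ, TOKEN_VARIABLE: token}
    return subprocess.run(
        [sys.executable, "-m", "srcopsmetrics.cli", *cli_args],
        env=child_env,
        check=True,
    )


# Example (needs srcopsmetrics installed and a real token):
# run_cli(["--help"], require_token(os.environ))
```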
Data Location
To store data locally, use -l when calling the CLI, or set is_local=True when using MI as a module.
By default, MI tries to store the data on Ceph. To store data on Ceph, provide the following environment variables:
- S3_ENDPOINT_URL: Ceph host name
- CEPH_BUCKET: Ceph bucket name
- CEPH_BUCKET_PREFIX: Ceph prefix
- CEPH_KEY_ID: Ceph key ID
- CEPH_SECRET_KEY: Ceph secret key
For more information about storing data on Ceph, look here
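The storage choice described above can be expressed as a pre-flight check. The script would use local storage when requested and otherwise refuse to start unless every Ceph variable is set. This is a hypothetical helper for scripts that wrap MI, not part of srcopsmetrics. Only the variable names come from this README.

```python
from typing import List, Mapping

# Environment variables listed above for Ceph storage.
CEPH_VARIABLES = (
    "S3_ENDPOINT_URL",
    "CEPH_BUCKET",
    "CEPH_BUCKET_PREFIX",
    "CEPH_KEY_ID",
    "CEPH_SECRET_KEY",
)


def missing_ceph_variables(env: Mapping[str, str]) -> List[str]:
    """Return the Ceph variables that are unset or empty in env."""
    return [name for name in CEPH_VARIABLES if not env.get(name)]


def storage_target(is_local: bool, env: Mapping[str, str]) -> str:
    """Mirror the documented choice: local storage if requested, else Ceph."""
    if is_local:
        return "local"
    missing = missing_ceph_variables(env)
    if missing:
        raise RuntimeError("Ceph storage needs: " + ", ".join(missing))
    return "ceph"


# Typical call from a wrapper script:
# import os; storage_target(is_local=False, env=os.environ)
```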
CLI
To view all available commands and their descriptions, use
python -m srcopsmetrics.cli --help
See some general usage examples below.
Get PullRequest data for a repository locally
python -m srcopsmetrics.cli --create --is-local --repository foo_repo --entities PullRequest
which is equivalent to
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest
Get PullRequest data for an organization locally
python -m srcopsmetrics.cli -clo foo_org -e PullRequest
Get PullRequest data for multiple repositories locally
python -m srcopsmetrics.cli -clr foo_repo,bar_repo -e PullRequest
Get data for multiple entities locally
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest,Issue,Commit
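The examples above follow one pattern: a target (comma-separated repositories or one organization) plus comma-separated entities. A script can build such calls programmatically. `build_cli_args` is a hypothetical helper. It uses only the flags shown in this section, and it takes the organization flag `-o` from the `-clo` example.

```python
import sys
from typing import List, Sequence


def build_cli_args(entities: Sequence[str],
                   repositories: Sequence[str] = (),
                   organization: str = "",
                   local: bool = True) -> List[str]:
    """Compose a srcopsmetrics CLI call from the flags shown above."""
    # Exactly one target: a list of repositories or a single organization.
    if bool(repositories) == bool(organization):
        raise ValueError("pass either repositories or an organization")
    args = [sys.executable, "-m", "srcopsmetrics.cli", "--create"]
    if local:
        args.append("--is-local")
    if repositories:
        args += ["--repository", ",".join(repositories)]
    else:
        args += ["-o", organization]
    args += ["--entities", ",".join(entities)]
    return args


args = build_cli_args(["PullRequest", "Issue"], repositories=["foo_repo", "bar_repo"])
# To execute (requires srcopsmetrics and a token):
# import subprocess; subprocess.run(args, check=True)
```

Building the argument list separately from running it keeps the command easy to log and test before any GitHub API calls are made.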
Meta-Information Entities Data
How to load data
Indicators
To learn more about the indicators extracted from the data, check out Meta-Information Indicators.
How to contribute
Always feel free to open new Issues or engage in already existing ones!
Custom Entities & Metrics
If you want to contribute a new entity or metric to be analysed from GitHub repositories, open an Issue. Describe why you think the new entity should be analysed and how it serves the goals of the thoth-station/mi project.
After creating the Issue, wait for a response from the thoth-station developers. Do not forget to reference the Issue in your Pull Request.
Implementation
Look at the Template entity to get an idea of the requirements that a custom entity implementation must satisfy.
File details
Details for the file srcopsmetrics-2.11.1.tar.gz
File metadata
- Download URL: srcopsmetrics-2.11.1.tar.gz
- Upload date:
- Size: 53.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | ee9af0b4e807c68e184d90499e184bf3bdbdf2b365521563fcfcab26dda1ebdf
MD5 | 9e02d51db4ff46c744ceb46dcfa378e9
BLAKE2b-256 | 42d6160f87926a557b5d758d26d964a0538341cbe06faed536a46f5fadc95d22
File details
Details for the file srcopsmetrics-2.11.1-py3-none-any.whl
File metadata
- Download URL: srcopsmetrics-2.11.1-py3-none-any.whl
- Upload date:
- Size: 84.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | b08235ab407797911ce092ba5c4aed1daf2672a15a322dc71939bb245043a68b
MD5 | dd3f1a836cd7ceb488d4eb480b6027eb
BLAKE2b-256 | faadd082a7923908ceb0aa63a362c1a826d387549a38f7ff355d8b8b91a58d9c