A module to crawl infrastructure code repositories and scripts from GitHub
Project description
iac-miner
A mining tool written in Python to mine software repositories for Infrastructure-as-Code
Available on PyPI: pip install iacminer.
API usage
First, export your GitHub access token in an environment variable called GITHUB_ACCESS_TOKEN, with:
export GITHUB_ACCESS_TOKEN='yourtokenhere' (on Linux)
This variable holds the token used to access the GitHub APIs, so the token does not have to be hard-coded in your code.
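A minimal sketch of reading the token defensively, so that a missing variable fails early with a clear message instead of passing None to a miner. The helper name read_github_token is hypothetical and not part of iacminer.

```python
import os


def read_github_token(env=None, name="GITHUB_ACCESS_TOKEN"):
    """Return the token stored under `name` in the environment mapping.

    Hypothetical helper for illustration; it is not part of iacminer.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running the miners")
    return token
```

The returned value can then be passed wherever the examples use os.getenv('GITHUB_ACCESS_TOKEN').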
Mine Github
import os
from datetime import datetime
from iacminer.miners.github import GithubMiner
miner = GithubMiner(access_token=os.getenv('GITHUB_ACCESS_TOKEN'),
                    date_from=datetime.strptime('2020-01-01 00:00:00', '%Y-%m-%d %H:%M:%S'),
                    date_to=datetime.strptime('2020-01-02 00:00:00', '%Y-%m-%d %H:%M:%S'),
                    push_after=datetime.strptime('2020-06-07 00:00:00', '%Y-%m-%d %H:%M:%S'),
                    min_stars=0,            # int (default = 0)
                    min_releases=0,         # int (default = 0)
                    min_watchers=0,         # int (default = 0)
                    min_issues=0,           # int (default = 0)
                    primary_language=None,  # str or None, e.g., 'python' (default = None)
                    include_fork=False)     # bool (default = False)

for repository in miner.mine():
    print(repository)
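The date arguments above are built with datetime.strptime and the format '%Y-%m-%d %H:%M:%S'. As a small sketch, the two window bounds can be parsed and sanity-checked together before they are handed to GithubMiner. The helper parse_window is hypothetical, and the assumption that date_from must precede date_to is ours, not something the library documents.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_window(date_from, date_to):
    """Parse two 'YYYY-MM-DD HH:MM:SS' strings into a (start, end) pair.

    Hypothetical helper for illustration; it is not part of iacminer.
    """
    start = datetime.strptime(date_from, DATE_FORMAT)
    end = datetime.strptime(date_to, DATE_FORMAT)
    # Assumption: an empty or reversed window is a caller mistake.
    if start >= end:
        raise ValueError("date_from must be earlier than date_to")
    return start, end
```

The resulting pair can be passed as date_from and date_to in the constructor call shown above.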
Mine repository
import os
from iacminer.miners.repository import RepositoryMiner

miner = RepositoryMiner(token=os.getenv('GITHUB_ACCESS_TOKEN'),
                        path_to_repo='path/to/cloned/repository',
                        branch='development')  # Optional (default 'master')

# Get only fixing commits by analyzing issues
fix_from_issues = miner.get_fixing_commits_from_closed_issues()
for sha in fix_from_issues:
    print(sha)

# Get only fixing commits by analyzing commit messages
fix_from_commits = miner.get_fixing_commits_from_commit_messages()
for sha in fix_from_commits:
    print(sha)

# Get all Ansible files touched by fixing commits
miner.set_fixing_commits()  # Must call this method first
fixing_files = miner.get_fixing_files()

# Get files labeled as 'defect-prone' or 'defect-free'
labeled_files = miner.label(fixing_files)
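To inspect the fixing commits found through issues and through commit messages as one list, the two results can be merged without duplicates. This is only a sketch: it assumes both methods return iterables of SHA strings, as the loops above suggest, and the helper merge_fixing_commits is hypothetical rather than part of iacminer.

```python
def merge_fixing_commits(from_issues, from_messages):
    """Return the SHAs from both sources, first occurrence first, no duplicates.

    Hypothetical helper; assumes each argument is an iterable of SHA strings.
    """
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys([*from_issues, *from_messages]))
```

For example, merge_fixing_commits(fix_from_issues, fix_from_commits) would give a single list to print or store.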
Alternatively, execute the previous methods at once and extract metrics from labeled files on a per-release basis with:
import os
from iacminer.miners.repository import RepositoryMiner

miner = RepositoryMiner(token=os.getenv('GITHUB_ACCESS_TOKEN'),
                        path_to_repo='path/to/cloned/repository',
                        branch='development',    # Optional (default='master')
                        owner='radon-h2020',     # Optional (default=None)
                        repo='radon-iac-miner')  # Optional (default=None)

for metrics in miner.mine():
    print(metrics)
Combine GithubMiner and RepositoryMiner
import os
from iacminer.miners.github import GithubMiner
from iacminer.miners.repository import RepositoryMiner

gh_miner = GithubMiner(access_token=os.getenv('GITHUB_ACCESS_TOKEN'),
                       min_stars=0,   # Optional (default 0)
                       min_issues=0)  # Optional (default 0)

for repository in gh_miner.mine():
    print(repository)

repo_miner = RepositoryMiner(token=os.getenv('GITHUB_ACCESS_TOKEN'),
                             path_to_repo='path/to/cloned/repository',
                             branch='development',    # Optional (default='master')
                             owner='radon-h2020',     # Optional (default=None)
                             repo='radon-iac-miner')  # Optional (default=None)

# Mine repository as previous example ...
Command-line usage
usage: iac-miner [-h] [-v] {mine-github,mine-repository} ...

A Python library to crawl GitHub for Infrastructure-as-Code based repositories
and mine those repositories to identify fixing commits and label defect-prone
files.

positional arguments:
  {mine-github,mine-repository}
    mine-github         Mine repositories from Github
    mine-repository     Mine a single repository

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
iac-miner mine-github
usage: iac-miner mine-github [-h] [--from DATE_FROM] [--to DATE_TO]
                             [--pushed-after DATE_PUSH]
                             [--iac-languages [{ansible,chef,puppet,all} [{ansible,chef,puppet,all} ...]]]
                             [--include-fork] [--min-issues MIN_ISSUES]
                             [--min-releases MIN_RELEASES]
                             [--min-stars MIN_STARS]
                             [--min-watchers MIN_WATCHERS]
                             [--primary-language PRIMARY_LANGUAGE] [--verbose]
                             dest tmp_clones_folder

positional arguments:
  dest                  destination folder to save results
  tmp_clones_folder     path to temporary clone the repositories for the
                        analysis

optional arguments:
  -h, --help            show this help message and exit
  --from DATE_FROM      start searching from this date (default: 2014-01-01
                        00:00:00)
  --to DATE_TO          search up to this date (default: 2020-01-01 00:00:00)
  --pushed-after DATE_PUSH
                        search up to this date (default: 2019-01-01 00:00:00)
  --iac-languages [{ansible,chef,puppet,all} [{ansible,chef,puppet,all} ...]]
                        only repositories with this language(s) will be
                        analyzed (default: all)
  --include-fork        whether to include forked repositories (default:
                        False)
  --min-issues MIN_ISSUES
                        minimum number of issues (default: 0)
  --min-releases MIN_RELEASES
                        minimum number of releases (default: 0)
  --min-stars MIN_STARS
                        minimum number of stars (default: 0)
  --min-watchers MIN_WATCHERS
                        minimum number of watchers (default: 0)
  --primary-language PRIMARY_LANGUAGE
                        the primary language of the repository (default: None)
  --verbose             whether to output results (default: False)
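When the mine-github command is driven from a script, the argument list can be assembled from the flags documented above. The following is a sketch only: build_mine_github_command is a hypothetical helper, it covers just a few of the options, and it does not run anything itself.

```python
def build_mine_github_command(dest, tmp_clones_folder, min_stars=None,
                              min_releases=None, iac_languages=None,
                              include_fork=False, verbose=False):
    """Return an argv list for `iac-miner mine-github`.

    Hypothetical helper for illustration; it only uses flags shown in the
    help text above.
    """
    # The two positionals come first and --iac-languages comes last, because
    # that option accepts several values and a folder name placed after it
    # could be read as a language.
    cmd = ["iac-miner", "mine-github", dest, tmp_clones_folder]
    if min_stars is not None:
        cmd += ["--min-stars", str(min_stars)]
    if min_releases is not None:
        cmd += ["--min-releases", str(min_releases)]
    if include_fork:
        cmd.append("--include-fork")
    if verbose:
        cmd.append("--verbose")
    if iac_languages:
        cmd += ["--iac-languages", *iac_languages]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) once GITHUB_ACCESS_TOKEN is exported.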
iac-miner mine-repository
usage: iac-miner mine-repository [-h] [--branch REPO_OWNER]
                                 [--owner REPO_OWNER] [--name REPO_NAME]
                                 [--verbose]
                                 path_to_repo dest

positional arguments:
  path_to_repo         Name of the repository (owner/name).
  dest                 Destination folder to save results.

optional arguments:
  -h, --help           show this help message and exit
  --branch REPO_OWNER  the repository's default branch (default: master)
  --owner REPO_OWNER   the repository's owner (default: None)
  --name REPO_NAME     the repository's name (default: None)
  --verbose            whether to output results (default: False)
Current release
[0.1.3]
- The mine-repository option is now supported
Download files
Source Distribution
File details
Details for the file iacminer-0.1.3.tar.gz.
File metadata
- Download URL: iacminer-0.1.3.tar.gz
- Size: 20.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 908af7bf6078a8cd6f60201bae7a3662ed31546c0c87a611f3011c86549d9648 |
| MD5 | b53d9d6ee3e1166bf8d1274e917abaaa |
| BLAKE2b-256 | 03911b0f12ce150f6b933de1a46adc0aa6cd34e76582232b008da96ededde59d |
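The published SHA256 digest can be used to check a downloaded archive before installing it. A minimal sketch using the standard hashlib module follows; the function names are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for iacminer-0.1.3.tar.gz.
EXPECTED_SHA256 = "908af7bf6078a8cd6f60201bae7a3662ed31546c0c87a611f3011c86549d9648"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of(path) == expected
```

Calling matches_published_hash('iacminer-0.1.3.tar.gz') on the downloaded file should return True when the archive is intact.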