RepoSherlock facilitates data retrieval from some repository management services.
RepoSherlock is a package intended to facilitate retrieving information from software repositories hosted by services such as GitHub, GitLab, or BitBucket. It uses their APIs to fetch issue, pull/merge request, and commit data for further processing.
I developed this primarily for researchers interested in studying software repositories, since it helps me with my work too. While it was originally meant for research purposes, it is conceivable that it would have other applications as well. As such, this project is licensed under the MIT license.
So far, RepoSherlock supports data extraction from repositories hosted on GitHub. I have plans to port my previous BitBucket client to this project too, and eventually, add a GitLab client as well. Stay up-to-date by monitoring RepoSherlock’s issues page.
RepoSherlock supports two modes of use:
- as a standalone application, and
- as a module within your Python script.
Once installed, you can use RepoSherlock as a standalone application in your terminal of choice. It supports the following arguments:
- -h, --help: Shows a help message and exits.
- --user <username>: Your username on the service from which you want to retrieve data.
- --token <token>: The token provided to you by your repository service.
- --target <owner/repository>: The repository whose data you want to pull.
- --type [GitHub|BitBucket]: Your repository management service. Currently, only GitHub is supported.
- --outdir <path/to/output/directory>: The output directory where RepoSherlock should save the queried data.
- --pages <number_of_pages>: The maximum number of pages of data to fetch. Default is 1000. Naturally, RepoSherlock will stop once no more data is available.
A typical command using RepoSherlock looks like the following example:
$ reposherlock --user omazhary \
    --token <long_alphanumeric_token> \
    --target omazhary/reposherlock \
    --type GitHub \
    --outdir output/.
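The behaviour described for --pages (fetch up to a maximum number of pages, but stop as soon as no more data is available) can be pictured with a minimal sketch. This is not RepoSherlock's actual implementation; `fetch_all` and `fake_fetch_page` are hypothetical names used only for illustration, and the fake fetcher stands in for a real API call.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int], List[dict]], max_pages: int = 1000) -> List[dict]:
    """Collect records page by page until a page is empty or max_pages is reached."""
    records: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page means no more data is available
            break
        records.extend(batch)
    return records


def fake_fetch_page(page: int) -> List[dict]:
    """Hypothetical stand-in for an API call: three pages of two records each."""
    if page > 3:
        return []
    return [{"page": page, "item": i} for i in range(2)]


all_records = fetch_all(fake_fetch_page)        # stops early, after page 3
capped_records = fetch_all(fake_fetch_page, 2)  # stops at the page limit
print(len(all_records), len(capped_records))    # prints "6 4"
```

With a large default limit, the early stop on an empty page is what ends the loop for most repositories, so the limit mainly acts as a safety cap.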
You can also use RepoSherlock inside a Python script to fetch data on the fly and process it as you please. For instance, to create a GitHub client in your script, import it as a dependency and give it the necessary information:
from reposherlock.github import GitHub
client = GitHub('omazhary', 'my_long_alphanumeric_token')
issues = client.get_issues('omazhary/reposherlock', 1000)
pull_requests = client.get_pull_requests('omazhary/reposherlock', 1000)
commits = client.get_commits('omazhary/reposherlock', 1000)
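Once data has been fetched, a script will often want to store it for later analysis. The sketch below shows one way to do that, assuming the client methods return JSON-serializable Python objects such as lists of dictionaries (an assumption, since the return types are not stated here). The `save_records` helper and the sample data are hypothetical and not part of RepoSherlock.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List


def save_records(records: List[Any], outdir: str, name: str) -> Path:
    """Write records to <outdir>/<name>.json and return the file path."""
    out_path = Path(outdir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    target = out_path / f"{name}.json"
    with target.open("w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return target


# Placeholder data standing in for the result of client.get_issues(...).
sample_issues = [{"number": 1, "title": "Example issue"}]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_records(sample_issues, tmp, "issues")
    # Read the file back to confirm the round trip preserved the data.
    loaded = json.loads(saved.read_text(encoding="utf-8"))
    print(saved.name, len(loaded))
```

In a real script, `sample_issues` would be replaced by the value returned from the client, and the temporary directory by a directory of your choice.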
Further documentation can be found here.
Whether you want to use it as a standalone application or as a module in your project, you can install RepoSherlock via pip as you would any other Python package:
$ pip install reposherlock