Extract provenance information (W3C PROV) from GitLab projects.
:seedling: `gitlab2prov`: Extract Provenance from GitLab Projects
`gitlab2prov` is a Python library and command line tool for extracting provenance information from GitLab projects.
The data model employed by `gitlab2prov` is modelled according to the W3C PROV specification.
A representation of the model can be found in `/docs`.
Installation :wrench:
Clone the project and use the provided `setup.py` to install `gitlab2prov`.
python setup.py install --user
Usage :computer:
`gitlab2prov` can be used either as a command line script or as a Python library.
To extract provenance from a project, follow these steps:
Instructions | Config Option |
---|---|
1. Obtain an API Token for the GitLab API (Token Guide) | --token |
2. Set the URL[s] for the GitLab Project[s] | --project-urls |
3. Choose a PROV serialization format | --format |
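The three steps map directly onto a command line invocation. As a minimal sketch, the helper below assembles the argument list from project URLs, a token, and a format, using only the `-p`, `-t`, and `-f` flags that appear in the usage text. The function name `build_command`, the example URL, and the `<token>` placeholder are hypothetical and not part of `gitlab2prov`.

```python
import shlex
from typing import List, Sequence

# Formats accepted by --format according to the tool's usage text.
FORMATS = ("json", "rdf", "xml", "provn", "dot")


def build_command(project_urls: Sequence[str], token: str, fmt: str = "json") -> List[str]:
    """Assemble a gitlab2prov argument list (hypothetical helper, not library API)."""
    if not project_urls:
        raise ValueError("at least one project URL is required")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # -p takes one or more URLs, followed by the token and the format.
    return ["gitlab2prov", "-p", *project_urls, "-t", token, "-f", fmt]


if __name__ == "__main__":
    # The URL and token below are placeholders for illustration only.
    cmd = build_command(["https://gitlab.example.com/group/project"], "<token>", "provn")
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems when a URL or token contains shell metacharacters.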
As a Command Line Script
`gitlab2prov` can be configured either by command line flags or by using a config file.
Config File :clipboard:
An example of a configuration file can be found in `/config`.
[GITLAB]
project_urls = project_a_url, project_b_url
token = token
[OUTPUT]
format = json
[MISC]
profile = False
verbose = False
pseudonymous = False
double_agents = path/to/alias/mapping
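The layout above is standard INI, so it can be read with Python's `configparser`. The sketch below shows one way to load such a file and split the comma-separated `project_urls` value. It is an illustration only: the function name `load_config`, the returned dictionary shape, and the fallback values are assumptions, not a description of how `gitlab2prov` parses its config internally.

```python
import configparser
from typing import Any, Dict

# The example configuration from above, embedded so the sketch is self-contained.
EXAMPLE = """\
[GITLAB]
project_urls = project_a_url, project_b_url
token = token
[OUTPUT]
format = json
[MISC]
profile = False
verbose = False
pseudonymous = False
double_agents = path/to/alias/mapping
"""


def load_config(text: str) -> Dict[str, Any]:
    """Parse INI text into a flat dict (hypothetical helper, not library API)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # project_urls is a comma-separated list; strip whitespace around each URL.
    raw_urls = parser.get("GITLAB", "project_urls")
    urls = [url.strip() for url in raw_urls.split(",") if url.strip()]
    return {
        "project_urls": urls,
        "token": parser.get("GITLAB", "token"),
        # Fallback values below are assumptions made for this sketch.
        "format": parser.get("OUTPUT", "format", fallback="json"),
        "profile": parser.getboolean("MISC", "profile", fallback=False),
        "verbose": parser.getboolean("MISC", "verbose", fallback=False),
        "pseudonymous": parser.getboolean("MISC", "pseudonymous", fallback=False),
        "double_agents": parser.get("MISC", "double_agents", fallback=None),
    }


if __name__ == "__main__":
    print(load_config(EXAMPLE))
```

`getboolean` accepts `True`/`False` in any letter case, which matches the capitalised values in the example file.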
Command Line Flags :flags:
usage: gitlab2prov [-h] -p PROJECT_URLS [PROJECT_URLS ...] -t TOKEN [-c CONFIG_FILE] [-f {json,rdf,xml,provn,dot}] [-v] [--double-agents DOUBLE_AGENTS] [--pseudonymous] [--profile]
Extract provenance information from GitLab projects.
options:
-h, --help show this help message and exit
-p PROJECT_URLS [PROJECT_URLS ...], --project-urls PROJECT_URLS [PROJECT_URLS ...]
gitlab project urls
-t TOKEN, --token TOKEN
gitlab api access token
-c CONFIG_FILE, --config-file CONFIG_FILE
config file path
-f {json,rdf,xml,provn,dot}, --format {json,rdf,xml,provn,dot}
provenance serialization format
-v, --verbose write log to stderr, set log level to DEBUG
--double-agents DOUBLE_AGENTS
agent mapping file path
--pseudonymous pseudonymize user names by enumeration
--profile enable deterministic profiling, write profile to 'gitlab2prov-run-$TIMESTAMP.profile' where $TIMESTAMP is the current timestamp in 'YYYY-MM-DD-hh-mm-ss' format
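The `--profile` help text names the output file pattern and the timestamp layout. As a rough sketch of what that description implies, the snippet below builds such a file name and writes a deterministic profile with the standard library's `cProfile`. The helper names `profile_filename` and `run_profiled` are hypothetical, and whether `gitlab2prov` itself uses `cProfile` is an assumption here.

```python
import cProfile
from datetime import datetime
from typing import Any, Callable, Optional, Tuple


def profile_filename(now: Optional[datetime] = None) -> str:
    """Build 'gitlab2prov-run-$TIMESTAMP.profile' with a YYYY-MM-DD-hh-mm-ss timestamp."""
    now = now or datetime.now()
    return f"gitlab2prov-run-{now.strftime('%Y-%m-%d-%H-%M-%S')}.profile"


def run_profiled(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Run func under cProfile and dump the stats to a timestamped file (hypothetical helper)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    path = profile_filename()
    profiler.dump_stats(path)
    return result, path


if __name__ == "__main__":
    # Profile a trivial workload and report where the stats were written.
    total, path = run_profiled(sum, range(1000))
    print(total, path)
```

A file written this way can be inspected interactively with `python -m pstats <file>`.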
Provenance Output Formats
`gitlab2prov` supports the output formats that the `prov` library provides. The `--format` flag accepts `json`, `rdf`, `xml`, `provn`, and `dot`.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
How to cite
If you use GitLab2PROV in a scientific publication, we would appreciate citations to the following paper:
- Schreiber, A., de Boer, C. and von Kurnatowski, L. (2021). GitLab2PROV—Provenance of Software Projects hosted on GitLab. 13th International Workshop on Theory and Practice of Provenance (TaPP 2021), USENIX Association
Bibtex entry:
@InProceedings{SchreiberBoerKurnatowski2021,
author = {Andreas Schreiber and Claas de~Boer and Lynn von~Kurnatowski},
booktitle = {13th International Workshop on Theory and Practice of Provenance (TaPP 2021)},
title = {{GitLab2PROV}{\textemdash}Provenance of Software Projects hosted on GitLab},
year = {2021},
month = jul,
publisher = {{USENIX} Association},
url = {https://www.usenix.org/conference/tapp2021/presentation/schreiber},
}
You can also cite specific releases published on Zenodo.
References
Influential Software for `gitlab2prov`:
- Martin Stoffers: "Gitlab2Graph", v1.0.0, October 13, 2019, GitHub Link, DOI 10.5281/zenodo.3469385
- Quentin Pradet: "How do you rate limit calls with aiohttp?", GitHub Gist, MIT LICENSE
Influential Papers for `gitlab2prov`:
- De Nies, T., Magliacane, S., Verborgh, R., Coppens, S., Groth, P., Mannens, E., and Van de Walle, R. (2013). Git2PROV: Exposing Version Control System Content as W3C PROV. In Poster and Demo Proceedings of the 12th International Semantic Web Conference (Vol. 1035, pp. 125–128).
- Packer, H. S., Chapman, A., and Carr, L. (2019). GitHub2PROV: provenance for supporting software project management. In 11th International Workshop on Theory and Practice of Provenance (TaPP 2019).
Papers that refer to `gitlab2prov`:
- Andreas Schreiber, Claas de Boer (2020). Modelling Knowledge about Software Processes using Provenance Graphs and its Application to Git-based Version Control Systems. In ICSEW'20: Proceedings of the IEEE/ACM 42nd Conference on Software Engineering Workshops (pp. 358–359).
- Tim Sonnekalb, Thomas S. Heinze, Lynn von Kurnatowski, Andreas Schreiber, Jesus M. Gonzalez-Barahona, and Heather Packer (2020). Towards automated, provenance-driven security audit for git-based repositories: applied to Germany's Corona-Warn-App: vision paper. In Proceedings of the 3rd ACM SIGSOFT International Workshop on Software Security from Design to Deployment (pp. 15–18).
- Andreas Schreiber (2020). Visualization of contributions to open-source projects. In Proceedings of the 13th International Symposium on Visual Information Communication and Interaction. ACM, USA.