
Extract provenance information (W3C PROV) from GitLab projects.


:seedling: gitlab2prov: Extract Provenance from GitLab Projects


gitlab2prov is a Python library and command line tool for extracting provenance information from GitLab projects.

The data model employed by gitlab2prov follows the W3C PROV specification.
A representation of the model can be found in /docs.

Installation :wrench:

Clone the project and use the provided setup.py to install gitlab2prov.

python setup.py install --user

Usage :computer:

gitlab2prov can be used either as a command line script or as a Python library.

To extract provenance from a project, follow these steps:

Instructions                                               Config Option
1. Obtain an API token for the GitLab API (Token Guide)    --token
2. Set the URL[s] for the GitLab project[s]                --project-urls
3. Choose a PROV serialization format                      --format

As a Command Line Script

gitlab2prov can be configured either by command line flags or by using a config file.

Config File :clipboard:

An example of a configuration file can be found in /config.

[GITLAB]
project_urls = project_a_url, project_b_url
token = token

[OUTPUT]
format = json

[MISC]
profile = False
verbose = False
pseudonymous = False
double_agents = path/to/alias/mapping
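
The file uses the INI layout, so it can be inspected with Python's standard configparser module. The following is a minimal sketch, not gitlab2prov's own loading code: the helper name read_config and the choice to split project_urls on commas are assumptions made for illustration, and the values are the placeholders from the example above.

```python
import configparser

# Sample configuration copied from the example above; all values are placeholders.
SAMPLE = """
[GITLAB]
project_urls = project_a_url, project_b_url
token = token

[OUTPUT]
format = json

[MISC]
profile = False
verbose = False
pseudonymous = False
double_agents = path/to/alias/mapping
"""


def read_config(text):
    """Parse INI text into a plain dict (hypothetical helper, not part of gitlab2prov)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_urls = parser.get("GITLAB", "project_urls")
    return {
        # Assumption: multiple URLs are separated by commas, as in the example.
        "project_urls": [u.strip() for u in raw_urls.split(",") if u.strip()],
        "token": parser.get("GITLAB", "token"),
        "format": parser.get("OUTPUT", "format"),
        # getboolean accepts "True"/"False" (case-insensitive) among other spellings.
        "profile": parser.getboolean("MISC", "profile"),
        "verbose": parser.getboolean("MISC", "verbose"),
        "pseudonymous": parser.getboolean("MISC", "pseudonymous"),
        "double_agents": parser.get("MISC", "double_agents"),
    }


options = read_config(SAMPLE)
print(options["project_urls"])
```

Reading the file this way is a quick check that section names and boolean values are spelled as expected before starting a run.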

Command Line Flags :flags:

usage: gitlab2prov [-h] -p PROJECT_URLS [PROJECT_URLS ...] -t TOKEN [-c CONFIG_FILE] [-f {json,rdf,xml,provn,dot}] [-v] [--double-agents DOUBLE_AGENTS] [--pseudonymous] [--profile]

Extract provenance information from GitLab projects.

options:
  -h, --help            show this help message and exit
  -p PROJECT_URLS [PROJECT_URLS ...], --project-urls PROJECT_URLS [PROJECT_URLS ...]
                        gitlab project urls
  -t TOKEN, --token TOKEN
                        gitlab api access token
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        config file path
  -f {json,rdf,xml,provn,dot}, --format {json,rdf,xml,provn,dot}
                        provenance serialization format
  -v, --verbose         write log to stderr, set log level to DEBUG
  --double-agents DOUBLE_AGENTS
                        agent mapping file path
  --pseudonymous        pseudonymize user names by enumeration
  --profile             enable deterministic profiling, write profile to 'gitlab2prov-run-$TIMESTAMP.profile' where $TIMESTAMP is the current timestamp in 'YYYY-MM-DD-hh-mm-ss' format
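
To make the flag semantics concrete, the help text above can be mirrored with Python's argparse. This is a sketch reconstructed from the usage output only, not the tool's actual source; build_parser and profile_filename are hypothetical names, and the timestamp helper simply follows the 'YYYY-MM-DD-hh-mm-ss' pattern described for --profile.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a parser that mirrors the usage text (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="gitlab2prov",
        description="Extract provenance information from GitLab projects.",
    )
    # -p takes one or more URLs; -p and -t are shown as required in the usage line.
    parser.add_argument("-p", "--project-urls", nargs="+", required=True,
                        help="gitlab project urls")
    parser.add_argument("-t", "--token", required=True,
                        help="gitlab api access token")
    parser.add_argument("-c", "--config-file", help="config file path")
    parser.add_argument("-f", "--format",
                        choices=["json", "rdf", "xml", "provn", "dot"],
                        help="provenance serialization format")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="write log to stderr, set log level to DEBUG")
    parser.add_argument("--double-agents", help="agent mapping file path")
    parser.add_argument("--pseudonymous", action="store_true",
                        help="pseudonymize user names by enumeration")
    parser.add_argument("--profile", action="store_true",
                        help="enable deterministic profiling")
    return parser


def profile_filename(now):
    """Format a profile file name with a 'YYYY-MM-DD-hh-mm-ss' timestamp."""
    return "gitlab2prov-run-{}.profile".format(now.strftime("%Y-%m-%d-%H-%M-%S"))


# Placeholder URLs and token, as in the config example.
args = build_parser().parse_args(
    ["-p", "project_a_url", "project_b_url", "-t", "token", "-f", "provn", "--pseudonymous"]
)
print(args.project_urls, args.format, args.pseudonymous)
print(profile_filename(datetime(2021, 7, 1, 12, 30, 5)))
```

Note that argparse exposes --project-urls and --double-agents as the attributes project_urls and double_agents, which matches the key names used in the config file.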

Provenance Output Formats

gitlab2prov supports the output formats that the prov library provides. The values accepted by --format are json, rdf, xml, provn, and dot.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

How to cite

If you use GitLab2PROV in a scientific publication, we would appreciate citations to the following paper:

BibTeX entry:

@InProceedings{SchreiberBoerKurnatowski2021,
  author    = {Andreas Schreiber and Claas de~Boer and Lynn von~Kurnatowski},
  booktitle = {13th International Workshop on Theory and Practice of Provenance (TaPP 2021)},
  title     = {{GitLab2PROV}{\textemdash}Provenance of Software Projects hosted on GitLab},
  year      = {2021},
  month     = jul,
  publisher = {{USENIX} Association},
  url       = {https://www.usenix.org/conference/tapp2021/presentation/schreiber},
}

You can also cite specific releases published on Zenodo.

References

Influential Software for gitlab2prov

  • Martin Stoffers: "Gitlab2Graph", v1.0.0, October 13, 2019, GitHub Link, DOI 10.5281/zenodo.3469385

  • Quentin Pradet: "How do you rate limit calls with aiohttp?", GitHub Gist, MIT LICENSE

Influential Papers for gitlab2prov:

Papers that refer to gitlab2prov:
