
Extract structured metadata from git repositories.


gimie


Gimie (GIt Meta Information Extractor) is a Python library and command-line tool to extract structured metadata from git repositories.

Context

Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to make it easy to extract structured metadata from generic git repositories. It can extract metadata from the Git provider (GitHub or GitLab) or from the git index itself.


Using Gimie: easy peasy, it's a 3-step process.

1: Installation

To install the stable version on PyPI:

pip install gimie

To install the dev version from GitHub:

pip install git+https://github.com/sdsc-ordes/gimie.git@main#egg=gimie
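Whichever pip option you choose, you can confirm which version ended up installed from Python with the standard library's importlib.metadata (available since Python 3.8). This is a minimal sketch; the helper name installed_version is illustrative and not part of Gimie:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "gimie") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or a notice if gimie is not installed.
    print(installed_version() or "gimie is not installed")
```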

Gimie is also available as a Docker container hosted on the GitHub Container Registry:

docker pull ghcr.io/sdsc-ordes/gimie:latest

# The access token can be provided as an environment variable
docker run -e GITHUB_TOKEN=$GITHUB_TOKEN ghcr.io/sdsc-ordes/gimie:latest gimie data <repo>

2: Set your credentials

In order to access the GitHub API, you need to provide a GitHub token with the read:org scope.

A. Create access tokens

New to access tokens? Or don't know how to get your GitHub / GitLab token?

Have no fear, see here for GitHub tokens and here for GitLab tokens. (Note: tokens are as precious as passwords! Treat them as such.)
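If you created a classic GitHub token, you can check that it really carries the read:org scope: GitHub reports a classic token's scopes in the X-OAuth-Scopes response header of authenticated API calls. The sketch below is not part of Gimie; it assumes a classic token (fine-grained tokens do not report scopes this way) and treats write:org and admin:org as covering read:org, following GitHub's scope hierarchy.

```python
import urllib.request
from typing import Set

# Per GitHub's scope hierarchy, write:org and admin:org also include read:org.
ORG_READ_SCOPES = {"read:org", "write:org", "admin:org"}


def parse_scopes(header_value: str) -> Set[str]:
    """Split an X-OAuth-Scopes value such as 'repo, read:org' into a set."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def has_read_org(scopes: Set[str]) -> bool:
    """True if any granted scope covers read:org."""
    return bool(scopes & ORG_READ_SCOPES)


def github_token_scopes(token: str) -> Set[str]:
    """Query the GitHub API for a classic token's scopes (needs network)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        # Fine-grained tokens do not list scopes, so this may come back empty.
        return parse_scopes(response.headers.get("X-OAuth-Scopes", ""))
```

Usage: has_read_org(github_token_scopes(my_token)) returns True when the token is suitable.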

B. Set your access tokens via the terminal

Gimie will use your access tokens to gather information for you. If you want info about a GitHub repo, Gimie needs your GitHub token; if you want info about a GitLab project, Gimie needs your GitLab token.

Add your tokens one by one in your terminal. Your GitHub token:

export GITHUB_TOKEN=

and/or your GitLab token:

export GITLAB_TOKEN=
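The rule above (GitHub URL needs GITHUB_TOKEN, GitLab URL needs GITLAB_TOKEN) can be checked before a run. This is a hypothetical pre-flight helper, not Gimie's own logic; it assumes only the public github.com and gitlab.com hosts. Note that export GITHUB_TOKEN= with nothing after the equals sign leaves the variable empty, which the check reports as not set.

```python
import os
from urllib.parse import urlparse

# Assumed host -> variable mapping for the public providers only; a
# self-hosted GitLab instance would need its own entry.
TOKEN_VARS = {"github.com": "GITHUB_TOKEN", "gitlab.com": "GITLAB_TOKEN"}


def required_token_var(repo_url: str) -> str:
    """Name of the environment variable holding the token for this URL."""
    host = urlparse(repo_url).hostname or ""
    if host not in TOKEN_VARS:
        raise ValueError(f"no token variable known for host {host!r}")
    return TOKEN_VARS[host]


def token_is_set(repo_url: str) -> bool:
    """True if the matching variable is exported and non-empty."""
    return bool(os.environ.get(required_token_var(repo_url)))
```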

3: GIMIE info! Run Gimie

As a command-line tool

gimie data https://github.com/numpy/numpy

(Want a GitLab project instead? Just replace the URL in the command.)

As a Python library

from gimie.project import Project
proj = Project("https://github.com/numpy/numpy")

# To retrieve the rdflib.Graph object
g = proj.extract()

# To retrieve the serialized graph
g_in_ttl = g.serialize(format='ttl')
print(g_in_ttl)

For more advanced use, see the documentation.

Outputs

The default output is Turtle, a textual syntax for the RDF data model. We follow the schema recommended by CodeMeta. Supported formats are Turtle, JSON-LD and N-Triples, selected with the --format argument, e.g. gimie data https://github.com/numpy/numpy --format 'ttl'.

By default, Gimie prints results to the terminal. Want to save the output to a file? Redirect it to your file path: gimie data https://github.com/numpy/numpy > path_to_output/gimie_output.ttl
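To save output for several repositories at once, the same command and redirect can be scripted. A minimal sketch, assuming the gimie CLI is on your PATH; it uses only the data subcommand and --format argument shown above, and the output file naming scheme (numpy_numpy.ttl) is hypothetical:

```python
import subprocess
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse


def gimie_command(repo_url: str, fmt: str = "ttl") -> List[str]:
    """Argument list for `gimie data <repo_url> --format <fmt>`."""
    return ["gimie", "data", repo_url, "--format", fmt]


def output_file(repo_url: str, out_dir: Path, ext: str = "ttl") -> Path:
    """Hypothetical naming: .../numpy/numpy -> <out_dir>/numpy_numpy.ttl"""
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    return out_dir / f"{'_'.join(parts)}.{ext}"


def save_all(repo_urls: Iterable[str], out_dir: Path) -> None:
    """Run the CLI once per repository and write stdout to a file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in repo_urls:
        with open(output_file(url, out_dir), "w", encoding="utf-8") as fh:
            # Same effect as `gimie data <url> --format ttl > file`
            subprocess.run(gimie_command(url), stdout=fh, check=True)
```

Usage: save_all(["https://github.com/numpy/numpy"], Path("gimie_output")). Passing the file handle as stdout mirrors the shell redirect, and check=True stops at the first failing repository.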


Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the NumPy docstring style guide.

We format code with black, using --line-length=79 to follow PEP 8 recommendations. We use pytest as our testing framework. This project uses pyproject.toml to define package information, requirements and tooling configuration.
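As an illustration of these conventions, here is a hypothetical helper (split_repo_url is not part of Gimie) with a NumPy-style docstring, kept within 79 columns, plus a matching pytest test. It is a simplified sketch: it simply takes the last two path segments, so GitLab subgroups are not treated specially.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_repo_url(url: str) -> Tuple[str, str]:
    """Split a repository URL into its owner and repository name.

    Parameters
    ----------
    url : str
        Full URL of a GitHub or GitLab repository.

    Returns
    -------
    tuple of str
        The ``(owner, name)`` pair, taken from the last two path segments.

    Raises
    ------
    ValueError
        If the URL path has fewer than two segments.

    Examples
    --------
    >>> split_repo_url("https://github.com/numpy/numpy")
    ('numpy', 'numpy')
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    return parts[-2], parts[-1]


# The matching test, e.g. in a tests/ module (location hypothetical),
# is collected and run by pytest via `make test`.
def test_split_repo_url():
    owner, name = split_repo_url("https://github.com/numpy/numpy")
    assert (owner, name) == ("numpy", "numpy")
```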

For development:

activate a conda or virtual environment with Python 3.8 or higher

git clone https://github.com/sdsc-ordes/gimie && cd gimie
make install

run tests:

make test

run checks:

make check

for easier use of the GitHub/GitLab APIs, place your access tokens in the .env file (don't worry, .gitignore excludes it, so your tokens won't be pushed to GitHub):

cp .env.dist .env
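This README does not say how the .env file is read (a dedicated library may be used). As a hedged, standard-library-only sketch of what loading such a file amounts to, assuming simple KEY=value lines like the token exports above:

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("export "):
            # tolerate shell-style lines such as `export GITHUB_TOKEN=...`
            line = line.partition(" ")[2].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # strip surrounding quote characters, e.g. GITHUB_TOKEN="abc"
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: Path = Path(".env")) -> None:
    """Export the parsed values without overriding variables already set."""
    for key, value in read_env_file(path).items():
        os.environ.setdefault(key, value)
```

Using setdefault means a token already exported in your terminal takes precedence over the one stored in .env.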

build documentation:

make doc

Releases and Publishing on PyPI

Releases are done via GitHub releases:

  • A release will trigger a GitHub workflow to publish the package on PyPI
  • Make sure to bump the version in pyproject.toml and conf.py before making the release
  • It is possible to test publishing on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'
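Forgetting one of the two version bumps is easy, so a pre-release check can compare them. A minimal sketch, assuming pyproject.toml declares version = "..." (as in Poetry's [tool.poetry] table) and that the Sphinx conf.py uses release = "..."; both patterns are assumptions to adapt, and the first matching line in each file wins.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed layouts; adjust the patterns if the files declare versions otherwise.
PYPROJECT_RE = re.compile(
    r"^version\s*=\s*[\"']([^\"']+)[\"']", re.MULTILINE
)
CONF_RE = re.compile(r"^release\s*=\s*[\"']([^\"']+)[\"']", re.MULTILINE)


def find_version(text: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first version string matched by pattern, if any."""
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_match(pyproject: Path, conf: Path) -> bool:
    """True if both files declare the same, non-empty version string."""
    v1 = find_version(pyproject.read_text(encoding="utf-8"), PYPROJECT_RE)
    v2 = find_version(conf.read_text(encoding="utf-8"), CONF_RE)
    return v1 is not None and v1 == v2
```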



Download files


Source Distribution

gimie-0.7.2.tar.gz (96.8 kB)

Uploaded Source

Built Distribution


gimie-0.7.2-py3-none-any.whl (107.2 kB)

Uploaded Python 3

File details

Details for the file gimie-0.7.2.tar.gz.

File metadata

  • Download URL: gimie-0.7.2.tar.gz
  • Upload date:
  • Size: 96.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.8 Linux/6.8.0-49-generic

File hashes

Hashes for gimie-0.7.2.tar.gz
Algorithm     Hash digest
SHA256        a0f697e0643540785e62261c2afa2fa5c4ed3a8eef6583ccded9f691d122dddd
MD5           f74c58cca365efee917f9399f8722939
BLAKE2b-256   da8264e6dccee6a8c3772d382a7fc3cf04fcb76358eaafd86d163e56fcd26f62


File details

Details for the file gimie-0.7.2-py3-none-any.whl.

File metadata

  • Download URL: gimie-0.7.2-py3-none-any.whl
  • Upload date:
  • Size: 107.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.8 Linux/6.8.0-49-generic

File hashes

Hashes for gimie-0.7.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        7da9185adebe27b7deee88a6617ae59b2f3b3e7ccf5058900be28a0047e4efe5
MD5           887e3684d8903b326d4b123a1f989f0b
BLAKE2b-256   ea96f3cb8d114d1d1f3c97e762daed5f2851c8048e221da80949986c7047ba86
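To check a manually downloaded file against the published SHA256 digests listed in the hash tables for these two files, hashlib from the standard library is enough. A minimal sketch (the helper names are illustrative); pip can also enforce this itself in hash-checking mode, via --hash=sha256:<digest> entries in a requirements file.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the 0.7.2 release files listed on this page.
EXPECTED_SHA256 = {
    "gimie-0.7.2.tar.gz": (
        "a0f697e0643540785e62261c2afa2fa5c4ed3a8eef6583ccded9f691d122dddd"
    ),
    "gimie-0.7.2-py3-none-any.whl": (
        "7da9185adebe27b7deee88a6617ae59b2f3b3e7ccf5058900be28a0047e4efe5"
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```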

