Extract structured metadata from git repositories.
Gimie
Gimie (GIt Meta Information Extractor) is a python library and command line tool to extract structured metadata from git repositories.
Warning: Gimie is at an early development stage. It is not yet functional.
Context
Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to make it easy to extract structured metadata from any git repository. The following sources of information are used:
- GitHub API
- GitLab API
- Local Git metadata
- License text
- Free text in README
- Renku project metadata
Installation
To install the dev version from github:
pip install git+https://github.com/SDSC-ORD/gimie.git#egg=gimie
Usage
As a command line tool:
gimie data https://github.com/numpy/numpy
As a python library:
from gimie.project import Project
proj = Project("https://github.com/numpy/numpy")
# To retrieve the rdflib.Graph object
g = proj.to_graph()
# To retrieve the serialized graph
proj.serialize(format='ttl')
Or to extract only from a specific source:
from gimie.sources.remote import GithubExtractor
gh = GithubExtractor('https://github.com/SDSC-ORD/gimie')
gh.extract()
# To retrieve the rdflib.Graph object
g = gh.to_graph()
# To retrieve the serialized graph
gh.serialize(format='ttl')
Outputs
The default output is JSON-LD, a JSON serialization of the RDF data model. We follow the schema recommended by codemeta. Supported formats are JSON-LD, Turtle and N-Triples.
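To give a feel for the shape of a codemeta-style JSON-LD document, here is a minimal sketch built with the standard library only. It is not actual Gimie output: the property names (`name`, `codeRepository`, `programmingLanguage`) are taken from the codemeta / schema.org vocabularies, and which of them Gimie really emits is an assumption.

```python
import json

# Illustrative only: this is NOT actual Gimie output. The property names
# come from the codemeta / schema.org vocabularies; which properties Gimie
# emits in practice is an assumption made for this sketch.
doc = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@id": "https://github.com/SDSC-ORD/gimie",
    "@type": "SoftwareSourceCode",
    "name": "gimie",
    "codeRepository": "https://github.com/SDSC-ORD/gimie",
    "programmingLanguage": "Python",
}

# JSON-LD is plain JSON, so the standard json module can write it...
serialized = json.dumps(doc, indent=2)
print(serialized)

# ...and read it back without any RDF tooling.
parsed = json.loads(serialized)
print(parsed["@type"])
```

Because JSON-LD is ordinary JSON, downstream tools can consume it without an RDF library, while RDF-aware tools can still interpret it as a graph through the `@context`.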
Contributing
All contributions are welcome. New functions and classes should have associated tests and docstrings following the numpy style guide.
The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use pytest as our testing framework. This project uses pyproject.toml to define package information, requirements and tooling configuration.
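As a rough illustration of these conventions, the sketch below shows a function with a numpy-style docstring and a matching pytest test, formatted to fit within 79 characters per line. The helper `normalize_repo_url` is hypothetical and is not part of Gimie; it only serves to show the expected layout.

```python
def normalize_repo_url(url: str) -> str:
    """Normalize a git repository URL.

    Strips surrounding whitespace, trailing slashes and a trailing
    ``.git`` suffix so that equivalent URLs compare equal.

    Parameters
    ----------
    url : str
        URL of the git repository.

    Returns
    -------
    str
        The normalized URL.

    Examples
    --------
    >>> normalize_repo_url("https://github.com/SDSC-ORD/gimie.git")
    'https://github.com/SDSC-ORD/gimie'
    """
    # Hypothetical helper, used here only to demonstrate the style.
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def test_normalize_repo_url():
    """pytest collects functions whose names start with ``test_``."""
    expected = "https://github.com/SDSC-ORD/gimie"
    assert normalize_repo_url(expected + ".git") == expected
    assert normalize_repo_url(expected + "/") == expected
    assert normalize_repo_url(expected) == expected
```

Saved in a file whose name starts with `test_`, the test function would be picked up by running `pytest` from the repository root.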
For local development, you can clone the repository and install the package in editable mode, either using pip:
git clone https://github.com/SDSC-ORD/gimie && cd gimie
pip install -e .
Or poetry, to work in an isolated virtual environment:
git clone https://github.com/SDSC-ORD/gimie && cd gimie
poetry install
Releases and Publishing on Pypi
Releases are done via GitHub releases:
- A release triggers a GitHub workflow that publishes the package on PyPI.
- Make sure to update the version in pyproject.toml before making the release.
- It is possible to test publishing on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'.