Extract structured metadata from git repositories.

Gimie

Gimie (GIt Meta Information Extractor) is a Python library and command line tool to extract structured metadata from git repositories.

Context

Scientific code repositories contain valuable metadata that can be used to enrich existing catalogues, platforms or databases. This tool aims to make it easy to extract structured metadata from any git repository. The following sources of information are used:

  • Git metadata
  • Filenames
  • License
  • HTML in web page
  • Freetext content in README and other files
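
As a rough illustration of how filenames alone can carry metadata, the sketch below guesses the license file and the programming languages from a list of paths. It is not Gimie's implementation: the function names, the extension table and the sample paths are all assumptions made for this example.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical lookup table, for illustration only.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".c": "C",
    ".rs": "Rust",
    ".js": "JavaScript",
}


def find_license_files(paths):
    """Return paths whose file name looks like a license file."""
    names = ("license", "licence", "copying")
    return [p for p in paths if PurePosixPath(p).stem.lower() in names]


def count_languages(paths):
    """Count files per language, based on file extension only."""
    counts = Counter()
    for p in paths:
        suffix = PurePosixPath(p).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:
            counts[language] += 1
    return counts


# Made-up file listing standing in for a repository's contents.
paths = ["LICENSE.txt", "README.md", "setup.py", "src/core.py", "src/fast.c"]
print(find_license_files(paths))
print(count_languages(paths).most_common())
```

Heuristics like these are cheap but shallow, which is presumably why they are combined with the other sources listed above.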

Installation

To install the development version from GitHub:

pip install git+https://github.com/SDSC-ORD/gimie.git#egg=gimie

Usage

As a command line tool:

gimie https://github.com/numpy/numpy

As a Python library:

import gimie
repo = gimie.Repo("https://github.com/numpy/numpy")

Outputs

The default output is JSON-LD, a JSON serialization of the RDF data model. We follow the schema recommended by CodeMeta.
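
To give a feel for the format, here is a hand-written JSON-LD document that uses CodeMeta terms, loaded with Python's standard json module. The document is a made-up example, not actual Gimie output, and the choice of properties is an assumption.

```python
import json

# Hand-written example document (not real Gimie output).
document = """
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "numpy",
  "codeRepository": "https://github.com/numpy/numpy",
  "programmingLanguage": ["Python", "C"]
}
"""

metadata = json.loads(document)

# Keys starting with "@" are JSON-LD keywords; the others are terms
# that the "@context" maps to full IRIs.
keywords = sorted(k for k in metadata if k.startswith("@"))
terms = sorted(k for k in metadata if not k.startswith("@"))
print(keywords)
print(terms)
```

Because JSON-LD is plain JSON, it can be read without any RDF tooling; an RDF library is only needed to work with the data as a graph.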

Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the NumPy style guide.

We format code with black, using --line-length=79 to follow PEP 8 recommendations, and use pytest as our testing framework. This project uses pyproject.toml to define package information, requirements and tooling configuration.
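
As an illustration of these conventions, the sketch below shows a small function with a NumPy-style docstring and a matching pytest test, kept within 79 characters per line. The function repo_name_from_url is hypothetical and not part of Gimie.

```python
def repo_name_from_url(url):
    """Extract the repository name from a git hosting URL.

    Parameters
    ----------
    url : str
        URL of the repository, e.g. ``https://github.com/numpy/numpy``.

    Returns
    -------
    str
        Last path component of the URL, without a trailing ``.git``.
    """
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def test_repo_name_from_url():
    """pytest collects functions whose names start with ``test_``."""
    assert repo_name_from_url("https://github.com/numpy/numpy") == "numpy"
    url = "https://github.com/SDSC-ORD/gimie.git"
    assert repo_name_from_url(url) == "gimie"
```

Running pytest on a file containing this code would discover and run the test function automatically.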

For local development, you can clone the repository and install the package in editable mode, either using pip:

git clone https://github.com/SDSC-ORD/gimie && cd gimie
pip install -e .

Or using poetry, to work in an isolated virtual environment:

git clone https://github.com/SDSC-ORD/gimie && cd gimie
poetry install

Download files

Source Distribution: gimie-0.1.0.tar.gz (8.9 kB)

Built Distribution: gimie-0.1.0-py3-none-any.whl (12.9 kB, Python 3)
