CLI and library for extracting maSMP/CODEMETA metadata (and sources) from code repositories

comet-rs

CLI and Python library for extracting maSMP / CODEMETA metadata (plus per‑property sources and confidence) from GitHub and GitLab repositories.

Given a repository URL, comet-rs:

  • Calls the platform API (GitHub / GitLab)
  • Parses files like CITATION.cff, LICENSE, and README.md
  • Optionally enriches with external services (OpenAlex, archives)
  • Builds a maSMP or CODEMETA JSON‑LD document
  • Tracks, for each property, which source set it and with what confidence
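
The last step, per-property provenance, can be pictured with a small sketch. This is not comet-rs's actual implementation: the `PropertyRecord` and `ProvenanceTracker` names, the source labels, and the rule "keep the candidate with the highest confidence" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PropertyRecord:
    """One candidate value for a metadata property (hypothetical structure)."""
    value: Any
    source: str        # e.g. "platform_api" or "license_file" (labels assumed)
    confidence: float  # assumed to lie in [0.0, 1.0]


class ProvenanceTracker:
    """Keeps, per property, the candidate with the highest confidence."""

    def __init__(self) -> None:
        self._best: Dict[str, PropertyRecord] = {}

    def offer(self, name: str, record: PropertyRecord) -> None:
        current = self._best.get(name)
        # Replace only when the new candidate is strictly more confident,
        # so the first source offered wins ties.
        if current is None or record.confidence > current.confidence:
            self._best[name] = record

    def get(self, name: str) -> Optional[PropertyRecord]:
        return self._best.get(name)


tracker = ProvenanceTracker()
tracker.offer("license", PropertyRecord("MIT", "platform_api", 0.80))
tracker.offer("license", PropertyRecord("MIT", "license_file", 0.95))
print(tracker.get("license").source)  # license_file
```

Keeping the source next to the value is what makes it possible to report, for each property, where it came from and how much to trust it.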

Installation

pip install comet-rs

Python 3.10+ is required.


CLI usage

Extract full metadata

comet-rs extract https://github.com/zbmed-semtec/maSMP-metadata-extraction maSMP --with-enrichment

Outputs JSON with:

  • schema: maSMP or CODEMETA
  • code_url: repository URL
  • results: JSON‑LD document
  • enriched_metadata: per‑property source / confidence / category (for maSMP)
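
If you want to drive the CLI from a script, a minimal sketch might look like the following. The helper names `build_extract_command` and `run_extract` are hypothetical, and `run_extract` assumes that the command writes nothing but the JSON document to standard output, which this README does not state explicitly.

```python
import json
import subprocess


def build_extract_command(url, schema="maSMP", with_enrichment=False):
    """Assemble the argv list for `comet-rs extract` as shown above."""
    command = ["comet-rs", "extract", url, schema]
    if with_enrichment:
        command.append("--with-enrichment")
    return command


def run_extract(url, schema="maSMP", with_enrichment=False):
    """Run the CLI and parse its output (assumes stdout contains only JSON)."""
    completed = subprocess.run(
        build_extract_command(url, schema, with_enrichment),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


print(build_extract_command("https://github.com/owner/repo", with_enrichment=True))
# ['comet-rs', 'extract', 'https://github.com/owner/repo', 'maSMP', '--with-enrichment']
```

Passing the arguments as a list avoids shell quoting problems with repository URLs.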

Extract a single property (value + source)

comet-rs extract_property https://github.com/zbmed-semtec/maSMP-metadata-extraction author

Example output:

{
  "property_name": "author",
  "property_value": [
    {
      "@type": "Person",
      "familyName": "",
      "givenName": "Daniel",
      "@id": "https://orcid.org/0000-0003-0454-7145"
    }
  ],
  "source": "citation_cff",
  "confidence": 0.93
}
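
As a rough illustration of consuming this output, the sketch below parses the example above and pulls out the ORCID iDs together with the source and confidence. The `summarize` helper is hypothetical, and it assumes that `property_value` is a list of objects, as it is for `author`; other properties may have a different shape.

```python
import json

# The example output above, verbatim.
raw_output = """
{
  "property_name": "author",
  "property_value": [
    {
      "@type": "Person",
      "familyName": "",
      "givenName": "Daniel",
      "@id": "https://orcid.org/0000-0003-0454-7145"
    }
  ],
  "source": "citation_cff",
  "confidence": 0.93
}
"""


def summarize(result):
    """Return (identifiers, source, confidence) from an extract_property result."""
    identifiers = [
        item["@id"]
        for item in result["property_value"]
        if isinstance(item, dict) and "@id" in item
    ]
    return identifiers, result["source"], result["confidence"]


orcids, source, confidence = summarize(json.loads(raw_output))
print(orcids)      # ['https://orcid.org/0000-0003-0454-7145']
print(source)      # citation_cff
print(confidence)  # 0.93
```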

By default, extract_property uses the maSMP schema. To use CODEMETA:

comet-rs extract_property https://github.com/owner/repo name --schema CODEMETA

Compute a FAIRness assessment

comet-rs fairness https://github.com/zbmed-semtec/maSMP-metadata-extraction maSMP

Outputs JSON with:

  • schema: maSMP or CODEMETA
  • code_url: repository URL
  • results: JSON‑LD document used for the assessment
  • fairness: full FAIRness report (overall score, per‑principle scores, and indicator details)

Authentication & rate limits

Public repositories can often be processed without a token, but GitHub and GitLab apply stricter rate limits to unauthenticated requests. For heavier use or private repos, set:

export GITHUB_TOKEN=ghp_...      # for github.com URLs
export GITLAB_TOKEN=glpat_...    # for gitlab.com URLs

comet-rs automatically picks the right token based on the repository URL, or you can pass --token explicitly:

comet-rs extract https://gitlab.com/owner/repo maSMP --token glpat_...

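Tokens need only read access: repo / read:org on GitHub, read_api / read_repository on GitLab. Note that GitHub's classic repo scope also grants write access to repositories, so treat such a token with care.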
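
The automatic token selection described above can be approximated in a few lines. This is a sketch rather than comet-rs's own logic: the `pick_token` helper is hypothetical, and the assumption that an explicitly passed token takes precedence over the environment is mine.

```python
import os
from urllib.parse import urlparse

# Host-to-variable mapping taken from the export lines above.
TOKEN_ENV_VARS = {
    "github.com": "GITHUB_TOKEN",
    "gitlab.com": "GITLAB_TOKEN",
}


def pick_token(repo_url, explicit_token=None, environ=None):
    """Return the token to use for repo_url, or None for anonymous access."""
    if explicit_token:  # assumed to mirror --token overriding the environment
        return explicit_token
    environ = os.environ if environ is None else environ
    host = (urlparse(repo_url).hostname or "").lower()
    variable = TOKEN_ENV_VARS.get(host)
    return environ.get(variable) if variable else None


fake_env = {"GITHUB_TOKEN": "github-secret", "GITLAB_TOKEN": "gitlab-secret"}
print(pick_token("https://gitlab.com/owner/repo", environ=fake_env))  # gitlab-secret
```

Matching on the parsed hostname rather than on a substring of the URL avoids sending a token to a look-alike host such as `github.com.example.org`.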


Python API

You can also call the extractor directly from Python using the comet_rs package.

Full extraction

import os

import comet_rs

jsonld_document, enriched = comet_rs.extract_metadata(
    "https://github.com/zbmed-semtec/maSMP-metadata-extraction",
    schema="maSMP",                              # or "CODEMETA"
    token=os.getenv("GITHUB_TOKEN"),            # or GITLAB_TOKEN for GitLab
    with_enrichment=True,                       # False for JSON‑LD only
)

# jsonld_document: maSMP/CODEMETA JSON‑LD (dict)
# enriched: per‑property source/confidence/category (or None)
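
A common next step is to write the returned document to disk. The sketch below assumes only that `jsonld_document` is a JSON-serializable dict, as the comment above says; the `save_jsonld` helper and the stand-in document are hypothetical.

```python
import json
from pathlib import Path


def save_jsonld(document, path):
    """Write a JSON-LD dict to path as UTF-8, pretty-printed JSON."""
    target = Path(path)
    target.write_text(
        json.dumps(document, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return target


# Stand-in for the jsonld_document returned by extract_metadata (contents invented).
example_document = {"@type": "SoftwareSourceCode", "name": "example"}
```

With a real result you would call, for example, `save_jsonld(jsonld_document, "metadata.jsonld")`. Setting `ensure_ascii=False` keeps author names with non-ASCII characters readable in the file.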

Extract a single property in Python

import os

import comet_rs

extracted_at, matches = comet_rs.extract_property(
    "https://github.com/zbmed-semtec/maSMP-metadata-extraction",
    "author",                     # JSON-LD key or entity field name
    schema="maSMP",               # or "CODEMETA"
    token=os.getenv("GITHUB_TOKEN"),
)

for match in matches:
    print("Profile:", match["profile"])
    print("Value:", match["value"])
    print("Source:", match.get("source"))
    print("Confidence:", match.get("confidence"))
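
When several matches come back, you may want only the most trustworthy one. The following sketch assumes each match is a dict with the keys used in the loop above and that `confidence` may be missing; the `best_match` helper and the sample data are hypothetical.

```python
def best_match(matches):
    """Return the match with the highest confidence, or None if there are none."""
    if not matches:
        return None
    # Matches without a confidence are ranked lowest (treated as 0.0 here).
    return max(matches, key=lambda match: match.get("confidence") or 0.0)


# Illustrative data shaped like the matches above; all values are invented.
sample_matches = [
    {"profile": "profile-a", "value": "x", "source": "readme", "confidence": 0.6},
    {"profile": "profile-b", "value": "y", "source": "citation_cff", "confidence": 0.93},
    {"profile": "profile-c", "value": "z"},
]
print(best_match(sample_matches)["source"])  # citation_cff
```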

FAIRness assessment in Python

import os

import comet_rs

jsonld_document, fairness_report = comet_rs.assess_fairness(
    "https://github.com/zbmed-semtec/maSMP-metadata-extraction",
    schema="maSMP",               # or "CODEMETA"
    token=os.getenv("GITHUB_TOKEN"),
)

print("Overall score:", fairness_report.overall_score)
print("Findable score:", fairness_report.findable.score)
print("Accessible score:", fairness_report.accessible.score)
print("Interoperable score:", fairness_report.interoperable.score)
print("Reusable score:", fairness_report.reusable.score)
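
To see where a repository most needs improvement, the per-principle scores can be compared. The sketch below relies only on the attributes printed above; the helper names are hypothetical, and the stand-in report uses invented scores on an assumed 0 to 1 scale.

```python
from types import SimpleNamespace

PRINCIPLES = ("findable", "accessible", "interoperable", "reusable")


def principle_scores(report):
    """Map each FAIR principle to its score, using the attributes shown above."""
    return {name: getattr(report, name).score for name in PRINCIPLES}


def weakest_principle(report):
    """Return the name of the principle with the lowest score."""
    scores = principle_scores(report)
    return min(scores, key=scores.get)


# Stand-in report with invented scores; a real one comes from assess_fairness.
fake_report = SimpleNamespace(
    overall_score=0.7,
    findable=SimpleNamespace(score=0.9),
    accessible=SimpleNamespace(score=0.8),
    interoperable=SimpleNamespace(score=0.5),
    reusable=SimpleNamespace(score=0.7),
)
print(weakest_principle(fake_report))  # interoperable
```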

Project links & docs

  • Source code: GitHub / GitLab repository where comet-rs is developed
  • Backend architecture and development docs:
    • README.md in the repo root (architecture & local FastAPI server)
    • docs/DEVELOPER_GUIDE.md
    • docs/ADDING_NEW_PLATFORM.md

Use those documents if you want to contribute, run the FastAPI backend locally, or add support for new code hosting platforms.
