CLI and library for extracting maSMP/CODEMETA metadata (and sources) from code repositories
comet-rs
CLI and Python library for extracting maSMP / CODEMETA metadata (plus per‑property sources and confidence) from GitHub and GitLab repositories.
Given a repository URL, comet-rs:
- Calls the platform API (GitHub / GitLab)
- Parses files like CITATION.cff, LICENSE, and README.md
- Optionally enriches with external services (OpenAlex, archives)
- Builds a maSMP or CODEMETA JSON‑LD document
- Tracks, for each property, which source set it and with what confidence
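Here is a minimal, hypothetical Python sketch of what per-property bookkeeping can look like. Several sources propose a value for the same property. The candidate with the highest confidence is kept, and its source is recorded alongside it. The `Candidate` class, the `merge_candidates` helper, the source names, and the "highest confidence wins" rule are all assumptions made for illustration. comet-rs's actual resolution logic may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Candidate:
    """One proposed value for a property (hypothetical structure)."""
    prop: str
    value: Any
    source: str        # illustrative source names, e.g. "citation_cff"
    confidence: float  # 0.0 to 1.0


def merge_candidates(candidates: list[Candidate]) -> dict[str, dict[str, Any]]:
    """Keep, for each property, the candidate with the highest confidence.

    Returns property -> {"value", "source", "confidence"}.
    On a tie, the first candidate seen is kept.
    """
    best: dict[str, Candidate] = {}
    for cand in candidates:
        current = best.get(cand.prop)
        if current is None or cand.confidence > current.confidence:
            best[cand.prop] = cand
    return {
        prop: {"value": c.value, "source": c.source, "confidence": c.confidence}
        for prop, c in best.items()
    }


if __name__ == "__main__":
    merged = merge_candidates([
        Candidate("license", "MIT", "github_api", 0.8),      # hypothetical source
        Candidate("license", "MIT", "license_file", 0.95),   # hypothetical source
        Candidate("name", "maSMP-metadata-extraction", "github_api", 0.9),
    ])
    print(merged["license"]["source"])  # license_file
```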
Installation
pip install comet-rs
Python 3.10+ is required.
CLI usage
Extract full metadata
comet-rs extract https://github.com/zbmed-semtec/maSMP-metadata-extraction maSMP --with-enrichment
Outputs JSON with:
- schema: maSMP or CODEMETA
- code_url: repository URL
- results: JSON‑LD document
- enriched_metadata: per‑property source / confidence / category (for maSMP)
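To drive the CLI from a script, you could use a sketch like the following. It runs `comet-rs extract` and checks the top-level keys listed above. It assumes the JSON is printed to stdout. It treats `enriched_metadata` as optional, since that key is described as maSMP-specific. `run_extract` and `check_extract_output` are hypothetical helpers and are not part of comet-rs.

```python
import json
import subprocess
from typing import Any

# Core keys expected in every `extract` result; enriched_metadata is optional.
EXPECTED_KEYS = ("schema", "code_url", "results")


def run_extract(repo_url: str, schema: str = "maSMP",
                with_enrichment: bool = True) -> dict[str, Any]:
    """Run `comet-rs extract` and parse its output (assumed to be JSON on stdout)."""
    cmd = ["comet-rs", "extract", repo_url, schema]
    if with_enrichment:
        cmd.append("--with-enrichment")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def check_extract_output(payload: dict[str, Any]) -> list[str]:
    """Return the names of core top-level keys missing from `payload`."""
    return [key for key in EXPECTED_KEYS if key not in payload]
```

For example, `check_extract_output(run_extract("https://github.com/zbmed-semtec/maSMP-metadata-extraction"))` would return an empty list when all core keys are present.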
Extract a single property (value + source)
comet-rs extract_property https://github.com/zbmed-semtec/maSMP-metadata-extraction author
Example output:
{
  "property_name": "author",
  "property_value": [
    {
      "@type": "Person",
      "familyName": "",
      "givenName": "Daniel",
      "@id": "https://orcid.org/0000-0003-0454-7145"
    }
  ],
  "source": "citation_cff",
  "confidence": 0.93
}
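Each property comes with a `source` and a `confidence`, so a caller can decide whether to trust a value. The sketch below parses output shaped like the example above and accepts the value only when its confidence meets a threshold. The `accept_property` helper and its 0.8 default threshold are arbitrary, hypothetical choices.

```python
import json
from typing import Any

# Output shaped like the example above.
EXAMPLE = """
{
  "property_name": "author",
  "property_value": [
    {"@type": "Person", "familyName": "", "givenName": "Daniel",
     "@id": "https://orcid.org/0000-0003-0454-7145"}
  ],
  "source": "citation_cff",
  "confidence": 0.93
}
"""


def accept_property(raw_json: str, min_confidence: float = 0.8) -> Any:
    """Return property_value if its confidence meets the threshold, else None."""
    data = json.loads(raw_json)
    if data.get("confidence", 0.0) >= min_confidence:
        return data["property_value"]
    return None


if __name__ == "__main__":
    authors = accept_property(EXAMPLE)
    print([a.get("givenName") for a in authors])  # ['Daniel']
```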
By default, extract_property uses the maSMP schema. To use CODEMETA:
comet-rs extract_property https://github.com/owner/repo name --schema CODEMETA
Compute a FAIRness assessment
comet-rs fairness https://github.com/zbmed-semtec/maSMP-metadata-extraction maSMP
Outputs JSON with:
- schema: maSMP or CODEMETA
- code_url: repository URL
- results: JSON‑LD document used for the assessment
- fairness: full FAIRness report (overall score, per‑principle scores, and indicator details)
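The exact key names inside `fairness` are not spelled out here. The following sketch therefore assumes they mirror the attribute names used by the Python API (`overall_score`, plus `findable`, `accessible`, `interoperable`, and `reusable`, each with a `score`). `summarize_fairness` is a hypothetical helper. It returns None for any key it cannot find rather than failing.

```python
from typing import Any

PRINCIPLES = ("findable", "accessible", "interoperable", "reusable")


def summarize_fairness(payload: dict[str, Any]) -> dict[str, Any]:
    """Flatten the `fairness` section of the CLI output into {name: score}.

    Key names are an assumption (mirroring the Python API attributes);
    missing entries come back as None instead of raising.
    """
    report = payload.get("fairness") or {}
    summary: dict[str, Any] = {"overall": report.get("overall_score")}
    for name in PRINCIPLES:
        section = report.get(name) or {}
        summary[name] = section.get("score")
    return summary
```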
Authentication & rate limits
For public repositories you can often run without a token, but GitHub and GitLab apply rate limits. For heavier use or private repos, set:
export GITHUB_TOKEN=ghp_... # for github.com URLs
export GITLAB_TOKEN=glpat_... # for gitlab.com URLs
comet-rs automatically picks the right token based on the repository URL, or you can pass --token explicitly:
comet-rs extract https://gitlab.com/owner/repo maSMP --token glpat_...
Tokens only need minimal read scopes (repo / read:org on GitHub, read_api / read_repository on GitLab).
Python API
You can also call the extractor directly from Python using the comet_rs package.
Full extraction
import os
import comet_rs
jsonld_document, enriched = comet_rs.extract_metadata(
"https://github.com/zbmed-semtec/maSMP-metadata-extraction",
schema="maSMP", # or "CODEMETA"
token=os.getenv("GITHUB_TOKEN"), # or GITLAB_TOKEN for GitLab
with_enrichment=True, # False for JSON‑LD only
)
# jsonld_document: maSMP/CODEMETA JSON‑LD (dict)
# enriched: per‑property source/confidence/category (or None)
Extract a single property in Python
import os
import comet_rs
extracted_at, matches = comet_rs.extract_property(
"https://github.com/zbmed-semtec/maSMP-metadata-extraction",
"author", # JSON-LD key or entity field name
schema="maSMP", # or "CODEMETA"
token=os.getenv("GITHUB_TOKEN"),
)
for match in matches:
    print("Profile:", match["profile"])
    print("Value:", match["value"])
    print("Source:", match.get("source"))
    print("Confidence:", match.get("confidence"))
FAIRness assessment in Python
import os
import comet_rs
jsonld_document, fairness_report = comet_rs.assess_fairness(
"https://github.com/zbmed-semtec/maSMP-metadata-extraction",
schema="maSMP", # or "CODEMETA"
token=os.getenv("GITHUB_TOKEN"),
)
print("Overall score:", fairness_report.overall_score)
print("Findable score:", fairness_report.findable.score)
print("Accessible score:", fairness_report.accessible.score)
print("Interoperable score:", fairness_report.interoperable.score)
print("Reusable score:", fairness_report.reusable.score)
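To compare repositories or find the weakest area, you can flatten the scores into a plain dict. This sketch relies only on the attribute names used above (`overall_score` and `<principle>.score`). `fairness_scores` and `weakest_principle` are hypothetical helpers.

```python
from typing import Any

PRINCIPLES = ("findable", "accessible", "interoperable", "reusable")


def fairness_scores(report: Any) -> dict[str, float]:
    """Collect the overall and per-principle scores from a FAIRness report object."""
    scores = {"overall": report.overall_score}
    for name in PRINCIPLES:
        scores[name] = getattr(report, name).score
    return scores


def weakest_principle(scores: dict[str, float]) -> str:
    """Name the FAIR principle with the lowest score (the 'overall' entry is ignored)."""
    return min(PRINCIPLES, key=lambda name: scores[name])
```

For example, `weakest_principle(fairness_scores(fairness_report))` returns the name of the lowest-scoring principle.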
Project links & docs
- Source code: GitHub / GitLab repository where comet-rs is developed
- Backend architecture and development docs:
  - README.md in the repo root (architecture & local FastAPI server)
  - docs/DEVELOPER_GUIDE.md
  - docs/ADDING_NEW_PLATFORM.md
Use those documents if you want to contribute, run the FastAPI backend locally, or add support for new code hosting platforms.
Download files
Source Distribution
Built Distribution
File details
Details for the file comet_rs-0.1.1.tar.gz.
File metadata
- Download URL: comet_rs-0.1.1.tar.gz
- Upload date:
- Size: 56.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37f266c0748a3d5c2822a49e3b46762f08637c59acd4c0e0c494d5709c5963ba |
| MD5 | 9e84c38f72cf6fc1e9274ed75043750e |
| BLAKE2b-256 | 15460ac312ad18338646f4b6e26c1aabdd11cfb5ba82700d0e0fa3fb8b7392ba |
File details
Details for the file comet_rs-0.1.1-py3-none-any.whl.
File metadata
- Download URL: comet_rs-0.1.1-py3-none-any.whl
- Upload date:
- Size: 60.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14408a9f2294a43302f5a4927ef0240d779624f22d9b444fb9e223191478cc82 |
| MD5 | 27daa7cf4b308de0ee4b7b867d210b3b |
| BLAKE2b-256 | adcefd77f6098be5e7f4075e59afcdfe59ccd199e3ae60ce55d0d47803519597 |