
HuggingFace parser that converts ONNX repositories to RDF


Hugging Face ONNX to RDF parser


A Python tool that uses ONNX2RDF to convert HuggingFace repositories containing ONNX (Open Neural Network Exchange) files to RDF (Resource Description Framework).


✨ Features

  • ✅ Parses ONNX model structure into RDF triples (nquads, turtle, trig, trix, jsonld, hdt)
  • ✅ Downloads HuggingFace files automatically
  • ✅ Saves progress, so a run can be stopped and resumed later
  • ✅ Selects repositories randomly, by most downloads, or by least downloads
  • ✅ Uses threads and multiprocessing for parallelization

⚙️ System Requirements

To use this tool successfully, the following components must be installed on your system:

  • Python 3.13+
  • Java 17+ (OpenJDK recommended)
    Used to run the internal RML Mapper JAR
  • Node.js + npm
  • @rmlio/yarrrml-parser
    Installed globally via npm install -g @rmlio/yarrrml-parser
  • onnx-2-rdf (installed via pip)

You can also use the Dockerfile (see below), which prepares the environment for you.
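
As a quick sanity check before installing, the requirements above can be probed from Python. This is only a sketch: the executable names (java, node, npm, yarrrml-parser) are assumptions about how each tool appears on PATH, and the helper functions are illustrative, not part of this project.

```python
import shutil
import sys

# Executables implied by the requirements list. The names are assumptions
# about how each tool is exposed on PATH.
REQUIRED_TOOLS = ["java", "node", "npm", "yarrrml-parser"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(minimum=(3, 13), version=None):
    """Check an interpreter version against the documented minimum."""
    version = sys.version_info[:2] if version is None else version
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    print("Python 3.13+:", "ok" if python_ok() else "too old")
    for tool in missing_tools():
        print("missing:", tool)
```

The check only confirms that the executables exist; it does not verify the Java version.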


📦 Installation Options

Option 1: Build from Source

Clone the repository and install it (the project ships a PEP 621-compliant pyproject.toml):

git clone https://github.com/JorgeMIng/HuggingFace_ONNX2RDF.git
cd HuggingFace_ONNX2RDF
pip install .

This installs the CLI command hugg-parser (onnx-parser comes from the onnx-2-rdf package).


Option 2: Using Docker

git clone https://github.com/JorgeMIng/HuggingFace_ONNX2RDF.git
cd HuggingFace_ONNX2RDF
docker build -t onnx2rdf .
docker run -it onnx2rdf

This Docker image includes:

  • Python 3.13
  • OpenJDK 17
  • Node.js + npm
  • @rmlio/yarrrml-parser
  • ONNX2RDF + CLI (onnx-parser)
  • HuggingFace2RDF + CLI (hugg-parser)

🚀 Usage

Command-Line Interface

hugg-parser num_repos [OPTIONS]

Positional Argument

| Argument | Description |
|----------|-------------|
| num_repos | Number of repositories to parse from the list (all repositories with the ONNX tag). A value of -1 means all repositories. |

Main Options

| Option | Description |
|--------|-------------|
| --target_path | Output directory for RDF files (default: rdfs). Can be absolute or relative. |
| --rdf_format | RDF serialization format: nquads (default), turtle, trig, trix, jsonld, hdt. |

Pipeline Control

| Option | Description |
|--------|-------------|
| --work_folder | Changes the relative base folder for input/output (models, logs, RDF). Defaults to the folder from which the program is called. |
| --num_threads | Number of threads to use (default: -1). |

List Control

The progress list (progress_cache.json) records the state of each repository across runs:

  • Repositories completed without error.
  • repo_id_error: repositories that could not be processed because of HuggingFace errors or unexpected errors.
  • repo_id_warning: repositories processed with warnings.
  • repos_stopped: repositories left unfinished because the run was stopped. These take priority the next time the program runs.
  • repo_id_try_again: repositories that failed only because of ONNX2RDF, plus the repositories with warnings.
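
To see how many repositories sit in each list, the progress file can be inspected with a few lines of Python. The internal layout of progress_cache.json is not documented here, so this sketch assumes only that the top level is a JSON object whose list names map to lists or objects; summarize_progress is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def summarize_progress(path="progress_cache.json"):
    """Return {name: number_of_entries} for every collection in the file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the top level is a JSON object. Only list or object
    # values are counted; scalar values are skipped.
    return {
        name: len(value)
        for name, value in data.items()
        if isinstance(value, (list, dict))
    }


if __name__ == "__main__" and Path("progress_cache.json").exists():
    for name, count in summarize_progress().items():
        print(f"{name}: {count}")
```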

| Option | Description |
|--------|-------------|
| --order_method | Ordering method used to select repositories from the list: random (default), m_downs (most downloads), l_downs (least downloads). |
| --try_again | Process only the repositories in the repo_id_try_again list. |
| --try_error | Process only the repositories in the repo_id_error list. |

Config Controls

A default config file defines the values used by ONNX2RDF. If a custom_parser.config file exists, its values override the defaults in the final configuration. The first time the program runs, this file is created in the work folder if it does not already exist.

  • URIS is the most important category.

| Type | Option | Description |
|------|--------|-------------|
| URIS | resource_url | Namespace used by ONNX2RDF and HuggParser for models (http://base.huggingface.model.com/resource/) |
| URIS | hugginface_base | Namespace for the HuggParser ontology (YARRRML file) (http://base.huggingface.model.com#) |
| PARSER | debug | ONNX2RDF debug mode (True/False) |
| PARSER | cache | ONNX2RDF cache element or list of elements. Valid values: all, load-model, pre-process, yamml2rml, mapping |
| PARSER | work_folder | ONNX2RDF work folder |
| PARSER | to_console | Enable the ONNX2RDF console (True/False) |
| PARSER | models_folder | Folder searched for HuggingFace files (do not change) |
| PARSER | max_ram | RAM allocated for ONNX2RDF |
| PARSER | no_parsing | Disable parsing (False) (do not change) |
| METRICS | file_name | Name of the metrics file to create |
| CACHE | progress_file | Name of the progress cache file |
| CACHE | cache_hugg_list | HuggingFace model list cache file (all repository IDs and metadata) |
| METADATA | tmp_metadata_folder | Folder used to create temporary metadata files |
| METADATA | mapping_path | Folder containing the metadata YARRRML file (looked up on the installation path). Custom YARRRML files are not yet supported. |
| METADATA | mapping_file | Name of the metadata YARRRML file (looked up on the installation path). Custom YARRRML files are not yet supported. |
| LOGS | logs_folder | Folder for logs |
| LOGS | to_console | Enable the HuggingFace2RDF console (True/False) |
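
The override behaviour can be illustrated with Python's configparser. This is a sketch under a loud assumption: it treats custom_parser.config as an INI-style file with the categories above as sections, which the project does not confirm. The example.org namespace and the merged_config helper are hypothetical.

```python
import configparser

# Two of the documented defaults, hard-coded here for illustration only.
DEFAULTS = {
    "URIS": {
        "resource_url": "http://base.huggingface.model.com/resource/",
        "hugginface_base": "http://base.huggingface.model.com#",
    },
}

# Hypothetical contents of custom_parser.config, assuming INI syntax.
CUSTOM = """
[URIS]
resource_url = http://example.org/resource/
"""


def merged_config(defaults, custom_text):
    """Layer custom values over the defaults, section by section."""
    config = configparser.ConfigParser()
    config.read_dict(defaults)
    config.read_string(custom_text)
    return config


config = merged_config(DEFAULTS, CUSTOM)
print(config["URIS"]["resource_url"])     # overridden by the custom file
print(config["URIS"]["hugginface_base"])  # default kept
```

Only the keys present in the custom file change; every other option keeps its default value.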

Example

hugg-parser 100 \
  --rdf_format turtle \
  --num_threads 4 \
  --order_method m_downs

This will:

  • Parse the 100 most-downloaded repositories and write them in Turtle format
  • Use 4 threads
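
When the CLI is driven from a script, the same call can be assembled as an argument list. The sketch below uses only the options documented above. build_command is a hypothetical helper, and it covers only options that take a value, since the exact form of --try_again and --try_error is not shown in this document.

```python
import shlex


def build_command(num_repos, **options):
    """Build the hugg-parser argument list from keyword options."""
    command = ["hugg-parser", str(num_repos)]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


command = build_command(
    100, rdf_format="turtle", num_threads=4, order_method="m_downs"
)
print(shlex.join(command))
# prints: hugg-parser 100 --rdf_format turtle --num_threads 4 --order_method m_downs
# To actually run it: subprocess.run(command, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when paths such as --target_path contain spaces.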

⚙️ Advanced Usage as a Library

Besides the command-line interface, the tool can also be used as a Python library for programmatic integration.

Main HuggingFaceParser Class

The core class is HuggingFaceParser.

from HuggParser.HuggingFaceParser import HuggingFaceParser

parser = HuggingFaceParser()

# Set parameters
parser.set_rdf_format("nquads")
parser.set_number_threads(8)
parser.set_work_folder("results")

# Parse HuggingFace repositories
# Parameters of run: number_repos: int, try_again=False, try_error=False, order_method="random"
parser.run(
    number_repos=100,
    try_again=False,
    try_error=False,
    order_method="random",
)


## 📚 Related Resources


- [ONNX Format](https://onnx.ai/)  
  The Open Neural Network Exchange (ONNX) standard for representing ML models.

- [RDF Basics](https://www.w3.org/RDF/)  
  Introduction to the Resource Description Framework (RDF) by W3C.

- [SPARQL Tutorial](https://www.w3.org/TR/sparql11-query/)  
  Official SPARQL 1.1 Query Language specification and examples.

- [HuggingFace](https://huggingface.co)  
  Official HuggingFace Website.


## 🛠️ TODO

Planned improvements and future features are listed in [TODO.md](TODO.md).

Feel free to contribute! Check out the [Issues](https://github.com/JorgeMIng/HuggingFace_ONNX2RDF/issues) tab for current tasks and discussions.

## 📄 License

This project is licensed under the terms of the Apache 2.0 license.  
See [LICENSE](LICENSE) for more information.



## 🙌 Acknowledgments

- Built using the [ONNX](https://onnx.ai/) Python API.
- Built using the [HuggingFaceHub](https://github.com/huggingface/huggingface_hub) Official Hub API.
- RML Mapping powered by the [RMLMapper](https://github.com/RMLio/rmlmapper-java).
- YARRRML parsing supported via [@rmlio/yarrrml-parser](https://www.npmjs.com/package/@rmlio/yarrrml-parser).
- Developed by the Ontology Engineering Group (OEG) of the Universidad Politécnica de Madrid.



## 📑 Citation

If you use this software, please cite as:
 
```bibtex
@software{martin_izquierdo_2025_onnx2rdf,
  author       = {Jorge Martín Izquierdo},
  title        = {HuggingFace2RDF},
  version      = {0.1.1},
  date         = {2025-07-05},
  url          = {https://github.com/JorgeMIng/HuggingFace_ONNX2RDF},
  doi          = {10.5281/zenodo.15814658},
  license      = {Apache 2.0},
  affiliation  = {Universidad Politécnica de Madrid},
  keywords     = {ONNX, RDF, Semantic Web, Machine Learning},
  orcid        = {https://orcid.org/0009-0005-7696-8995}
}
```

📫 Contact

For questions, feedback, or contributions, feel free to reach out:
