A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models
SCHEMA-MINERpro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow
Schema-Miner Pro is an open-source framework for scientific schema mining and ontology grounding. It combines Large Language Models (LLMs) with human-in-the-loop refinement to extract and organize schema properties from unstructured text, and extends this process with an automated ontology-grounding component. Documentation and usage guides are available at schema-miner.readthedocs.io.
🧪 Installation
Install the package directly from PyPI using pip:
pip install schema-miner
If you are working with the source code directly, install dependencies from requirements.txt:
git clone https://github.com/sciknoworg/schema-miner.git
cd schema-miner
pip install -r requirements.txt
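To confirm that the install succeeded, the installed version can be read back from the package metadata. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of schema-miner. The distribution name `schema-miner` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "schema-miner") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("schema-miner is not installed in this environment")
    else:
        print(f"schema-miner {found} is installed")
```

If the check reports that the package is missing, the most common cause is that pip installed into a different Python environment than the one running the script.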
⚙️ System Requirements
Running with OpenAI models (e.g., GPT-4o, GPT-4-turbo) requires no special hardware beyond a basic system with internet access, since inference is API-based. For open-source models (e.g., Llama 3.1 8B), local execution is possible on CPU but slow; for practical performance, a GPU with sufficient VRAM (per model specifications) is strongly recommended.
For more details, please check the documentation: https://schema-miner.readthedocs.io/en/latest/.
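The choice described above (API-based model, local model on GPU, or local model on CPU) can be sketched as a small preflight check. This is an illustration only, under two assumptions that are not confirmed by this page: that an OpenAI key is supplied through the `OPENAI_API_KEY` environment variable (the convention used by the OpenAI SDK), and that finding `nvidia-smi` on the PATH is an acceptable proxy for an NVIDIA GPU being present. The function `suggest_backend` and its return labels are hypothetical and not part of schema-miner.

```python
import os
import shutil
from typing import Mapping


def suggest_backend(env: Mapping[str, str], gpu_tool_found: bool) -> str:
    """Return a rough hint about which kind of model backend looks usable.

    Hypothetical helper: the labels are for illustration only.
    """
    if env.get("OPENAI_API_KEY"):
        return "api"        # hosted inference, no special hardware needed
    if gpu_tool_found:
        return "local-gpu"  # open-source model with GPU acceleration
    return "local-cpu"      # possible, but expect slow inference


if __name__ == "__main__":
    # Assumption: nvidia-smi on PATH indicates an NVIDIA GPU and driver.
    has_gpu_tool = shutil.which("nvidia-smi") is not None
    print(suggest_backend(os.environ, has_gpu_tool))
```

The check does not inspect VRAM; whether a given open-source model fits on the GPU still has to be compared against that model's specifications.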
🚀 Quick Start
For a quick start, see the provided example notebooks, which walk through the main schema-miner workflows.
📚 Citing this Work
If you use this repository in your research or applications, please cite the following paper(s):
- LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models
Sameer Sadruddin, Jennifer D’Souza, Eleni Poupaki, Alex Watkins, Hamed Babaei Giglou, Anisa Rula, Bora Karasulu, Sören Auer, Adrie Mackus, and Erwin Kessels. LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. In The Semantic Web – ESWC 2025, Springer, Cham, pp. 244–261. https://doi.org/10.1007/978-3-031-94578-6_14
📌 BibTeX
@InProceedings{10.1007/978-3-031-94578-6_14,
  author    = {Sadruddin, Sameer and D'Souza, Jennifer and Poupaki, Eleni and Watkins, Alex and Babaei Giglou, Hamed and Rula, Anisa and Karasulu, Bora and Auer, S{\"o}ren and Mackus, Adrie and Kessels, Erwin},
  editor    = {Curry, Edward and Acosta, Maribel and Poveda-Villal{\'o}n, Maria and van Erp, Marieke and Ojo, Adegboyega and Hose, Katja and Shimizu, Cogan and Lisena, Pasquale},
  title     = {LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models},
  booktitle = {The Semantic Web},
  year      = {2025},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {244--261},
  isbn      = {978-3-031-94578-6},
}
- SCHEMA-MINERpro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow
Sameer Sadruddin, Jennifer D’Souza, Eleni Poupaki, Alex Watkins, Bora Karasulu, Sören Auer, Adrie Mackus, and Erwin Kessels. SCHEMA-MINERpro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow. In Semantic Web Journal. https://www.semantic-web-journal.net/system/files/swj3871.pdf
📌 BibTeX
@Article{sadruddin2025schemaminerpro,
  author  = {Sadruddin, Sameer and D'Souza, Jennifer and Poupaki, Eleni and Watkins, Alex and Karasulu, Bora and Auer, S{\"o}ren and Mackus, Adrie and Kessels, Erwin},
  title   = {SCHEMA-MINERpro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow},
  journal = {Semantic Web Journal},
  year    = {2025},
}
👥 Contact & Contributions
We’d love to hear from you!
Whether you're interested in collaborating on Schema-Miner Pro or have ideas to extend its capabilities, feel free to reach out:
- Collaboration inquiries: contact Jennifer D'Souza at jennifer.dsouza [at] tib.eu
- Development questions or bug reports: please open an issue in the repository or contact the lead developer, Sameer Sadruddin, at sameer.sadruddin [at] tib.eu
Let’s build better schema-mining tools—together!
📃 License
This work is licensed under the MIT License.
🔗 Links
Source Code: https://github.com/sciknoworg/schema-miner
Documentation: https://schema-miner.readthedocs.io/en/latest/
File details
Details for the file schema_miner-3.2.5.tar.gz.
File metadata
- Download URL: schema_miner-3.2.5.tar.gz
- Upload date:
- Size: 42.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39756febed729341f09316bf0903fb4243edf237f130276fbaacf24afde8870e |
| MD5 | 4ec8980e68b648111e210ae14c33e3c2 |
| BLAKE2b-256 | 2bb6263cb5c673c2ff6326a5592937bd33a5588afff0fe63c7e085c9fa12428c |
File details
Details for the file schema_miner-3.2.5-py3-none-any.whl.
File metadata
- Download URL: schema_miner-3.2.5-py3-none-any.whl
- Upload date:
- Size: 54.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 726d90e98b017586c53fba6a17c54024abc8b6f9c824993d921bb5f29dcced4d |
| MD5 | 5305588dadb01c1b88e3f17a80970ca3 |
| BLAKE2b-256 | e39d003e78828e8d4965bb6948d26b3af18e10dec064a6f9030037c54553d18c |
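The digests listed for each file can be used to check a manual download. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `digest_matches` are hypothetical, and the file path and expected digest are supplied by the caller.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected value.

    Surrounding whitespace and letter case in the expected value are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python verify_download.py <downloaded-file> <expected-sha256>
    if len(sys.argv) == 3:
        ok = digest_matches(sys.argv[1], sys.argv[2])
        print("digest OK" if ok else "digest MISMATCH")
        sys.exit(0 if ok else 1)
```

For example, the script could be run with the path to a downloaded `schema_miner-3.2.5.tar.gz` and the SHA256 value listed for that file. When installing through pip, the same kind of check is available through pip's hash-checking mode with a requirements file.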