setlr is a tool for Semantic Extraction, Transformation, and Loading.
setlr: Semantic Extract, Transform and Load
SETLr is a Python tool for generating RDF graphs from tabular and structured data (CSV, Excel, JSON, XML, and more) using declarative SETL (Semantic Extract, Transform, Load) scripts.
Features
✨ Multiple Data Sources: CSV, Excel, JSON, XML, RDF, SAS files
🔄 Flexible Transformations: JSON-LD templates with Jinja2, Python functions, SPARQL
⚡ High Performance: Streaming XML parsing, pandas DataFrames, progress tracking
🐍 Python Integration: Use as library or CLI tool
✅ Validation: Built-in SHACL validation
📝 Well Documented: Comprehensive guides and API reference
Quick Start
Installation
pip install setlr
Simple Example
Create data.csv:
ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
Create transform.setl.ttl:
@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <http://example.com/> .
:table a csvw:Table, setl:Table ;
    prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .

:output a void:Dataset ;
    prov:wasGeneratedBy [
        a setl:Transform, setl:JSLDT ;
        prov:used :table ;
        prov:value '''[{
            "@id": "http://example.com/person/{{row.ID}}",
            "@type": "http://xmlns.com/foaf/0.1/Person",
            "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
            "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
        }]'''
    ] .
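To make the template's effect concrete, the following standard-library sketch mimics the JSON-LD node that the template above describes for each row of data.csv. It is only an illustration of the expected output shape, not how SETLr evaluates templates (SETLr renders them with Jinja2); the helper name `expand_row` is hypothetical.

```python
import csv
import io
import json

# Same content as data.csv from the example above.
CSV_TEXT = """ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""


def expand_row(row):
    """Build the JSON-LD node the template describes for one row (hypothetical helper)."""
    return {
        "@id": f"http://example.com/person/{row['ID']}",
        "@type": "http://xmlns.com/foaf/0.1/Person",
        "http://xmlns.com/foaf/0.1/name": row["Name"],
        "http://xmlns.com/foaf/0.1/mbox": f"mailto:{row['Email']}",
    }


# csv.DictReader yields one dict per data row, keyed by the header names.
nodes = [expand_row(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
print(json.dumps(nodes, indent=2))
```

Each row yields one node, so the two-row file produces two foaf:Person descriptions.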
Run SETLr:
setlr transform.setl.ttl
Using from Python
from rdflib import Graph, URIRef
import setlr
# Load SETL script
setl_graph = Graph()
setl_graph.parse("transform.setl.ttl", format="turtle")
# Execute ETL pipeline
resources = setlr.run_setl(setl_graph)
# Access generated RDF
output = resources[URIRef('http://example.com/output')]
print(f"Generated {len(output)} RDF triples")
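The value returned by `run_setl` can also be inspected as a whole. The sketch below assumes only what the example above suggests, namely that `resources` behaves like a dictionary keyed by resource URI; the helper `summarize_resources` is hypothetical, and it is demonstrated on stand-in data so that it runs without SETLr installed.

```python
def summarize_resources(resources):
    """Return {resource URI as str: size} for every sized value in the mapping."""
    summary = {}
    for uri, value in resources.items():
        # rdflib graphs and pandas tables both support len(); skip anything that does not.
        if hasattr(value, "__len__"):
            summary[str(uri)] = len(value)
    return summary


# Stand-in data: plain lists take the place of the objects SETLr would return.
fake_resources = {
    "http://example.com/table": ["row 1", "row 2"],
    "http://example.com/output": ["t1", "t2", "t3", "t4", "t5", "t6"],
}
for uri, size in summarize_resources(fake_resources).items():
    print(f"{uri}: {size} items")
```

With a real run, passing the `resources` object from the example above in place of `fake_resources` would report the size of each extracted table and generated graph, provided the dictionary assumption holds.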
Documentation
📚 Complete Documentation - Full guides and references
Quick Links:
- Tutorial - Step-by-step guide to SETLr
- JSLDT Template Language - Transform syntax reference
- Python API - Using SETLr from Python
- Quick Start - Get started in 5 minutes
- Examples - Real-world examples
Advanced Topics:
- Streaming XML with XPath - Efficient large file processing
- Python Functions - Custom Python transforms
- SPARQL Support - Query and update endpoints
- SHACL Validation - Validate your RDF output
Key Concepts
SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:
- Extract: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
- Transform: Apply templates or Python scripts to generate RDF
- Load: Save to files or SPARQL endpoints
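Because each step is linked to its inputs with `prov:used` and to its outputs with `prov:wasGeneratedBy`, a SETL script implies a dependency graph. As an illustration only (this is not SETLr's actual scheduler), the sketch below records the dependencies from the Quick Start script by hand and derives an execution order with the standard-library `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources it was derived from (via prov:used).
# The entries are hand-copied from the Quick Start script, not parsed from it.
dependencies = {
    "http://example.com/table": {"data.csv"},
    "http://example.com/output": {"http://example.com/table"},
}

# static_order() yields every node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for step, resource in enumerate(order, start=1):
    print(step, resource)
```

The source file comes first, then the extracted table, then the transformed dataset, which matches the Extract, Transform, Load sequence described above.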
Supported Formats
Input:
- Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
- Structured: JSON (with ijson selectors), XML (with XPath streaming)
- Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies
Output:
- RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
- Destinations: Files, SPARQL Update endpoints
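When working with the output formats listed above from Python, it can help to map file extensions to rdflib serializer names. The sketch below is an assumption-laden convenience, not part of SETLr: the helper `guess_rdf_format` and the extension choices are hypothetical, while the format names on the right are the ones rdflib registers for these serializations.

```python
from pathlib import Path

# File extension -> rdflib serializer name for the output formats listed above.
RDFLIB_FORMATS = {
    ".ttl": "turtle",
    ".trig": "trig",
    ".nt": "nt",
    ".n3": "n3",
    ".rdf": "xml",
    ".jsonld": "json-ld",
}


def guess_rdf_format(filename, default="turtle"):
    """Pick an rdflib format name from a file extension (hypothetical helper)."""
    return RDFLIB_FORMATS.get(Path(filename).suffix.lower(), default)


print(guess_rdf_format("people.ttl"))
print(guess_rdf_format("people.JSONLD"))
```

The returned name can be passed as the `format` argument of rdflib's `Graph.serialize`.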
Examples
See the examples/ directory for complete working examples:
- social.setl.ttl - Basic CSV to RDF with conditionals and loops
- ontology.setl.ttl - OWL ontology transformation with SHACL shapes
Development
# Clone repository
git clone https://github.com/tetherless-world/setlr.git
cd setlr
# Bootstrap (creates venv and installs dependencies)
./script/bootstrap
# Activate virtual environment
source venv/bin/activate
# Run tests
./script/build
# Run linter
flake8 setlr/
Contributing
Contributions are welcome! Please see our Contributing Guide for details on:
- Development setup and workflow
- Code standards and style guidelines
- Testing requirements
- Pull request process
Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.
License
Apache License 2.0 - see LICENSE file for details.
Citation
If you use SETLr in your research, please cite:
@software{setlr,
title = {SETLr: Semantic Extract, Transform and Load},
author = {McCusker, Jamie},
year = {2024},
url = {https://github.com/tetherless-world/setlr}
}
Support
- 📖 Documentation
- 🐛 Issue Tracker
- 💬 Discussions
- 🔒 Security Policy - Report security vulnerabilities
Download files
File details
Details for the file setlr-1.0.3.tar.gz.
File metadata
- Download URL: setlr-1.0.3.tar.gz
- Upload date:
- Size: 39.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbfe1dfa995309bd819fc930e3e232c83079fc37e99de29939e0c58276c25a6a |
| MD5 | 76a01cdb2505a5f9cb4718cc98fcd357 |
| BLAKE2b-256 | 2c96f56d641666fc3e4c487624ae5c1d6acefae70afb400be1a5d117f7d74404 |
File details
Details for the file setlr-1.0.3-py3-none-any.whl.
File metadata
- Download URL: setlr-1.0.3-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 441de8b445819c2a2f96b043b861216756ea3e4f768f05a154b1e0b0714984dd |
| MD5 | 1a54e6d2f644c99f283c854fd7769b2b |
| BLAKE2b-256 | 6295c407ac38abccbdcd389493985bc1156651113759468d10348102a07f1ccd |
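The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the file is assumed to have been downloaded to the path you pass in.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, for example, `matches_published_digest("setlr-1.0.3.tar.gz", "fbfe1dfa995309bd819fc930e3e232c83079fc37e99de29939e0c58276c25a6a")` should return `True` for an intact download.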