setlr is a tool for Semantic Extraction, Transformation, and Loading.

setlr: Semantic Extract, Transform and Load

SETLr is a Python tool for generating RDF graphs from tabular and structured data using declarative SETL (Semantic Extract, Transform, Load) scripts.

Features

  • Multiple Data Sources: CSV, Excel, JSON, XML, RDF, and SAS files
  • Flexible Transformations: JSON-LD templates with Jinja2, Python functions, SPARQL
  • High Performance: Streaming XML parsing, pandas DataFrames, progress tracking
  • Python Integration: Use as a library or as a CLI tool
  • Validation: Built-in SHACL validation
  • Well Documented: Comprehensive guides and API reference

Quick Start

Installation

pip install setlr

Simple Example

Create data.csv:

ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com

Create transform.setl.ttl:

@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <http://example.com/> .

:table a csvw:Table, setl:Table ;
    prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .

:output a void:Dataset ;
    prov:wasGeneratedBy [
        a setl:Transform, setl:JSLDT ;
        prov:used :table ;
        prov:value '''[{
            "@id": "http://example.com/person/{{row.ID}}",
            "@type": "http://xmlns.com/foaf/0.1/Person",
            "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
            "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
        }]'''
    ] .

Run SETLr:

setlr transform.setl.ttl
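
To make the template step concrete, the sketch below shows roughly what the template above is expected to yield for each row of data.csv. It is a hand-rolled stand-in written with the standard library only: it is not Jinja2 and not SETLr's own code, and the names expand_template, expand_all, and PLACEHOLDER are hypothetical helpers introduced for illustration.

```python
import csv
import io
import json
import re

# Same sample data and template as in the example above.
CSV_TEXT = """ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""

TEMPLATE = """[{
    "@id": "http://example.com/person/{{row.ID}}",
    "@type": "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
    "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
}]"""

# Matches placeholders of the form {{row.Column}} (assumption: simple
# column lookups only; real Jinja2 expressions are far richer).
PLACEHOLDER = re.compile(r"\{\{\s*row\.(\w+)\s*\}\}")


def expand_template(template, row):
    """Fill the placeholders for one row and parse the result as JSON-LD."""
    def substitute(match):
        # json.dumps escapes the value; [1:-1] drops the surrounding quotes.
        return json.dumps(row[match.group(1)])[1:-1]

    return json.loads(PLACEHOLDER.sub(substitute, template))


def expand_all(csv_text, template):
    """Expand the template once per CSV row and collect the JSON-LD nodes."""
    nodes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nodes.extend(expand_template(template, row))
    return nodes


nodes = expand_all(CSV_TEXT, TEMPLATE)
print(json.dumps(nodes, indent=2))
```

Each CSV row becomes one JSON-LD node whose @id is built from the row's ID column, which is why the two input rows describe two distinct foaf:Person resources.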

Using from Python

from rdflib import Graph, URIRef
import setlr

# Load SETL script
setl_graph = Graph()
setl_graph.parse("transform.setl.ttl", format="turtle")

# Execute ETL pipeline
resources = setlr.run_setl(setl_graph)

# Access generated RDF
output = resources[URIRef('http://example.com/output')]
print(f"Generated {len(output)} RDF triples")

Documentation

📚 Complete Documentation - Full guides and references

Key Concepts

SETLr uses RDF (with the PROV-O vocabulary) to describe ETL workflows:

  1. Extract: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
  2. Transform: Apply templates or Python scripts to generate RDF
  3. Load: Save to files or SPARQL endpoints
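
The three stages can be pictured as three small functions chained together. The sketch below is a plain-Python analogy that uses only the standard library; it does not call SETLr and does not reflect how SETLr is implemented. The function names extract, transform, and load, and the escape_literal helper, are hypothetical, and the sample data mirrors the Quick Start CSV.

```python
import csv
import io

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

CSV_TEXT = """ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def transform(rows):
    """Transform: map each row to (subject, predicate, object) terms."""
    triples = []
    for row in rows:
        subject = f"<http://example.com/person/{row['ID']}>"
        name = escape_literal(row["Name"])
        mbox = escape_literal("mailto:" + row["Email"])
        triples.append((subject, f"<{RDF_TYPE}>", f"<{FOAF}Person>"))
        triples.append((subject, f"<{FOAF}name>", f'"{name}"'))
        triples.append((subject, f"<{FOAF}mbox>", f'"{mbox}"'))
    return triples


def load(triples, sink):
    """Load: write the triples to a file-like sink as N-Triples lines."""
    for s, p, o in triples:
        sink.write(f"{s} {p} {o} .\n")
    return len(triples)


sink = io.StringIO()
count = load(transform(extract(CSV_TEXT)), sink)
print(sink.getvalue())
```

In SETLr itself these stages are declared in the SETL script as RDF resources linked by PROV-O properties such as prov:wasGeneratedBy and prov:used, rather than written as function calls.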

Supported Formats

Input:

  • Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
  • Structured: JSON (with ijson selectors), XML (with XPath streaming)
  • Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies
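
As an illustration of what streaming XML parsing means in general, the sketch below handles each element as soon as it has been fully read, using the standard library's xml.etree.ElementTree.iterparse. This is only an assumption-laden analogy: it is not SETLr's XML extractor, the sample document is made up, and the stream_people helper is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
XML_BYTES = b"""<people>
  <person id="1"><name>Alice</name></person>
  <person id="2"><name>Bob</name></person>
</people>"""


def stream_people(source):
    """Yield one dict per <person> element as soon as it is complete."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "person":
            yield {"id": elem.get("id"), "name": elem.findtext("name")}
            # Drop the element's children and text once it has been handled,
            # so memory use stays low on large inputs.
            elem.clear()


people = list(stream_people(io.BytesIO(XML_BYTES)))
print(people)
```

Because records are yielded one at a time, a consumer can start transforming the first record before the rest of the file has been read.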

Output:

  • RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
  • Destinations: Files, SPARQL Update endpoints

Examples

See the examples/ directory for complete working examples:

  • social.setl.ttl - Basic CSV to RDF with conditionals and loops
  • ontology.setl.ttl - OWL ontology transformation with SHACL shapes

Development

# Clone repository
git clone https://github.com/tetherless-world/setlr.git
cd setlr

# Bootstrap (creates venv and installs dependencies)
./script/bootstrap

# Activate virtual environment
source venv/bin/activate

# Run tests
./script/build

# Run linter
flake8 setlr/

Contributing

Contributions are welcome! Please see our Contributing Guide for details on:

  • Development setup and workflow
  • Code standards and style guidelines
  • Testing requirements
  • Pull request process

Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.

License

Apache License 2.0 - see LICENSE file for details.

Citation

If you use SETLr in your research, please cite:

@software{setlr,
  title = {SETLr: Semantic Extract, Transform and Load},
  author = {McCusker, Jamie},
  year = {2024},
  url = {https://github.com/tetherless-world/setlr}
}
