Machine-file extractors and transformers for semantic schema pipelines

semantic-transformers

A library and a curated collection of parsers that bridge raw instrument output files and the semantic-schemas knowledge graph pipeline.

What this repository contains

semantic-transformers/
  src/semantic_transformers/   Python library (Transformer, QuickMapper, …)
  parsers/                     Machine-specific file parsers
    <domain>/                  Mirrors the semantic-schemas folder structure
      <specialisation>/
        <machine>/             One folder per instrument model
          <machine>_parser.py  Reads the instrument file
          column_mapping.json  Maps column names to ontology class IRIs and units
          README.md            Quick-start, schema compatibility, and known limitations
  docs/                        Guides for users and contributors
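
To make the role of column_mapping.json more concrete, here is a minimal sketch of reading such a file. The JSON layout (column name as key, with "iri" and optional "unit" entries) is an assumption modelled on the QuickMapper mapping shown in this README; the real files under parsers/ may be structured differently. The helper name describe_columns is hypothetical.

```python
import json

# Hypothetical column_mapping.json content. The key names ("iri", "unit")
# are an assumption; the IRIs are the ones used in this README's
# QuickMapper example.
EXAMPLE_MAPPING_JSON = """
{
  "Force":     {"iri": "https://w3id.org/pmd/tto/StandardForce",
                "unit": "http://qudt.org/vocab/unit/N"},
  "Extension": {"iri": "https://w3id.org/pmd/tto/Extension"}
}
"""


def describe_columns(mapping_text: str) -> list[str]:
    """Return one human-readable line per mapped column."""
    mapping = json.loads(mapping_text)
    lines = []
    for column, entry in mapping.items():
        # A unit is treated as optional in this sketch.
        unit = entry.get("unit", "no unit")
        lines.append(f"{column} -> {entry['iri']} ({unit})")
    return lines


if __name__ == "__main__":
    for line in describe_columns(EXAMPLE_MAPPING_JSON):
        print(line)
```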

The two parts

1. The library (src/semantic_transformers/)

Class            Role
Parser           Protocol to implement when adding support for a new instrument
ParseResult      What every parser returns: simplified JSON + DataFrame
Transformer      Runs parsing → JSONata transform → RDF graph
TransformResult  What Transformer.run() returns: RDF graph + DataFrame
QuickMapper      Turns any tabular file into RDF using a simple YAML mapping (no parser needed)
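
The table describes Parser as a protocol that returns a result object. The sketch below illustrates that general pattern with Python's typing.Protocol; it is not the library's actual API. The names SketchParser, SketchParseResult, SemicolonParser, run_parser, and the parse() method are all hypothetical, and a plain list of dicts stands in for the DataFrame to keep the example free of dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class SketchParseResult:
    """Hypothetical stand-in for ParseResult: metadata plus tabular rows."""
    metadata: dict[str, Any]
    rows: list[dict[str, float]] = field(default_factory=list)


class SketchParser(Protocol):
    """Hypothetical stand-in for the Parser protocol."""

    def parse(self, text: str) -> SketchParseResult: ...


class SemicolonParser:
    """Toy parser: 'key=value' header lines, then a ';'-separated table."""

    def parse(self, text: str) -> SketchParseResult:
        metadata: dict[str, Any] = {}
        rows: list[dict[str, float]] = []
        header: list[str] = []
        for line in text.strip().splitlines():
            if "=" in line:
                # Metadata line such as "operator=demo".
                key, value = line.split("=", 1)
                metadata[key.strip()] = value.strip()
            elif not header:
                # First table line holds the column names.
                header = [name.strip() for name in line.split(";")]
            else:
                values = [float(v) for v in line.split(";")]
                rows.append(dict(zip(header, values)))
        return SketchParseResult(metadata=metadata, rows=rows)


def run_parser(parser: SketchParser, text: str) -> SketchParseResult:
    """Accept any object that structurally satisfies SketchParser."""
    return parser.parse(text)


SAMPLE = "operator=demo\nForce;Extension\n1.0;0.1\n2.0;0.2\n"

if __name__ == "__main__":
    result = run_parser(SemicolonParser(), SAMPLE)
    print(result.metadata)
    print(result.rows)
```

Because the protocol is structural, SemicolonParser does not need to inherit from anything; it only needs a matching parse() method.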

2. The parsers (parsers/)

Each parser targets a specific instrument model. The folder path mirrors the schemas/ tree in semantic-schemas:

Schema                             Instrument                   Parser path
characterization/tensile-test/TTO  Zwick/Roell (testXpert III)  parsers/characterization/tensile-test/zwick/
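
The mirroring convention can be expressed as a small path helper. This is a sketch inferred from the single table row and the folder layout in this README, not a function shipped with the library: it assumes the parser folder replaces the last schema path component with the machine name, and that the parser file is named <machine>_parser.py. Both function names are hypothetical.

```python
from pathlib import PurePosixPath


def parser_dir_for_schema(schema_id: str, machine: str) -> PurePosixPath:
    """Map a schema path such as 'characterization/tensile-test/TTO'
    to the folder of a machine-specific parser (assumed convention)."""
    # Keep <domain>/<specialisation>, drop the schema name itself.
    domain_and_specialisation = PurePosixPath(schema_id).parent
    return PurePosixPath("parsers") / domain_and_specialisation / machine


def parser_file_for_schema(schema_id: str, machine: str) -> PurePosixPath:
    """Path of the parser module, following the <machine>_parser.py layout."""
    return parser_dir_for_schema(schema_id, machine) / f"{machine}_parser.py"


if __name__ == "__main__":
    print(parser_dir_for_schema("characterization/tensile-test/TTO", "zwick"))
    print(parser_file_for_schema("characterization/tensile-test/TTO", "zwick"))
```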

Installation

Using pip (recommended)

# Install the transformers library
pip install semantic-transformers

# Optional extras (quoted so that shells such as zsh do not expand the brackets)
pip install "semantic-transformers[excel]"  # for Excel file support
pip install "semantic-transformers[dev]"    # for development and testing

Development installation

The semantic-schemas and semantic-transformers repositories are designed to be cloned as siblings under a shared folder:

mkdir semantic-dataspace && cd semantic-dataspace

git clone https://github.com/Semantic-Dataspace/semantic-schemas
git clone https://github.com/Semantic-Dataspace/semantic-transformers

python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

pip install -e semantic-transformers/
pip install jupyterlab            # only needed for the interactive notebooks

Two ways to use this library

Option A: you have a supported instrument

Use a ready-made parser and the matching schema notebook. For a Zwick/Roell tensile test:

jupyter lab semantic-schemas/schemas/characterization/tensile-test/TTO/docs/2_tensile_test_csv_workflow.ipynb

Edit Step 0 (a single line that points to your file), then run all cells.

Option B: you have a tabular file with no existing parser

Use QuickMapper. Provide a short mapping that names the columns and points each one at an ontology class IRI, optionally with a unit. The example below passes the mapping as a Python dict:

from semantic_transformers import QuickMapper

mapping = {
    "label": "my experiment",
    "columns": {
        "Force": {
            "iri":  "https://w3id.org/pmd/tto/StandardForce",
            "unit": "http://qudt.org/vocab/unit/N",
        },
        "Extension": {
            "iri": "https://w3id.org/pmd/tto/Extension",
        },
    },
}

result = QuickMapper(mapping).run("my_data.csv")
print(result.graph.serialize(format="turtle"))
print(result.dataframe.head())

Supported file formats: CSV, TSV, Excel (.xlsx, via the excel extra), Parquet, JSON. See the QuickMapper notebook for a guided walkthrough.
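
Before running a mapping against a file, it can help to confirm that every mapped column exists in the file header. The check below is a standalone sketch using only the standard library. It is not part of QuickMapper, whose own validation behaviour is not described here. The function name check_mapping is hypothetical, and treating "iri" as required is an assumption based on the example mapping above.

```python
import csv
import io


def check_mapping(mapping: dict, csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    # Only the first CSV row (the header) is needed for this check.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    for column, entry in mapping.get("columns", {}).items():
        if column not in header:
            problems.append(f"column {column!r} not found in file header")
        if not entry.get("iri"):
            # Assumption: every mapped column needs an ontology class IRI.
            problems.append(f"column {column!r} has no 'iri'")
    return problems


EXAMPLE_MAPPING = {
    "label": "my experiment",
    "columns": {
        "Force": {
            "iri": "https://w3id.org/pmd/tto/StandardForce",
            "unit": "http://qudt.org/vocab/unit/N",
        },
        "Extension": {"iri": "https://w3id.org/pmd/tto/Extension"},
    },
}

if __name__ == "__main__":
    print(check_mapping(EXAMPLE_MAPPING, "Force,Extension\n1.0,0.1\n"))
    print(check_mapping(EXAMPLE_MAPPING, "Force,Time\n1.0,0.0\n"))
```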

Development

Running the tests

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest -v

Refreshing notebook outputs (for documentation)

Notebooks are committed with their output cells so that GitHub renders them as readable documentation. After changing a parser or the library, re-execute all notebooks in place to update the stored outputs before committing:

find docs -name "*.ipynb" ! -path "*/.ipynb_checkpoints/*" \
  | xargs jupyter nbconvert \
      --to notebook \
      --execute \
      --inplace \
      --ExecutePreprocessor.timeout=300

Run this from the repository root. Commit the resulting *.ipynb changes together with any code changes so that the rendered output on GitHub stays in sync.

Tip. To refresh a single notebook only, pass its path directly:

jupyter nbconvert --to notebook --execute --inplace \
    --ExecutePreprocessor.timeout=300 \
    docs/3_quickstart-mapping.ipynb
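
The find and xargs pipeline above is not available in a plain Windows shell. A portable alternative is sketched below in Python, assuming jupyter is on the PATH; it passes the same nbconvert flags as the commands above. The script and its function names (find_notebooks, build_command) are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def find_notebooks(root: Path) -> list[Path]:
    """Collect *.ipynb files under root, skipping .ipynb_checkpoints folders."""
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def build_command(notebooks: list[Path], timeout: int = 300) -> list[str]:
    """Build the nbconvert invocation with the flags used in this README."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout}",
        *[str(path) for path in notebooks],
    ]


if __name__ == "__main__":
    # Run from the repository root, like the shell version.
    notebooks = find_notebooks(Path("docs"))
    if notebooks:
        subprocess.run(build_command(notebooks), check=True)
    else:
        print("No notebooks found under docs/")
```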

Documentation

Download files

Source Distribution

semantic_transformers-0.1.0.tar.gz (20.1 kB view details)

Uploaded Source

Built Distribution

semantic_transformers-0.1.0-py3-none-any.whl (17.1 kB view details)

Uploaded Python 3

File details

Details for the file semantic_transformers-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_transformers-0.1.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semantic_transformers-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a19b832b60e7eae406c2d1520cb4f5b50603811ff5479a5119266d32ddb96273
MD5 f1c37db21d8699879862af99d3984878
BLAKE2b-256 dda91bdf7faf84bef53fdbda38ddde7ccc7d6fa180282ca2100b6c460faaf291

Provenance

The following attestation bundles were made for semantic_transformers-0.1.0.tar.gz:

Publisher: publish.yml on semantic-dataspace/semantic-transformers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file semantic_transformers-0.1.0-py3-none-any.whl.

File hashes

Hashes for semantic_transformers-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d8d07a80e2442d08cfcc16acff7ae0c299261a638c6af9b474b31c62982e0815
MD5 abca802d6249c8efbb2448a06d3d2bf9
BLAKE2b-256 cddb27c4ff81fa3d2ee58ab2906a2643821d2284605027723e2b49c64926c410

Provenance

The following attestation bundles were made for semantic_transformers-0.1.0-py3-none-any.whl:

Publisher: publish.yml on semantic-dataspace/semantic-transformers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.