Tool to generate shape expressions from CSV files

ShExStatements

ShExStatements allows users to generate Shape Expressions (ShEx) from simple CSV statements, CSV files, and spreadsheets. It can be used from the command line, via REST API, or through a modern web interface.

Python compatibility

  • Core CSV/Spreadsheet to ShEx conversion supports modern Python versions including Python 3.13.
  • CI runs on Python 3.12 and 3.13, plus 3.14-dev (allowed to fail) to detect future breakages early.

Ways to use ShExStatements

ShExStatements currently supports three primary usage modes:

  1. WASM runtime in the browser (static frontend, no backend required)
  2. Docker runtime (React frontend + FastAPI backend)
  3. Python runtime (CLI and legacy Flask interface)

Quick start

1) Using Python (CLI)

Set up a virtual environment and install shexstatements:

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install shexstatements

Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.

$ shexstatements.sh examples/language.csv

2) Using Docker (Frontend + Backend)

Run the containerized stack:

cd docker
docker compose up

This starts the React frontend and the FastAPI backend.

For development mode with hot reloading:

cd docker
docker compose -f docker-compose.yml -f docker-compose.dev.yml up

Build from source

Terminal

Clone the ShExStatements repository.

$ git clone https://github.com/johnsamuelwrites/ShExStatements.git

Go to ShExStatements directory.

$ cd ShExStatements

Install modules required by ShExStatements (here: installing into a virtual environment).

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .

Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.

$ ./shexstatements.sh examples/language.csv

CSV files can use other delimiters, such as ;. For example, the following command works with a file that uses a semicolon as the delimiter.

$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"
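
To see why the delimiter has to match the file, the sketch below parses the same statement with Python's standard csv module, first with the right delimiter and then with the wrong one. It only illustrates delimiter handling in general; it is not ShExStatements' own parser, and the sample line and the function name split_statements are made up for this example.

```python
import csv
import io


def split_statements(text, delimiter=","):
    """Split each non-empty line of text into fields on the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


# Hypothetical one-line inputs; only the delimiter differs.
comma_text = "@language,wdt:P31,wd:Q34770\n"
semicolon_text = "@language;wdt:P31;wd:Q34770\n"

comma_rows = split_statements(comma_text)
semicolon_rows = split_statements(semicolon_text, delimiter=";")

# Parsing semicolon-separated data with the default comma yields one
# undivided field, which is why the delimiter must be stated explicitly.
wrong_rows = split_statements(semicolon_text)
```

With the matching delimiter both inputs produce the same three fields; with the wrong one the whole line stays a single field.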

Some CSV files start with a header row. In that case, use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).

$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
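
The effect of skipping a header can be sketched with the standard csv module: the first row is read and discarded before the statements are collected. This is an illustration under assumptions, not the package's implementation; the header names and the function name read_rows are hypothetical.

```python
import csv
import io


def read_rows(text, skip_header=False, delimiter=","):
    """Return the non-empty rows of text, optionally dropping the first line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if skip_header:
        # Discard the first line; the default avoids an error on empty input.
        next(reader, None)
    return [row for row in reader if row]


# Hypothetical file content: one header line followed by one statement.
sample = "Node,Property,Value\n@language,wdt:P31,wd:Q34770\n"

with_header = read_rows(sample)
without_header = read_rows(sample, skip_header=True)
```

Without the skip, the header row would be handed on as if it were a statement.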

It is also possible to work with spreadsheet files such as .ods, .xls or .xlsx.

$ ./shexstatements.sh examples/language.ods
$ ./shexstatements.sh examples/language.xls
$ ./shexstatements.sh examples/language.xlsx
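
When several files need converting, the commands above can be assembled from a script. The sketch below builds the argument list for the options shown so far (--delim and --skipheader) and leaves the actual call to subprocess. The helper names build_command and run_conversion are hypothetical, and the script path assumes the repository root is the working directory.

```python
import subprocess


def build_command(path, delim=None, skip_header=False, script="./shexstatements.sh"):
    """Build the argument list for one conversion, mirroring the commands above."""
    command = [script]
    if skip_header:
        command.append("--skipheader")
    command.append(path)
    if delim is not None:
        command.extend(["--delim", delim])
    return command


def run_conversion(path, **options):
    """Run the CLI for one file; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(path, **options), check=True)


# Building the list does not execute anything.
plain = build_command("examples/language.csv")
semicolon = build_command("examples/languagedelimsemicolon.csv", delim=";")
header = build_command("examples/header/languageheader.csv", skip_header=True)
```

Passing a list to subprocess.run avoids shell quoting, so the semicolon needs no extra quotes here.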

In all of the above cases, the shape expression generated by ShExStatements looks like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
  wdt:P31 [ wd:Q34770  ] ;# instance of a language
  wdt:P1705 LITERAL ;# native name
  wdt:P17 .+ ;# spoken in country
  wdt:P2989 .+ ;# grammatical cases
  wdt:P282 .+ ;# writing system
  wdt:P1098 .+ ;# speakers
  wdt:P1999 .* ;# UNESCO language status
  wdt:P2341 .+ ;# indigenous to
}

It is also possible to use application profiles with the following columns:

Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation

Shape expressions can then be generated with the following command:

$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
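
To get a feel for the application profile layout, the sketch below reads one row with the header given above using csv.DictReader, so each value can be looked up by column name. The row values, including how Mand and Repeat are encoded, are assumptions made for illustration; consult examples/languageap.csv for the real format.

```python
import csv
import io

# Column names as listed in the application profile form above.
AP_HEADER = "Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation"

# Hypothetical row; the encodings of Mand and Repeat are assumed, not documented here.
sample = AP_HEADER + "\n" + "language,wdt:P31,instance of,true,false,wd:Q34770,,a language\n"


def read_profile(text):
    """Return one dict per data row, keyed by the header's column names."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_profile(sample)
first = rows[0]
```

DictReader consumes the header itself, which corresponds to passing --skipheader alongside -ap in the command above.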

Objectives

  • Easily generate shape expressions (ShEx) from CSV files and spreadsheets
  • Simple syntax

Documentation and examples

Detailed documentation is available here, with example CSV files in the examples folder.

Test cases and coverage

All test cases can be run as follows:

$ python3 -m tests.tests

A code coverage report can also be generated by running the unit tests with the coverage tool.

$ coverage run --source=shexstatements -m unittest tests.tests
$ coverage report -m

Web Interface

Modern Web Interface (v1.0+)

ShExStatements now includes a modern, feature-rich web interface built with React and TypeScript.

Using Docker (recommended):

cd docker
docker compose up

Access the interface at http://localhost:3000

Features:

  • Split-pane editor with Monaco Editor (VS Code-like experience)
  • Syntax highlighting for ShExStatements and ShEx output
  • Dark mode support
  • File upload support (CSV, ODS, XLS, XLSX)
  • Multiple delimiter options (comma, pipe, semicolon)
  • Real-time error display
  • Copy output to clipboard
  • Runtime selector (Auto, API, WASM)

Static GitHub Pages (WASM)

The frontend can run conversion directly in the browser using Python-on-WASM (Pyodide), so it can be deployed as a static site on GitHub Pages.

  1. Enable GitHub Pages in repository settings (source: GitHub Actions).
  2. Push to main or master.
  3. The workflow .github/workflows/pages.yml builds and deploys the frontend with VITE_RUNTIME_MODE=wasm.

In WASM runtime:

  • CSV conversion to ShEx is supported in-browser.
  • Spreadsheet uploads (.xlsx, .xls, .ods) are also supported in-browser.
  • Pyodide dynamically installs Python dependencies (shexstatements, ply, and spreadsheet libraries) in the browser runtime.

Legacy Web Interface

The original Flask-based interface is still available:

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
$ ./shexstatements.sh -r

Check the URL http://127.0.0.1:5000/

API

ShExStatements provides a REST API for programmatic access.

Modern API (v1.0+)

The new FastAPI-based API provides:

Convert endpoint:

curl -X POST http://localhost:8000/api/v1/convert \
  -H "Content-Type: application/json" \
  -d '{"content": "@shape|prop|value", "delimiter": "|", "output_format": "shex"}'
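
The same request can be built from Python with only the standard library. The sketch below mirrors the curl call: the URL, headers, and JSON fields are taken from it, while the function names build_convert_request and convert are hypothetical. The response format is not described here, so the body is returned as plain text rather than parsed.

```python
import json
import urllib.request


def build_convert_request(content, delimiter="|", output_format="shex",
                          base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "content": content,
        "delimiter": delimiter,
        "output_format": output_format,
    }
    return urllib.request.Request(
        base_url + "/api/v1/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(content, **options):
    """Send the request and return the raw response body as text.

    Requires the backend to be running; the response shape is not assumed.
    """
    request = build_convert_request(content, **options)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request does not contact the server.
request = build_convert_request("@shape|prop|value")
```

Calling convert("@shape|prop|value") sends the request, so it only works while the backend is listening on the assumed address.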

API documentation

Detailed API documentation (modern v1 API and legacy compatibility notes) is available here.

Deployment Modes

  • Standalone Python application: CLI + legacy Flask UI (./shexstatements.sh).
  • Docker application: React frontend + FastAPI backend (docker compose up).
  • Static GitHub Pages frontend: WASM runtime (no backend required for CSV-to-ShEx).

Demonstration

Online demonstrations are also available:

Author

Conference Proceedings

  • ShExStatements: Simplifying Shape Expressions for Wikidata, John Samuel, Wiki Workshop 2021 (held at The Web Conference 2021), 14 April 2021 (PDF, Slides)

Acknowledgements

  • Wikidata Community

Archives and Releases

Licence

All code is released under the GPLv3+ licence. The associated documentation and other content are released under CC-BY-SA.
