Tool to generate shape expressions from CSV files
ShExStatements
ShExStatements allows users to generate Shape Expressions (ShEx) from simple CSV statements, CSV files, and spreadsheets. It can be used from the command line, via REST API, or through a modern web interface.
Python compatibility
- Core CSV/Spreadsheet to ShEx conversion supports modern Python versions including Python 3.13.
- CI runs on Python 3.12 and 3.13, plus 3.14-dev (allowed to fail) to detect future breakages early.
Ways to use ShExStatements
ShExStatements currently supports three primary usage modes:
- WASM runtime in the browser (static frontend, no backend required)
- Docker runtime (React frontend + FastAPI backend)
- Python runtime (CLI and legacy Flask interface)
Quick start
1) Using Python (CLI)
Set up a virtual environment and install shexstatements:
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install shexstatements
Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.
$ shexstatements.sh examples/language.csv
2) Using Docker (Frontend + Backend)
Run the containerized stack:
cd docker
docker compose up
This starts:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Swagger/OpenAPI docs: http://localhost:8000/docs
For development mode with hot reloading:
cd docker
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
Build from source
Terminal
Clone the ShExStatements repository.
$ git clone https://github.com/johnsamuelwrites/ShExStatements.git
Go to the ShExStatements directory.
$ cd ShExStatements
Install modules required by ShExStatements (here: installing into a virtual environment).
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter.
$ ./shexstatements.sh examples/language.csv
A CSV file can also use other delimiters, such as ;. For example, the following command works with a file that uses a semicolon as the delimiter.
$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"
Sometimes a CSV file starts with a header row. In that case, use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).
$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
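To illustrate what the --delim and --skipheader options mean for the input, here is a minimal sketch that uses only Python's standard csv module. It is not the ShExStatements parser: the function name read_statements and the sample rows are hypothetical and exist only for this example.

```python
import csv
import io


def read_statements(text, delim=",", skip_header=False):
    """Split CSV text into rows, optionally dropping the first (header) line.

    Hypothetical helper for illustration only; it mirrors the idea behind
    --delim and --skipheader, not the actual ShExStatements implementation.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delim))
    # Drop blank lines so they do not show up as empty rows.
    rows = [row for row in rows if row]
    if skip_header and rows:
        rows = rows[1:]
    return rows


# Made-up semicolon-delimited input with a header line.
sample = "Node;Property;Value\n@language;wdt:P31;wd:Q34770\n"
rows = read_statements(sample, delim=";", skip_header=True)
print(rows)
```

With skip_header=True the header line is discarded and only the statement row remains; with the default settings the same text would be read as comma-delimited and keep its first line.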
It is also possible to work with spreadsheet files such as .ods, .xls, or .xlsx.
$ shexstatements.sh examples/language.ods
$ shexstatements.sh examples/language.xls
$ shexstatements.sh examples/language.xlsx
In all the above cases, the shape expression generated by ShExStatements looks like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
wdt:P31 [ wd:Q34770 ] ;# instance of a language
wdt:P1705 LITERAL ;# native name
wdt:P17 .+ ;# spoken in country
wdt:P2989 .+ ;# grammatical cases
wdt:P282 .+ ;# writing system
wdt:P1098 .+ ;# speakers
wdt:P1999 .* ;# UNESCO language status
wdt:P2341 .+ ;# indigenous to
}
It's also possible to use application profiles with the following header:
Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation
and shape expressions can be generated with the following command:
$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
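When the CLI needs to be driven from a script, the commands shown so far can be assembled programmatically. The following is a sketch under stated assumptions: the helper names build_command and run_shexstatements are hypothetical, only the flags already shown (--delim, --skipheader, -ap) are used, and it assumes the generated ShEx is written to standard output.

```python
import subprocess


def build_command(path, delim=None, skip_header=False,
                  application_profile=False, script="./shexstatements.sh"):
    """Assemble the CLI argument list from the flags used in this README."""
    command = [script]
    if application_profile:
        command.append("-ap")
    if skip_header:
        command.append("--skipheader")
    if delim is not None:
        command.extend(["--delim", delim])
    command.append(path)
    return command


def run_shexstatements(path, **options):
    """Run the CLI and return its standard output.

    Assumption: the generated shape expression is printed to stdout.
    """
    result = subprocess.run(build_command(path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


command = build_command("examples/languageap.csv",
                        skip_header=True, application_profile=True)
print(command)
```

Passing the arguments as a list avoids shell quoting problems with delimiters such as ";" or "|".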
Objectives
- Easily generate shape expressions (ShEx) from CSV files and Spreadsheets
- Simple syntax
Documentation and examples
Detailed documentation is available here, with example CSV files in the examples folder.
Test cases and coverage
All test cases can be run as follows:
$ python3 -m tests.tests
A code coverage report can also be generated by running the unit tests with the coverage tool.
$ coverage run --source=shexstatements -m unittest tests.tests
$ coverage report -m
Web Interface
Modern Web Interface (v1.0+)
ShExStatements now includes a modern, feature-rich web interface built with React and TypeScript.
Using Docker (recommended):
cd docker
docker compose up
Access the interface at http://localhost:3000
Features:
- Split-pane editor with Monaco Editor (VS Code-like experience)
- Syntax highlighting for ShExStatements and ShEx output
- Dark mode support
- File upload support (CSV, ODS, XLS, XLSX)
- Multiple delimiter options (comma, pipe, semicolon)
- Real-time error display
- Copy output to clipboard
- Runtime selector (Auto, API, WASM)
Static GitHub Pages (WASM)
The frontend can run conversion directly in the browser using Python-on-WASM (Pyodide), so it can be deployed as a static site on GitHub Pages.
- Enable GitHub Pages in repository settings (source: GitHub Actions).
- Push to main or master.
- The workflow .github/workflows/pages.yml builds and deploys the frontend with VITE_RUNTIME_MODE=wasm.
In WASM runtime:
- CSV conversion to ShEx is supported in-browser.
- Spreadsheet uploads (.xlsx, .xls, .ods) are also supported in-browser.
- Pyodide dynamically installs Python dependencies (shexstatements, ply, and spreadsheet libraries) in the browser runtime.
Legacy Web Interface
The original Flask-based interface is still available:
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
$ ./shexstatements.sh -r
Check the URL http://127.0.0.1:5000/
API
ShExStatements provides a REST API for programmatic access.
Modern API (v1.0+)
The new FastAPI-based API provides:
- OpenAPI/Swagger documentation at http://localhost:8000/docs
- Async request handling
- Structured JSON responses with error details
Convert endpoint:
curl -X POST http://localhost:8000/api/v1/convert \
-H "Content-Type: application/json" \
-d '{"content": "@shape|prop|value", "delimiter": "|", "output_format": "shex"}'
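The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint path and JSON fields are taken from the curl call above, while the helper names build_convert_request and convert are hypothetical. The response structure is not specified here, so the sketch returns the decoded JSON unchanged.

```python
import json
import urllib.request


def build_convert_request(content, delimiter="|", output_format="shex",
                          base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the convert endpoint."""
    payload = {
        "content": content,
        "delimiter": delimiter,
        "output_format": output_format,
    }
    return urllib.request.Request(
        url=base_url + "/api/v1/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(content, **options):
    """Send the request and return the decoded JSON response as-is."""
    request = build_convert_request(content, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_convert_request("@shape|prop|value")
print(request.get_method(), request.full_url)
```

Calling convert("@shape|prop|value") requires the backend to be running, for example through the Docker setup.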
API documentation
Detailed API documentation (modern v1 API and legacy compatibility notes) is available here.
Deployment Modes
- Standalone Python application: CLI + legacy Flask UI (./shexstatements.sh).
- Docker application: React frontend + FastAPI backend (docker compose up).
- Static GitHub Pages frontend: WASM runtime (no backend required for CSV-to-ShEx).
Demonstration
Online demonstrations are also available:
Author
- John Samuel
- Contributors
Conference Proceedings
- ShExStatements: Simplifying Shape Expressions for Wikidata, John Samuel, Wiki Workshop 2021 (held at The Web Conference 2021), 14 April 2021 (PDF, Slides)
Acknowledgements
- Wikidata Community
Archives and Releases
Licence
All code is released under the GPLv3+ licence. The associated documentation and other content are released under CC-BY-SA.