Transformation and conversion framework (ETL) mainly for geospatial data
Project description
Stetl - Streaming ETL
Stetl, streaming ETL, pronounced "staedl", is a lightweight ETL-framework for geospatial data conversion.
Notice: the Stetl GitHub repository is now hosted in the GeoPython GitHub organization.
License
Stetl is released under a GNU GPL v3 license (see LICENSE.txt).
Documentation
The Stetl website and documentation can be found at http://stetl.org. For a quick overview, read the 5-minute Stetl introduction or a more detailed presentation. Stetl was presented at several events, such as FOSS4G 2013 in Nottingham and GeoPython 2016.
Concepts
Stetl basically glues together existing parsing and transformation tools like GDAL/OGR, Jinja2 and XSLT with custom Python code. By using native libraries like libxml2 and libxslt (via Python lxml), Stetl is speed-optimized.
A configuration file, in Python config .ini format, specifies a chained sequence of transformation steps: typically an Input connected to one or more Filters, and finally to an Output.
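As a rough illustration of how such a chain can be read, the Python sketch below parses a Stetl-style .ini file with the standard configparser module. The [etl] section, the chains key with |-separated section names, and the class paths are modelled on Stetl's examples but are assumptions here. Check the Stetl documentation for the exact syntax of your version. This is not Stetl's own config loader.

```python
import configparser
from typing import Dict, List, Tuple

# Hypothetical Stetl-style config. The [etl] section, the "chains" key with
# "|"-separated section names and the class paths are assumptions modelled
# on Stetl's examples, not a verified reference.
EXAMPLE_CONFIG = """
[etl]
chains = input_xml_file|transformer_xslt|output_file

[input_xml_file]
class = stetl.inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
class = stetl.filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_file]
class = stetl.outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml
"""


def read_chain(text: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (section name, class path, remaining options) per chain step."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    steps = []
    for name in parser.get("etl", "chains").split("|"):
        name = name.strip()
        options = dict(parser[name])       # all key/value pairs of the section
        class_path = options.pop("class")  # symbolic module/class name
        steps.append((name, class_path, options))
    return steps
```

Each step keeps its symbolic class path separate from its parameters, so the caller can resolve and instantiate the classes in a later stage.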
At runtime, this sequence is instantiated and run as a linked series of Python objects. These objects are
symbolically specified (by their module/class name) and parameterized in the config file.
The transformation is executed via the stetl -c <config file> command.
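The runtime side can be sketched in plain Python as follows. Each class is resolved from its dotted module/class name with importlib and instantiated with its config options, and every component is then linked to its downstream neighbour. The Component base class, its process()/invoke() methods and the two filters are hypothetical stand-ins, not Stetl's actual class hierarchy.

```python
import importlib
from typing import Any, List, Optional


def load_class(class_path: str) -> type:
    """Resolve a dotted 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


class Component:
    """Hypothetical base class: process a packet, then hand it downstream."""

    def __init__(self, **options: Any) -> None:
        self.options = options                   # parameters from the config file
        self.next: Optional["Component"] = None  # set by link()

    def process(self, packet: Any) -> Any:
        return packet

    def invoke(self, packet: Any) -> Any:
        result = self.process(packet)
        return self.next.invoke(result) if self.next is not None else result


class UpperFilter(Component):
    def process(self, packet: Any) -> Any:
        return packet.upper()


class SuffixFilter(Component):
    def process(self, packet: Any) -> Any:
        return packet + self.options.get("suffix", "")


def link(components: List[Component]) -> Component:
    """Connect each component to the next one and return the head of the chain."""
    for upstream, downstream in zip(components, components[1:]):
        upstream.next = downstream
    return components[0]
```

For example, `link([UpperFilter(), SuffixFilter(suffix="!")]).invoke("gml")` returns "GML!". In a real run the classes would come from load_class() using the paths read from the config. Here they are instantiated directly so the sketch runs without Stetl installed.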
Stetl has been proven to handle tens of millions of GML objects without any memory issues.
This is achieved through a technique called "streaming and splitting".
For example, using the OgrPostgisInput module, a GML stream can be generated from the database.
A component called the GmlSplitter can split this stream into manageable chunks (for example, 20000 features each)
and feed these chunks into the rest of the ETL chain.
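The idea behind streaming and splitting can be sketched with the standard library's incremental XMLPullParser. Bytes are fed in as they arrive, completed features are emitted in chunks of at most max_features, and already-processed elements are cleared so memory stays bounded. The gml:featureMember element name and the GML namespace URI are assumptions for illustration. This is not the GmlSplitter implementation.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List, Optional

GML_NS = "http://www.opengis.net/gml"  # assumption: GML 2/3.1 namespace URI


def split_features(byte_chunks: Iterable[bytes],
                   tag: str = "{%s}featureMember" % GML_NS,
                   max_features: int = 20000) -> Iterator[List[bytes]]:
    """Incrementally parse GML bytes and yield lists of at most
    max_features serialized feature elements."""
    parser = ET.XMLPullParser(events=("start", "end"))
    root: Optional[ET.Element] = None
    chunk: List[bytes] = []
    # A trailing None signals end of input so the parser can be closed.
    for data in itertools.chain(byte_chunks, [None]):
        if data is None:
            parser.close()  # events not yet read remain available below
        else:
            parser.feed(data)
        for event, elem in parser.read_events():
            if root is None:
                root = elem  # the first event is the start of the root element
            elif event == "end" and elem.tag == tag:
                chunk.append(ET.tostring(elem))
                root.clear()  # drop processed features to keep memory flat
                if len(chunk) == max_features:
                    yield chunk
                    chunk = []
    if chunk:
        yield chunk
```

Because only one chunk of serialized features is held at a time, the memory use depends on max_features rather than on the size of the whole input stream.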
Use Cases
Stetl has been found particularly useful for complex GML-related ETL cases, like those found within EU INSPIRE Data Harmonization and the transformation of GML/XML-based national geo-datasets to, for example, PostGIS.
Most of the data conversions within the Dutch NLExtract Project use Stetl.
Stetl also proved to be very effective in IoT-related transformations involving the SensorWeb/SOS.
Examples
Browse all examples under the examples directory. It is best to start with the basic examples.
Installation
Stetl can be installed from PyPI via pip install stetl,
and more recently as a Stetl Docker image.
More on installation in the documentation.
Contributing
Anyone and everyone is welcome to contribute. Please take a moment to review the guidelines for contributing.
Origins
Stetl originated in the INSPIRE-FOSS project (2009-2013, now archived). Since then Stetl has evolved into wider use, such as transforming Dutch GML-based Open Datasets like IMGEO/BGT (Large Scale Topography) and IMKAD/BRK (Cadastral Data), as well as Sensor Data Transformation and Calibration.
Finally
The word "stetl" is also an alternative spelling of "shtetl": http://en.wikipedia.org/wiki/Stetl : "...Material things were neither disdained nor extremely praised in the shtetl. Learning and education were the ultimate measures of worth in the eyes of the community, while money was secondary to status..."
Changes
v2.2 - PLANNED
Leftovers from 2.2, plus new GitHub Workflows for CI/CD and a new Dockerfile.
v2.1 - January 9, 2023
See closed issues in Milestone 2.1: https://github.com/geopython/stetl/milestone/11?closed=1
Mainly cleanup of Py3 migration issues and version upgrades for supporting libraries (GDAL, etc.) and the Dockerfile. Also many issues related to the BAG v2 ETL within NLExtract, which uses the GDAL/OGR LVBAG driver. See https://github.com/nlextract/NLExtract.
v2.0 - April 11, 2019
FIRST VERSION SUPPORTING PYTHON3-ONLY!
See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
These are all related to the Py2 to Py3 migration. Other issues have been moved to later milestones/releases.
The main PR for the Py2 to Py3 migration is: https://github.com/geopython/stetl/pull/81
v1.3 - March 20, 2019
LAST VERSION SUPPORTING PYTHON2! See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
Very few changes; this release mainly serves as a baseline for v2.0 (Python3).
v1.2 - July 7, 2018
See closed issues in Milestone 1.2: https://github.com/geopython/stetl/milestone/8?closed=1
The most important changes relate to deployment in Docker and Kubernetes environments: dealing with (env) variables, Stetl arguments and logging, for example:
- issue #71: Allow environment vars to substitute/override config template arg-variables
- issue #72: Allow multiple -a args for the Stetl main program. Allowing multiple -a arguments makes it simpler to override, for example, default options.
- issue #68: Stetl should not output passwords and other sensitive data in its log
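Below is a minimal sketch of the kind of substitution that issues #71 and #72 describe. It assumes {name} placeholders in the config template, -a values given as space-separated key=value pairs (with later -a arguments winning), and environment variables taking precedence over -a values. Stetl's real placeholder syntax, precedence rules and any environment-variable naming conventions may differ. The function names are hypothetical.

```python
import os
import string
from typing import Dict, Iterable, Mapping, Optional


def merge_args(arg_strings: Iterable[str]) -> Dict[str, str]:
    """Merge several '-a' values of the form 'k1=v1 k2=v2'; later ones win."""
    merged: Dict[str, str] = {}
    for arg in arg_strings:
        for pair in arg.split():
            key, _, value = pair.partition("=")
            merged[key] = value
    return merged


def substitute(template: str, args: Mapping[str, str],
               env: Optional[Mapping[str, str]] = None) -> str:
    """Fill {name} placeholders, letting environment variables override args."""
    env = os.environ if env is None else env
    # string.Formatter().parse yields (literal, field_name, spec, conversion).
    names = {field for _, field, _, _ in string.Formatter().parse(template) if field}
    values = {name: env.get(name, args.get(name)) for name in names}
    missing = sorted(name for name, value in values.items() if value is None)
    if missing:
        raise KeyError(f"no value for placeholder(s): {', '.join(missing)}")
    return template.format_map(values)
```

With `merge_args(["in_file=a.gml", "in_file=b.gml"])`, the placeholder {in_file} resolves to b.gml unless an in_file environment variable is set.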
v1.1.1 - November 7, 2017
The biggest change is that the Stetl repo moved to https://github.com/geopython/stetl/.
See closed issues in Milestone 1.1.1: https://github.com/geopython/stetl/milestone/6?closed=1
Highlights:
- new Component Splitter to split (Filter/Output) data streams within a Chain
- new Component Merger to combine (Input) data streams within a Chain
- Splitter and Merger can be combined in a single Chain
- automatic Travis build
- more Unit tests
- flake8 for clean Python code
- move to GDAL v2 (though v1 may still work)
- new compact Docker Image based on debian:stretch-slim
- bugfixes for XML stream support
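Conceptually, a Splitter fans one data stream out to several branches and a Merger combines several streams into one. The generic Python sketch below expresses both ideas with itertools.tee and itertools.chain. It is not how Stetl's Splitter and Merger components are implemented, and the function names are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def split(stream: Iterable[T], n: int = 2) -> Tuple[Iterator[T], ...]:
    """Fan one stream out into n independent branches.

    itertools.tee buffers items until every branch has consumed them.
    """
    return itertools.tee(stream, n)


def merge(*streams: Iterable[T]) -> Iterator[T]:
    """Combine several streams into one, draining each in turn."""
    return itertools.chain(*streams)
```

One caveat: tee buffers items that one branch has consumed but another has not. If branches are drained one after another, as merge does here, the whole stream ends up in memory. A streaming chain would interleave consumption instead.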
v1.0.9 - 17 June 2016
See https://github.com/geopython/stetl/issues?q=milestone%3A%22Version+1.0.9%22+is%3Aclosed
Highlights:
- Substitutable config options in properties file (-a arg)
- Docker support via Stetl Docker image
- Generic ogr2ogr Input Component enhancement
v1.0.8 - 2 July 2015
See https://github.com/geopython/stetl/issues?q=milestone%3A%22Version+1.0.8%22
- generic OgrOutput component
- Apache Log File Input
v1.0.7 - 24 Nov 2014
- start of Stetl Format conversion Filter
- generic OgrInput component
- SQLite Input component
- BAG to INSPIRE Addresses example with Jinja2 Templating Filter
- Jinja2 Filter refinements
v1.0.6 - 5 Sept 2014
- allow spatial_extent in OGROutput Top10NL example
- httpinput: more robust
- Packet: new type 'record', as Python dict structure
- httpinput: ApacheDirInput, input data from Apache index listing
- dboutput: PostgresInsertOutput, insert single record into Postgres
- component: add before_/after_invoke and after_chain_invoke() for intercepting
- filters: start of Python Templating filters: simple string and Jinja2 templating
- Packet: new type 'struct', basically a free-form dict, the result of reading CSV
- input: CSV file input
- new examples: 9_string_templating and 10_jinja2_templating
- start stetl --doc option to print class configuration info
- config: start of adding meta attribute config info via class vars of type Attr
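The list above describes packets of type 'record' and 'struct' as Python dicts, for example the result of reading CSV, and string templating is one of the new filters. Below is a minimal standard-library sketch of that combination. Each CSV row becomes a dict, which is then rendered through a $name template. The function names are hypothetical, and this is not Stetl's CSV input or templating filter. Stetl's string templating may use a different placeholder syntax.

```python
import csv
import io
import string
from typing import Dict, Iterable, Iterator


def read_records(csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield each CSV row as a dict keyed by the header row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield dict(row)


def render(records: Iterable[Dict[str, str]], template: str) -> Iterator[str]:
    """Apply a simple $name string template to every record."""
    tmpl = string.Template(template)
    for record in records:
        yield tmpl.substitute(record)
```

Both functions are generators, so records flow through one at a time instead of being collected first.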
v1.0.5 - 19 Feb 2014
- cater for strange lxml parse error: https://bugs.launchpad.net/lxml/+bug/1185701
- more Dutch BGT (large scale topo) examples by thijsbrentjens
v1.0.4 - 23 Sept 2013
- more documentation
- Dutch BGT (Basis Registratie Grootschalige Topografie) example
- Ordnance Survey Mastermap example
- add option to strip XML namespaces to XmlElementStreamerFileInput
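Stripping XML namespaces, as the new XmlElementStreamerFileInput option does, can be approximated with ElementTree. The sketch removes the {uri} prefix that ElementTree puts in front of qualified tag and attribute names. It is a generic sketch with a hypothetical function name, not Stetl's implementation.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Remove '{uri}' prefixes from element tags and attribute names in place."""
    for node in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(node.tag, str) and node.tag.startswith("{"):
            node.tag = node.tag.split("}", 1)[1]
        for key in list(node.attrib):
            if key.startswith("{"):
                node.attrib[key.split("}", 1)[1]] = node.attrib.pop(key)
    return root
```

Two names with the same local part in different namespaces will collide after stripping, so this is only safe when local names are unambiguous.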
v1.0.1 - v1.0.3 - Aug/Sept 2013
Minor changes to enable distribution.
v1.0.0 - June 2013
- First version
- Add to Python Package Index (#3).
Credits
Stetl is developed by:
- Just van den Broecke (initiator, http://www.justobjects.nl)
- Frank Steggink
- Thijs Brentjens
- and more, see contributors: https://github.com/geopython/stetl/graphs/contributors
Bas Couwenberg is providing Debian/Ubuntu packaging.
Rob van Loon prepared the Python3 migration, among other contributions.
This project would not be possible without the great work of Frank Warmerdam and other GDAL/OGR developers (http://gdal.org).
Plus the people who brought us Python, PostGIS (Paul Ramsey et al.), Jinja2, lxml and libraries like GEOS, Proj, libxml2 and libxslt.
We are mainly standing on the shoulders of these giants.
Thanks to Tom Kralidis for helping to move the project from a personal repo to the https://github.com/geopython organization.
Download files
Source Distribution: Stetl-2.1.tar.gz
Built Distribution: Stetl-2.1-py3.7.egg
File details
Details for the file Stetl-2.1.tar.gz.
File metadata
- Download URL: Stetl-2.1.tar.gz
- Upload date:
- Size: 6.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | b4052cad5eaf585b96d4a8ac9e8b706a2cd445b8e223b11b775873af2bcf7891
MD5 | cdc4c26acd267f1f74be99a1d828f83e
BLAKE2b-256 | 1c7f0c3cecb973338f67016c3cf8ce825ffcd468b632b5913496fe5ef4f519ec
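The published digests can be used to check a downloaded file. Below is a minimal sketch with Python's hashlib that reads the file in 1 MiB chunks. The helper names are hypothetical. PyPI's BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from typing import Iterable


def _new_hash(algorithm: str):
    """Create a hash object; 'blake2b-256' maps to BLAKE2b with a 32-byte digest."""
    if algorithm.lower() == "blake2b-256":
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(algorithm.lower())  # e.g. "sha256", "md5"


def digest_chunks(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Hash a sequence of byte chunks and return the hex digest."""
    h = _new_hash(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file without loading it into memory at once."""
    with open(path, "rb") as fh:
        return digest_chunks(iter(lambda: fh.read(chunk_size), b""), algorithm)


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify_file("Stetl-2.1.tar.gz", "b4052cad5eaf585b96d4a8ac9e8b706a2cd445b8e223b11b775873af2bcf7891")` should return True for an intact download of the source distribution.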
File details
Details for the file Stetl-2.1-py3.7.egg.
File metadata
- Download URL: Stetl-2.1-py3.7.egg
- Upload date:
- Size: 310.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | d68fa7c17b81e54edbc82878bbf7b6fc50a1d49cb5203bcbd687ae4bad77e7c6
MD5 | d3e5cf78880423037c740da92fe7a344
BLAKE2b-256 | de43de9eecb964e44490802b871f0b2198bc6816c5f2dc7a6b588dd1b0ec8034