MakeGIS

A lightweight orchestrator for spatial databases.

MakeGIS uses YAML files to describe DAG nodes that let you achieve one of three things:

  • load data to a target database
  • transform data in a target database
  • run custom commands

MakeGIS comes with a command line tool, mkgs, that operates on the resulting DAG.

Key features/choices:

  • Local and standalone: mkgs runs locally, no other service involved
  • Supports many data sources: describe where the data is, MakeGIS handles the rest
  • Works for both ETL and ELT workflows
  • Automatic dependency discovery for SQL transforms
  • Flexible: run anything you want
  • Reproducible pipelines
  • Data lineage

[!Note] MakeGIS is under active development; expect breaking changes.

Installation

pip install makegis

Usage

MakeGIS provides the mkgs CLI utility to operate on the DAG.

usage: mkgs [-h] [-v] [--debug] {init,ls,outdated,run} ...

positional arguments:
  {init,ls,outdated,run}
                        commands
    init                initialize journal on target
    ls                  list nodes
    outdated            report outdated nodes
    run                 run nodes

options:
  -h, --help            show this help message and exit
  -v, --verbose         verbose messages
  --debug               debug messages

mkgs init

The init command prepares a target database to work with MakeGIS. It creates a _makegis_log journal table that is used to track which nodes have been run, when, and at what version. It also creates any missing schemas expected by the DAG.

usage: mkgs init [-h] [-t TARGET]

options:
  -h, --help           show this help message and exit
  -t, --target TARGET  db instance to target

mkgs ls

The ls command shows DAG nodes matching a selection pattern. At this stage only * wildcards are supported but additional operators are planned (e.g. +<pattern> or <pattern>+ for upstream/downstream propagation).

usage: mkgs ls [-h] pattern

positional arguments:
  pattern     DAG selection pattern

options:
  -h, --help  show this help message and exit
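
As a rough illustration of `*` wildcard selection, here is a minimal Python sketch. It is not MakeGIS's implementation: the `select_nodes` helper and the dotted node ids (`schema.table`) are assumptions made for the example.

```python
import re

def select_nodes(pattern, node_ids):
    """Return the node ids matching a pattern in which `*` matches any run of characters."""
    # Escape every literal chunk so that only `*` acts as a wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    compiled = re.compile(regex)
    return [node for node in node_ids if compiled.fullmatch(node)]

# Hypothetical node ids, loosely based on the example project layout.
nodes = ["raw.countries", "raw.provider.roads", "core.transform_1"]
print(select_nodes("raw.*", nodes))  # ['raw.countries', 'raw.provider.roads']
```

Building the regular expression by hand keeps other characters (such as `?` or `[`) literal, which matches the statement that only `*` is supported at this stage.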

mkgs outdated

The outdated command reports outdated nodes for the given target.

usage: mkgs outdated [-h] [-t TARGET]

options:
  -h, --help           show this help message and exit
  -t, --target TARGET  db instance to target
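
One way to picture an "outdated" check is to compare each node's current version against the version recorded in the journal. The sketch below is an assumption, not MakeGIS's actual logic: the helper names, the idea that a version is a hash of the node definition, and the dictionary standing in for the journal are all hypothetical.

```python
import hashlib

def node_version(definition):
    """Hypothetical version: a short hash of the node's definition text."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()[:12]

def outdated_nodes(current, journal):
    """Return nodes that were never run or whose logged version differs.

    current: node id -> definition text
    journal: node id -> version recorded at the last run
    """
    return [
        node
        for node, definition in current.items()
        if journal.get(node) != node_version(definition)
    ]

current = {
    "raw.countries": "wfs: https://wfs.example.com/countries",
    "core.transform_1": "select 1",
}
# Only raw.countries has been run at its current version.
journal = {"raw.countries": node_version(current["raw.countries"])}
print(outdated_nodes(current, journal))  # ['core.transform_1']
```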

mkgs run

The run command will run the nodes matching a selection pattern (same as mkgs ls). Nodes that are fresh (i.e. not outdated) will be skipped. This can be overridden by using the --force flag.

usage: mkgs run [-h] [-t TARGET] [-d] [-f] pattern

positional arguments:
  pattern              DAG selection pattern

options:
  -h, --help           show this help message and exit
  -t, --target TARGET  db instance to target
  -d, --dry-run        process nodes without actually running them
  -f, --force          also run fresh nodes
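
The skip-unless-forced behaviour described above can be sketched as follows. This is an illustration only; `plan_run` and its arguments are hypothetical names, not part of the mkgs code base.

```python
def plan_run(selected, outdated, force=False):
    """Split the selected nodes into (to_run, skipped).

    Fresh nodes (those not in `outdated`) are skipped unless `force` is set.
    """
    to_run, skipped = [], []
    for node in selected:
        if force or node in outdated:
            to_run.append(node)
        else:
            skipped.append(node)
    return to_run, skipped

selected = ["raw.countries", "core.transform_1"]
outdated = {"core.transform_1"}
print(plan_run(selected, outdated))              # (['core.transform_1'], ['raw.countries'])
print(plan_run(selected, outdated, force=True))  # (['raw.countries', 'core.transform_1'], [])
```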

Configuration

MakeGIS is configured through YAML configuration files and environment variables.

A makegis.root.yml file defines the root of a MakeGIS project, along with project-wide settings. MakeGIS will traverse the directory tree and look for any makegis.yml files.

An example project may look like this:

project/
├─ src/
│  ├─ raw/
│  │  ├─ provider/
│  │  │  └─ makegis.yml
│  │  └─ makegis.yml
│  └─ core/
│     ├─ transform_1.sql
│     ├─ transform_2.sql
│     ├─ transform_3.sql
│     └─ makegis.yml
├─ .env
├─ .gitignore
└─ makegis.root.yml
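
To make the directory traversal concrete, here is a small Python sketch that collects every makegis.yml below a source directory. It only illustrates the idea; `find_node_files` is a hypothetical helper and the real discovery logic may differ.

```python
import tempfile
from pathlib import Path

def find_node_files(src_dir):
    """Return every makegis.yml below src_dir, sorted for stable output."""
    return sorted(Path(src_dir).rglob("makegis.yml"))

# Recreate part of the example layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    for rel in ("raw", "raw/provider", "core"):
        (src / rel).mkdir(parents=True)
        (src / rel / "makegis.yml").touch()
    for path in find_node_files(src):
        print(path.relative_to(src).as_posix())
```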

[!Note]
Environment variables can be used by enclosing them in double curly brackets: {{ EXAMPLE }}. MakeGIS will consider any .env files in the project tree.
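
A minimal sketch of how `{{ NAME }}` placeholders could be expanded is shown below. It is an assumption about the mechanism, not MakeGIS's code: the `substitute_env` helper is hypothetical, and raising an error for an undefined variable is a design choice made for the example.

```python
import os
import re

# Matches {{NAME}} and {{ NAME }}, capturing the variable name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def substitute_env(text, env=None):
    """Replace each {{ NAME }} placeholder with the value of NAME."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined environment variable: {name}")
        return env[name]

    return PLACEHOLDER.sub(replace, text)

url = "https://wfs.example.com/countries?token={{API_KEY}}"
print(substitute_env(url, {"API_KEY": "secret"}))
# https://wfs.example.com/countries?token=secret
```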

makegis.root.yml

A makegis.root.yml file defines the root of a MakeGIS project along with project-wide settings. Here's an annotated example:

# The project's root directory.
src_dir: ./src

# Global defaults
defaults:
  # Global defaults for `load` nodes
  load:
    epsg: 4326
    geom_index: false
  # Optional default target (used when running mkgs without a `--target` option)
  target: pg_dev

# Databases to target
targets:
  pg_prod:
    host: prod.example.com
    port: 5432
    user: mkgs
    db: postgres
  pg_dev:
    host: 127.0.0.1
    port: 5432
    user: mkgs
    db: postgres
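
The interplay between `--target` and `defaults.target` can be sketched in Python as follows. The `resolve_target` helper is hypothetical, and the dictionary simply mirrors the YAML above as it might look once parsed.

```python
from typing import Optional

# The example configuration above, as a parsed dictionary.
config = {
    "defaults": {"target": "pg_dev"},
    "targets": {
        "pg_prod": {"host": "prod.example.com", "port": 5432, "user": "mkgs", "db": "postgres"},
        "pg_dev": {"host": "127.0.0.1", "port": 5432, "user": "mkgs", "db": "postgres"},
    },
}

def resolve_target(config: dict, cli_target: Optional[str] = None) -> dict:
    """Pick the target named on the command line, else the configured default."""
    name = cli_target or config.get("defaults", {}).get("target")
    if name is None:
        raise ValueError("no --target given and no default target configured")
    if name not in config.get("targets", {}):
        raise ValueError(f"unknown target: {name}")
    return config["targets"][name]

print(resolve_target(config)["host"])             # 127.0.0.1
print(resolve_target(config, "pg_prod")["host"])  # prod.example.com
```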

makegis.yml

The path of a makegis.yml file determines the database relations it manages, with top-level directories mapping to schemas.
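
As an illustration of the directory-to-schema mapping, the sketch below derives a schema name from the path of a makegis.yml. The `schema_for` helper is hypothetical, and how deeper directory levels are interpreted is not covered here.

```python
from pathlib import PurePosixPath

def schema_for(config_path, src_dir="src"):
    """Return the first directory below src_dir, taken here as the schema name."""
    relative = PurePosixPath(config_path).relative_to(src_dir)
    if len(relative.parts) < 2:
        raise ValueError(f"{config_path} is not inside a schema directory")
    return relative.parts[0]

print(schema_for("src/raw/provider/makegis.yml"))  # raw
print(schema_for("src/core/makegis.yml"))          # core
```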

A makegis.yml contains one of the following configuration blocks:

  • load: defines sources to be loaded to a target
  • transform: defines transforms to be applied to a target
  • node: custom node to run bespoke commands

Load block

Maps tables to external data sources. Each table becomes a DAG node and can be invoked individually.

load:
  <table-name>:
    <loader>: <loader-arg>
    <loader-option>: <option-value>
    <loader-option>: <option-value>
    ...

For example:

load:
  countries:
    wfs: https://wfs.example.com/countries?token={{API_KEY}}
    epsg: 4326
    geom_index: true

TODO: Document loaders and their options.

EPSG option

Single value

epsg: 4326

Target SRID. If the source declares a different SRID, a transformation is applied. If the source has no SRID, no transformation is applied and the SRID is set to the given value.

Mapping

epsg: 4326:2193

Converts from the source SRID to the destination SRID. Warns or aborts if the source exposes an SRID other than the declared source SRID.
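
The two forms of the `epsg` option can be summarised with a small decision function. This is a sketch under assumptions: `parse_epsg` and `plan_reprojection` are hypothetical helpers, the mapping is read as `source:destination`, and the returned strings merely describe the action that would be taken.

```python
from typing import Optional, Tuple

def parse_epsg(value) -> Tuple[Optional[int], int]:
    """Return (declared_source, target) SRIDs from an `epsg` option value."""
    parts = str(value).split(":")
    if len(parts) == 1:
        return None, int(parts[0])
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError(f"invalid epsg option: {value!r}")

def plan_reprojection(value, source_srid: Optional[int]) -> str:
    """Describe what would happen for a source exposing `source_srid`."""
    declared, target = parse_epsg(value)
    if declared is None:
        # Single value form: the value is the target SRID.
        if source_srid is None:
            return f"assign SRID {target}"
        if source_srid == target:
            return "nothing to do"
        return f"transform {source_srid} -> {target}"
    # Mapping form: the source SRID is declared explicitly.
    if source_srid is not None and source_srid != declared:
        return f"warn or abort: source exposes {source_srid}, expected {declared}"
    return f"transform {declared} -> {target}"

print(plan_reprojection(4326, 2193))         # transform 2193 -> 4326
print(plan_reprojection(4326, None))         # assign SRID 4326
print(plan_reprojection("4326:2193", 4326))  # transform 4326 -> 2193
```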

Transform block

Declares SQL scripts to be enrolled. Each script becomes a DAG node. Dependencies on other DAG nodes are resolved automatically, so the order in which SQL scripts are listed does not matter. There are no constraints on what is in the SQL scripts, as long as MakeGIS is aware of all dependencies.

transform:
  - create_view_of_awesome_table.sql
  - create_awesome_table.sql
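
Once dependencies are known, the listing order no longer matters because the scripts can be sorted topologically. The sketch below shows only that ordering step using Python's standard `graphlib` module; the dependency map is written by hand here, whereas MakeGIS discovers it automatically (how it does so is not shown), and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

def execution_order(deps):
    """Order scripts so that each one runs after the scripts it depends on.

    deps: script name -> set of script names it depends on
    """
    return list(TopologicalSorter(deps).static_order())

# Hand-written dependencies for the two scripts listed above.
deps = {
    "create_view_of_awesome_table.sql": {"create_awesome_table.sql"},
    "create_awesome_table.sql": set(),
}
print(execution_order(deps))
# ['create_awesome_table.sql', 'create_view_of_awesome_table.sql']
```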

Node block

A node block defines a custom DAG node, for when more flexibility is needed than a load or transform block offers.

The price to pay for this flexibility is that dependencies need to be documented manually. This goes for upstream dependencies as well as objects created on the target db.

node:
  # List any relations needed by this node.
  deps:
    - schema.upstream_table
  # Commands that do not change the target db but need to be run before we proceed.
  # Commands are run sequentially, in listing order.
  prep:
    - before.py
  # Main section
  do:
    # List of commands along with any objects they will create on the target.
    run:
      - cmd: script1.py
        # Declare objects owned by this command
        creates:
          - table: new_table
          - function: helper
    # Can also use a load block here, but it won't spawn new DAG nodes
    <load-block>
  # Like prep, but runs after `do`, and only if `do` runs fine.
  post:
    - after.py
  # Like post but always runs, even if something failed prior.
  finally:
    - teardown.py
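
The ordering of the `prep`, `do`, `post` and `finally` sections resembles a try/finally construct. The following Python sketch illustrates those semantics only; `run_node` and the injected `run` callable are hypothetical, and the commands are plain strings rather than real scripts.

```python
def run_node(prep, do, post, finally_, run):
    """Run the phases in order; `run(cmd)` is expected to raise on failure."""
    try:
        for cmd in prep:
            run(cmd)
        for cmd in do:
            run(cmd)
        # Only reached if every `prep` and `do` command succeeded.
        for cmd in post:
            run(cmd)
    finally:
        # Always runs, even if something failed earlier.
        for cmd in finally_:
            run(cmd)

executed = []

def fake_run(cmd):
    """Record the command and simulate a failure in script1.py."""
    executed.append(cmd)
    if cmd == "script1.py":
        raise RuntimeError("script1.py failed")

try:
    run_node(["before.py"], ["script1.py"], ["after.py"], ["teardown.py"], run=fake_run)
except RuntimeError:
    pass
print(executed)  # ['before.py', 'script1.py', 'teardown.py']
```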
