A minimal modern data stack with working data pipelines in a single Docker container.

mimosa

The ELT part of a modern data stack with practical data pipelines using cloud functionality.

Table of Contents
  1. About the Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. Contact

About the Project



The ELT part of a modern data stack with practical data pipelines and reporting using cloud functionality. It is similar in concept to mimodast, but uses alternative software options and cloud services.

Mimosa encompasses the ELT (extract, load, transform) components necessary to generate the webpage found at gas.aspireto.win, which provides detailed reports on natural gas storage volumes within the European Union. The process retrieves data from a REST API, transforms it, and stores it in a database tailored for reporting.

The source data is published by Gas Infrastructure Europe and exposed in a REST API.

Beyond gas storage data, Mimosa offers a hands-on experience with essential tools:

  • 🚀 dlt for smooth data loading.
  • 🔍 dbt for powerful data transformation.
  • ☁️ MotherDuck for storing the data in a cloud-based DuckDB database.

The full tech stack used to create the gas.aspireto.win pages is detailed in the Tech Stack section below.


Getting Started

Prerequisites

Set up a Python development environment.

API Keys

Ensure the following sensitive information is securely stored in environment variables or within a .env file:

  • To access the GIE Gas Inventory REST API, an API key is necessary. Quickly obtain your API key by signing up for a free GIE account. Once acquired, expose it using the following environment variable:

    • ENV_GIE_XKEY = "YOUR-API-KEY"
  • For MotherDuck, you'll need the service token and the database name. Set up the following environment variables to establish the connection:

    • DESTINATION__MOTHERDUCK__CREDENTIALS = "md:///YOUR-DATABASE-NAME?token=YOUR-SERVICE-TOKEN"
    • Please note that the MotherDuck page utilizes a different format, whereas the above format is specifically required for dlt.
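
The two settings above can be assembled programmatically. The following is a minimal sketch, assuming the `md:///DATABASE?token=TOKEN` format shown above; the helper names `build_motherduck_credentials` and `export_settings` are hypothetical and are not part of the mimosa package.

```python
import os


def build_motherduck_credentials(database: str, token: str) -> str:
    """Return the connection string in the md:///DATABASE?token=TOKEN form."""
    if not database or not token:
        raise ValueError("both a database name and a service token are required")
    return f"md:///{database}?token={token}"


def export_settings(gie_key: str, database: str, token: str, env=None):
    """Write both settings into `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    env["ENV_GIE_XKEY"] = gie_key
    env["DESTINATION__MOTHERDUCK__CREDENTIALS"] = build_motherduck_credentials(
        database, token
    )
    return env
```

Setting the values from Python only affects the current process and its children. For anything longer lived, a .env file or the shell environment is probably the better place.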

Installation

Execute the following command. Consider using a venv.

pip install ternyxmimosa

Alternatively, clone this repository and run poetry install, or pip install directly from GitHub.

Usage

Command Line

Not currently supported.

As a Python Package

The following sample obtains the storage data for the last available date and stores it in MotherDuck.

import mimosa.cli as GEI

GEI.main()
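
Because the run depends on the environment variables described under API Keys, it may help to check for them before starting. This is a hedged sketch only: `run_pipeline` and its `runner` parameter are hypothetical names introduced for illustration, and it assumes `mimosa.cli.main()` takes no arguments, as in the sample above.

```python
import os
import sys

# Variable names taken from the API Keys section of this README.
REQUIRED_VARIABLES = ("ENV_GIE_XKEY", "DESTINATION__MOTHERDUCK__CREDENTIALS")


def run_pipeline(env=None, runner=None) -> int:
    """Run the ELT once; return 0 on success, 1 if configuration is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        print("Missing environment variables: " + ", ".join(missing), file=sys.stderr)
        return 1
    if runner is None:
        # Imported lazily so the configuration check works without the package.
        import mimosa.cli as cli

        runner = cli.main
    runner()
    return 0
```

A script could then end with `sys.exit(run_pipeline())` so that a scheduler sees a non-zero exit code when the configuration is incomplete.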

Tech Stack

These are the technologies driving the content on gas.aspireto.win:

  • Google Cloud Function for the ELT component:
    • The function is a bare-bones wrapper around the mimosa Python package (the current repository). The function is in this repository.
    • It is scheduled to run the ELT twice daily (using Google Cloud Scheduler and a Pub/Sub message).
    • The result is updated data in MotherDuck.
  • Reporting notebook
    • Built using the evidence reporting tool, defined in this GitHub repository.
    • Rebuilt and published to a web host using a GitHub workflow.
      • The workflow runs on a twice-daily schedule and is defined in the notebook repository.
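
To make the cloud function item concrete, here is a rough sketch of what a bare-bones Pub/Sub-triggered wrapper could look like. It is not the function from the repository. It assumes the `(event, context)` signature of a first-generation background function, and the names `elt_entry_point`, `decode_pubsub_message` and the `runner` parameter are hypothetical.

```python
import base64


def decode_pubsub_message(event: dict) -> str:
    """Return the text payload of a Pub/Sub event, or "" when it has none."""
    data = event.get("data")
    return base64.b64decode(data).decode("utf-8") if data else ""


def elt_entry_point(event, context, runner=None):
    """Entry point sketch: any scheduler message triggers one ELT run."""
    message = decode_pubsub_message(event)
    if runner is None:
        # Assumption: the package is invoked as in the Usage sample above.
        import mimosa.cli as cli

        runner = cli.main
    runner()
    return message
```

The message content is ignored apart from being decoded; in this sketch the scheduler message only acts as a trigger.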

NOTE: As of November 2023 it is possible to fully deploy this stack without breaking the bank (using free tiers of the cloud services used). Dive into our GitHub repository and the linked ones for the Google Function and Evidence notebook, where all the code awaits. 🚀

Sentry

To enable logging using sentry.io, specify the environment variable RUNTIME__SENTRY_DSN.

NOTE: When using the dev container, the environment variable DESTINATION__MOTHERDUCK__CREDENTIALS is often left with an incorrect value between runs; the cause is not known. Use unset DESTINATION__MOTHERDUCK__CREDENTIALS to clear the environment variable.
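
Before clearing the variable it can be useful to see what it currently holds without printing the service token. The helper below is a small sketch under the assumption that the value follows the `md:///DATABASE?token=TOKEN` format from the API Keys section; `describe_credentials` is a hypothetical name.

```python
import os
from urllib.parse import parse_qs, urlsplit

VARIABLE = "DESTINATION__MOTHERDUCK__CREDENTIALS"


def describe_credentials(value: str) -> str:
    """Summarise a credentials string with the token masked."""
    if not value:
        return "not set"
    parts = urlsplit(value)
    database = parts.path.lstrip("/")
    has_token = bool(parse_qs(parts.query).get("token"))
    token_note = "token=***" if has_token else "no token"
    return f"{parts.scheme}:///{database} ({token_note})"


if __name__ == "__main__":
    print(describe_credentials(os.environ.get(VARIABLE, "")))
```

Within a Python session, `os.environ.pop(VARIABLE, None)` removes the variable for the current process only; the shell's own copy still needs unset.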

Roadmap

Consider:
  • Get source data (Using REST API)
  • Transform data, possibly SQL Mesh or dbt.
  • dlt update/error messages using Slack
  • Storage (currently local DuckDB, maybe consider some cloud alternative. Though that would stray from the data stack in a Docker concept.) (MotherDuck)
  • Scheduling Tool (Google Cloud Scheduler)
  • Reporting tool (Metabase?) (Evidence.dev in separate repository)
  • Bare bones CLI

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also open a feature request or bug report. Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

Project Link: mimosa
