Create asynchronous data pipelines and deploy them to the cloud or Airflow

Discord :sunglasses: | Forum :wave: | Installation :floppy_disk: | Documentation :notebook:



Create tasks in modern Python
Elegant YAML DAGs for data pipelines
Deploy to AWS Lambda or to your existing Airflow.


Why Typhoon? | Key Features | Example YAML | Installation


Why Typhoon?

Our vision is a new generation of cloud native, asynchronous orchestrators that can handle highly dynamic workflows with ease. We crafted Typhoon from the ground up to work towards this vision. It's designed to feel familiar while still making very different design decisions where it matters.

Why Typhoon + AWS Lambda?

A serverless orchestrator has the potential to scale almost without limit while remaining extremely cost efficient. We think AWS Lambda is ideal for this:

  • CloudWatch Events can trigger a Lambda on a schedule, so we get scheduling for free! A scheduler is the most complex piece of an orchestrator. We can do away with it completely and still be sure that our DAGs will always run on time.
  • Lambda is cheap. You get 1 million invocations for free every month.
  • Workflows can be parallelized by running tasks in parallel on different instances of the Lambda. Typhoon DAGs use batching to take full advantage of this.
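
To make the fan-out idea concrete, here is a minimal Python sketch. It is not Typhoon's implementation: a thread pool stands in for separate Lambda invocations, and `fetch_author` and `run_in_parallel` are hypothetical names introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_author(batch: str) -> dict:
    """Hypothetical task: process a single batch independently of the others."""
    return {"author": batch, "name_length": len(batch)}


def run_in_parallel(task, batches, max_workers=4):
    """Run `task` once per batch; each call stands in for one Lambda invocation."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the batches.
        return list(pool.map(task, batches))


results = run_in_parallel(fetch_author, ["J. K. Rowling", "James Clavell"])
print(results)
```

Because each batch is handled in isolation, adding more batches only adds more independent invocations, which is what makes this pattern a natural fit for Lambda.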

Why Typhoon + Airflow?

Airflow is great!

Typhoon lets you write Airflow DAGs faster :rocket:

**Workflow**: Typhoon YAML DAG --> Typhoon build --> Airflow DAG 

Simplicity and re-usability; a toolkit designed to be loved by Data Engineers :heart:

Key features

Elegant - YAML; low-code and easy to learn.

Code-completion - Fast to compose. (VS Code recommended).

Data sharing - data flows between tasks making it super intuitive.

Composability - Functions and connections combine like Lego.

Components - reduce complex tasks to one re-usable task.

Packaged examples:

  • Glob & Compress
  • FileSystem to DB
  • DB to FileSystem
  • DB to Snowflake

UI: Share pre-built components (data pipelines) with your team :raised_hands:

Rich CLI & Shell: Inspired by others; instantly familiar.

Testable Tasks - automate DAG task tests.

Testable Python - test functions or full DAGs with PyTest.
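
As a rough illustration of testing a task function with PyTest, the sketch below defines a hypothetical `split_into_batches` generator (it is not part of Typhoon) together with a test for it. It assumes only that task functions are plain Python callables that can be imported and called directly.

```python
from typing import Iterable, Iterator, List


def split_into_batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Hypothetical task function: yield `items` in lists of at most `size`."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def test_split_into_batches():
    authors = ["J. K. Rowling", "George R. R. Martin", "James Clavell"]
    assert list(split_into_batches(authors, 2)) == [
        ["J. K. Rowling", "George R. R. Martin"],
        ["James Clavell"],
    ]
```

Saved as a file such as `test_batches.py` (the name is arbitrary), the test is collected by running `pytest test_batches.py`.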

Example YAML DAG

name: favorite_authors
schedule_interval: rate(1 day)

tasks:
  choose_favorites:
    function: typhoon.flow_control.branch
    args:
      branches:
        - J. K. Rowling
        - George R. R. Martin
        - James Clavell

  get_author:
    input: choose_favorites
    function: functions.open_library_api.get_author
    args:
      author: !Py $BATCH

  write_author_json:
    input: get_author
    function: typhoon.filesystem.write_data    
    args:
      hook: !Hook data_lake
      data:  !MultiStep
        - !Py $BATCH['docs']
        - !Py typhoon.data.json_array_to_json_records($1)
      path: !MultiStep 
        - !Py $BATCH['docs'][0]['key']
        - !Py f'/authors/{$1}.json'
      create_intermediate_dirs: True

Favorite Authors: getting the works of my favorite authors from the Open Library API
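
The data flow in this DAG can be read as ordinary Python. The sketch below is one interpretation of the YAML, not Typhoon's generated code. It assumes that `branch` emits one batch per listed item, that each downstream task runs once per incoming batch, and that inside a `!MultiStep` block `$1` refers to the result of the first step. `get_author` is stubbed with canned data instead of calling the Open Library API, so the keys it produces are made up, and the JSON-records conversion step is omitted.

```python
from typing import Iterator, List, Tuple


def branch(branches: List[str]) -> Iterator[str]:
    """Assumed behaviour of flow control: emit each branch as its own batch."""
    yield from branches


def get_author(author: str) -> Iterator[dict]:
    """Stub for functions.open_library_api.get_author; returns canned data."""
    key = author.lower().replace(". ", "_").replace(" ", "_")
    yield {"docs": [{"key": key, "name": author}]}


def write_author_json(batch: dict) -> Tuple[str, list]:
    """Mirror the two !MultiStep args: each step can use the previous result."""
    data = batch["docs"]                 # data, step 1: $BATCH['docs']
    step_1 = batch["docs"][0]["key"]     # path, step 1: $BATCH['docs'][0]['key']
    path = f"/authors/{step_1}.json"     # path, step 2: f'/authors/{$1}.json'
    return path, data


written = []
for author in branch(["J. K. Rowling", "George R. R. Martin", "James Clavell"]):
    for batch in get_author(author):
        written.append(write_author_json(batch))

for path, _ in written:
    print(path)
```

Each author flows through the pipeline as a separate batch, so the three output files are produced independently of one another.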

Installation

See documentation for detailed guidance on installation and walkthroughs.

With pip (Typhoon standalone)

Install typhoon:

pip install typhoon-orchestrator[dev]

Optionally, create and activate a virtualenv before installing.

Then:

typhoon init hello_world
cd hello_world
typhoon status

This creates a directory named hello_world that serves as an example project. As with git, when you cd into the directory Typhoon detects that it is a Typhoon project and treats that directory as its base directory (TYPHOON_HOME).

Adding connections

You can add a default connection from the CLI as follows:

typhoon connection add --conn-id data_lake --conn-env local
# Check that it was added
typhoon connection ls -l

With Docker and Airflow

To deploy Typhoon with Airflow you need:

  • Docker / Docker Desktop (You must use WSL2 on Windows)
  • The docker-compose-af.yml file (download it, or use curl below)
  • Create a directory for your TYPHOON_PROJECTS_HOME

The following sets up your project directory and downloads docker-compose-af.yml:

TYPHOON_PROJECTS_HOME="/tmp/typhoon_projects" # Or any other path you prefer
mkdir -p "$TYPHOON_PROJECTS_HOME/typhoon_airflow_test"
cd "$TYPHOON_PROJECTS_HOME/typhoon_airflow_test"
mkdir src
curl -LfO https://raw.githubusercontent.com/typhoon-data-org/typhoon-orchestrator/master/docker-compose-af.yml

docker-compose -f docker-compose-af.yml up -d
docker-compose -f docker-compose-af.yml run --rm typhoon-af airflow initdb
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon status
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon connection add --conn-id data_lake --conn-env local  # Adding our first connection!
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon dag build --all
docker restart typhoon-af # Wait while docker restarts

This starts a single service, typhoon-af, in a container that has both Airflow and Typhoon installed and ready to use.

You should then be able to check typhoon status and open the Airflow UI at http://localhost:8088
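
If you would rather script that check, a small standard-library sketch like the following could be used. The `is_reachable` helper is hypothetical and is not part of Typhoon or Airflow; the URL is the one given above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with any HTTP response, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, etc. (URLError is an OSError).
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:8088"))
```

The container may take a little while to come up after the restart, so a False result immediately afterwards does not necessarily mean the setup failed.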

Airflow UI: Typhoon DAGs listed in the Airflow UI

Development hints are in the docs.

Favorite Authors DAG, as displayed in the Airflow UI

We can extend the DAG above into a more complex example. The tutorial for it includes some more advanced tips. The compiled Airflow DAG handles complex DAG structures very nicely:

Favorite Authors Extended: a complex DAG example

Download files

Source Distribution

typhoon-orchestrator-0.0.55.tar.gz (138.3 kB)
