
Create asynchronous data pipelines and deploy to the cloud or Airflow

Website :loudspeaker: | Discord :sunglasses: | Forum :wave: | Installation :floppy_disk: | Documentation :notebook:


Why Typhoon?

Our vision is a new generation of cloud-native, asynchronous orchestrators that handle highly dynamic workflows with ease. We crafted Typhoon from the ground up to work towards this vision: it's designed to feel familiar while still making very different design decisions where it matters.

*Typhoon overview montage*

Typhoon + AWS Lambda

A serverless orchestrator has the potential to be infinitely scalable and extremely cost-efficient at the same time. We think AWS Lambda is ideal for this:

  • CloudWatch Events can trigger a Lambda on a schedule, so we get scheduling for free! A scheduler is the most complex piece of an orchestrator; we can do away with it completely and still be sure that our DAGs will always run on time (see the sketch after this list).
  • Lambda is cheap. You get 1 million invocations for free every month.
  • Workflows can be parallelized by running tasks in parallel on different instances of the Lambda. Typhoon DAGs use batching to take full advantage of this.
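
To make the scheduling point concrete, here is a minimal sketch of how a schedule-triggered Lambda can be wired up with boto3. The function and rule names are hypothetical, and Typhoon's own deployment tooling handles this wiring for you; this only illustrates the mechanism:

```python
# Hedged sketch: wiring a CloudWatch Events (EventBridge) schedule to a Lambda
# with boto3. Names are hypothetical; Typhoon's deploy tooling does this for you.
import boto3

events = boto3.client("events")
lam = boto3.client("lambda")

FUNCTION_NAME = "typhoon_favorite_authors"  # hypothetical deployed DAG Lambda

# Create (or update) a rule that fires once a day, mirroring `rate(1 day)`.
rule = events.put_rule(
    Name="typhoon-favorite-authors-schedule",
    ScheduleExpression="rate(1 day)",
    State="ENABLED",
)

# Allow CloudWatch Events to invoke the Lambda.
lam.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId="typhoon-schedule-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the Lambda.
fn_arn = lam.get_function(FunctionName=FUNCTION_NAME)["Configuration"]["FunctionArn"]
events.put_targets(
    Rule="typhoon-favorite-authors-schedule",
    Targets=[{"Id": "typhoon-dag", "Arn": fn_arn}],
)
```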

Typhoon + Airflow

Airflow is great!

It's also the industry standard and will be around for a while. However, we think it can be improved, without even migrating your existing production code.

Typhoon lets you write Airflow DAGs faster :rocket::

**Workflow**: Typhoon YAML DAG --> Typhoon build --> Airflow DAG 

Simplicity and reusability; a toolkit designed to be loved by Data Engineers :heart:

Key features

  • Pure Python - Easily extend with pure Python. Frameworkless, with no dependencies.
  • Testable Python - Write tests for your tasks in PyTest. Automate DAG testing (see the sketch after this list).
  • Composability - Functions and connections combine like Lego. Very easy to extend.
  • Data sharing - Data flows between tasks, making it intuitive to build multi-step pipelines.
  • Elegant YAML - Low-code and easy to learn.
  • Code completion - Fast to compose (VS Code recommended).
  • Components - Reduce complex flows (e.g. CSV → S3 → Snowflake) to one reusable task.
  • Components UI - Share your pre-built automation with your team. :raised_hands:
  • Rich CLI & Shell - Inspired by other great command-line interfaces and instantly familiar. Intelligent bash/zsh completion.
  • Flexible deployment - Deploy to Airflow for a large reduction in effort, without breaking existing production.
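
Because task functions are plain Python, unit testing needs no special harness. A minimal PyTest sketch, where `build_author_path` is a hypothetical helper of your own rather than part of Typhoon's API:

```python
# test_author_tasks.py -- run with `pytest`.
# `build_author_path` is a hypothetical project helper, not Typhoon API.


def build_author_path(author_key: str) -> str:
    """Build the data-lake path for an author record."""
    return f"/authors/{author_key}.json"


def test_build_author_path():
    assert build_author_path("OL23919A") == "/authors/OL23919A.json"
```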

Example YAML DAG

name: favorite_authors
schedule_interval: rate(1 day)    # CloudWatch-style rate expression

tasks:
  choose_favorites:
    function: typhoon.flow_control.branch
    args:
      branches:                   # each branch becomes a separate batch
        - J. K. Rowling
        - George R. R. Martin
        - James Clavell

  get_author:
    input: choose_favorites       # consumes batches from the task above
    function: functions.open_library_api.get_author
    args:
      author: !Py $BATCH          # !Py evaluates Python; $BATCH is the incoming batch

  write_author_json:
    input: get_author
    function: typhoon.filesystem.write_data
    args:
      hook: !Hook data_lake       # resolves the data_lake connection at runtime
      data: !MultiStep            # each step's result feeds the next as $1, $2, ...
        - !Py $BATCH['docs']
        - !Py typhoon.data.json_array_to_json_records($1)
      path: !MultiStep
        - !Py $BATCH['docs'][0]['key']
        - !Py f'/authors/{$1}.json'
      create_intermediate_dirs: True

*Favorite Authors: getting the works of my favorite authors from the Open Library API*
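
For a sense of what the custom function referenced above looks like, here is a hedged sketch of `functions.open_library_api.get_author`. The real implementation ships with the example project; this version assumes the public Open Library author-search endpoint and the convention that whatever a function yields becomes a `$BATCH` for the downstream task:

```python
# functions/open_library_api.py -- illustrative sketch, not the shipped code.
from typing import Iterator

import requests


def get_author(author: str) -> Iterator[dict]:
    """Search Open Library for an author and yield the raw API response.

    Each yielded value becomes one $BATCH for the downstream task, which
    reads $BATCH['docs'] and $BATCH['docs'][0]['key'] from this payload.
    """
    response = requests.get(
        "https://openlibrary.org/search/authors.json",
        params={"q": author},
        timeout=30,
    )
    response.raise_for_status()
    yield response.json()
```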

⚡ Installation

See documentation for more extensive installation instructions and walkthroughs.

With pip (Typhoon standalone)

Install Typhoon:

pip install typhoon-orchestrator[dev]

# Create a project
typhoon init hello_world

# Try the CLI
cd hello_world
typhoon status

# Add your connection
typhoon connection add --conn-id data_lake --conn-env local
typhoon connection ls -l

Docs: Detailed local installation instructions. | Hello world.

With Docker and Airflow

To deploy Typhoon with Airflow you need:

  • Docker / Docker Desktop (for now, you must use Git Bash on Windows; there is currently an open issue on WSL2)
  • Download docker-compose-af.yml (or use the curl command below)
  • Create a directory for your TYPHOON_PROJECTS_HOME

The following sets up your project directory and fetches docker-compose-af.yml:

TYPHOON_PROJECTS_HOME="/tmp/typhoon_projects" # Or any other path you prefer
mkdir -p $TYPHOON_PROJECTS_HOME/typhoon_airflow_test
cd $TYPHOON_PROJECTS_HOME/typhoon_airflow_test

# For Windows WSL2 users - optional for other environments
sudo chown -R $USER: $TYPHOON_PROJECTS_HOME/typhoon_airflow_test
mkdir airflow
mkdir data_lake
mkdir src

curl -LfO https://raw.githubusercontent.com/typhoon-data-org/typhoon-orchestrator/master/docker-compose-af.yml

!!! Important: On Windows Git Bash, run each `docker-compose run` command below one at a time. They are quick.

docker-compose -f docker-compose-af.yml run --rm typhoon-af airflow initdb
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon status
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon connection add --conn-id data_lake --conn-env local  # Adding our first connection!
docker-compose -f docker-compose-af.yml run --rm typhoon-af typhoon dag build --all
docker-compose -f docker-compose-af.yml up -d

This runs a container with a single service, typhoon-af, which has both Airflow and Typhoon installed and ready to work with.

You should then be able to check `typhoon status` and the Airflow UI at http://localhost:8088.

Docs: Detailed Docker installation instructions. | Development hints.


![Airflow UI](docs/img/airflow_ui_list_after_install.png) *Typhoon DAGs listed in the Airflow UI*

*Favorite Authors DAG - as displayed in the Airflow UI*
