
Mage is a tool for building and deploying data pipelines.


Mage

🧙 A modern replacement for Airflow.

Documentation   🌪️    Watch 2 min demo   🌊    Play with live tool   🔥    Get instant help

Give your data team magical powers

Integrate and synchronize data from 3rd party sources

Build real-time and batch pipelines to transform data using Python, SQL, and R

Run, monitor, and orchestrate thousands of pipelines without losing sleep


1️⃣ 🏗️

Build

Have you met anyone who said they loved developing in Airflow?
That’s why we designed an easy developer experience that you’ll enjoy.

Easy developer experience
Start developing locally with a single command or launch a dev environment in your cloud using Terraform.

Language of choice
Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.

Engineering best practices built-in
Each step in your pipeline is a standalone file containing modular code that’s reusable and testable with data validations. No more DAGs with spaghetti code.

2️⃣ 🔮

Preview

Stop wasting time waiting around for your DAGs to finish testing.
Get instant feedback from your code each time you run it.

Interactive code
Immediately see results from your code’s output with an interactive notebook UI.

Data is a first-class citizen
Each block of code in your pipeline produces data that can be versioned, partitioned, and cataloged for future use.

Collaborate on cloud
Develop collaboratively on cloud resources, use Git for version control, and test pipelines without waiting for a shared staging environment to become available.

3️⃣ 🚀

Launch

Don’t have a large team dedicated to Airflow?
Mage makes it easy for a single developer or small team to scale up and manage thousands of pipelines.

Fast deploy
Deploy Mage to AWS, GCP, or Azure with only 2 commands using maintained Terraform templates.

Scaling made simple
Transform very large datasets directly in your data warehouse or through a native integration with Spark.

Observability
Operationalize your pipelines with built-in monitoring, alerting, and observability through an intuitive UI.

🧙 Intro

Mage is an open-source data pipeline tool for transforming and integrating data.

  1. Quick start
  2. Demo
  3. Tutorials
  4. Documentation
  5. Features
  6. Core design principles
  7. Core abstractions
  8. Contributing

🏃‍♀️ Quick start

You can install and run Mage using Docker (recommended), pip, or conda.

Install using Docker

  1. Create a new project and launch the tool (replace demo_project with any name you like):

    docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai \
      /app/run_app.sh mage start demo_project
    

    Want to use Spark or other integrations? Read more about integrations.

  2. Open http://localhost:6789 in your browser and build a pipeline.

Using pip or conda

  1. Install Mage

    pip install mage-ai
    

    or

    conda install -c conda-forge mage-ai
    

    For additional packages (e.g. Spark, Postgres), see Installing extra packages.

    If you run into errors, please see Install errors.

  2. Create a new project and launch the tool (replace demo_project with any name you like):

    mage start demo_project
    
  3. Open http://localhost:6789 in your browser and build a pipeline.
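
Whichever install path you use, the tool can take a moment to start before the page loads. As a minimal sketch (not part of Mage), the helper below polls the URL from the steps above until it answers. The function name `wait_for_server` and its `attempts` and `delay` parameters are hypothetical, and the only assumption about Mage is that it serves HTTP on port 6789 as described above.

```python
import time
import urllib.error
import urllib.request

# Ignore any system-wide HTTP proxy: the target is a server on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url="http://localhost:6789", attempts=30, delay=1.0):
    """Return True once `url` answers an HTTP request, False if it never does."""
    for _ in range(attempts):
        try:
            with _opener.open(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server replied, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: nothing is listening yet.
            time.sleep(delay)
    return False


# Example usage (run after `mage start demo_project` or the Docker command):
#   if wait_for_server():
#       print("Mage is ready at http://localhost:6789")
```

Polling with a bounded number of attempts keeps the check from hanging forever if the container failed to start.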


🎮 Demo

Live demo

Build and run a data pipeline with our demo app.

WARNING

The live demo is public, so please don’t save anything sensitive in it (e.g. passwords or secrets).

Demo video (2 min)

[Video: Mage quick start demo]


👩‍🏫 Tutorials

Fire mage

🔮 Features

🎶 Orchestration: Schedule and manage data pipelines with observability.
📓 Notebook: Interactive Python, SQL, and R editor for coding data pipelines.
🏗️ Data integrations: Synchronize data from 3rd party sources to your internal destinations.
🚰 Streaming pipelines: Ingest and transform real-time data.
dbt: Build, run, and manage your dbt models with Mage.

A sample data pipeline defined across 3 files ➝

  1. Load data ➝
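    import pandas as pd
    from mage_ai.data_preparation.decorators import data_loader

    @data_loader
    def load_csv_from_file():
        # Read the sample Titanic dataset from the project directory.
        return pd.read_csv('default_repo/titanic.csv')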
    
  2. Transform data ➝
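    from mage_ai.data_preparation.decorators import transformer

    @transformer
    def select_columns_from_df(df, *args):
        # Keep only the columns needed downstream.
        return df[['Age', 'Fare', 'Survived']]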
    
  3. Export data ➝
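    from mage_ai.data_preparation.decorators import data_exporter

    @data_exporter
    def export_titanic_data_to_disk(df) -> None:
        # Write the transformed data to a new CSV file in the project directory.
        df.to_csv('default_repo/titanic_transformed.csv')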
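
Because each block is an ordinary function, the same three steps can be exercised outside the Mage UI, together with a simple data validation. The sketch below is not Mage's API. The decorators are no-op stand-ins, rows are plain dictionaries instead of a pandas DataFrame, the exporter returns CSV text instead of writing to disk, and the inline sample data and `validate_output` check are hypothetical, all chosen so the example runs with only the standard library.

```python
import csv
import io


def _passthrough(func):
    """Hypothetical stand-in for Mage's block decorators; returns func unchanged."""
    return func


data_loader = transformer = data_exporter = _passthrough

# Tiny inline sample standing in for default_repo/titanic.csv (values are made up).
SAMPLE_CSV = """Name,Age,Fare,Survived
Passenger A,22,7.25,0
Passenger B,38,71.28,1
"""

COLUMNS = ["Age", "Fare", "Survived"]


@data_loader
def load_csv_from_text(text=SAMPLE_CSV):
    # Parse the CSV text into a list of row dictionaries.
    return list(csv.DictReader(io.StringIO(text)))


@transformer
def select_columns(rows, *args):
    # Keep only the columns needed downstream.
    return [{key: row[key] for key in COLUMNS} for row in rows]


def validate_output(rows):
    # A simple data validation: non-empty output with exactly the expected columns.
    assert rows, "transformer produced no rows"
    for row in rows:
        assert list(row) == COLUMNS, f"unexpected columns: {list(row)}"


@data_exporter
def export_to_csv_text(rows):
    # Serialize the rows back to CSV text.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = select_columns(load_csv_from_text())
validate_output(rows)
exported = export_to_csv_text(rows)
print(exported)
```

Keeping the validation as its own function means it can run after every change to the transformer without touching the loader or exporter.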
    

What the data pipeline looks like in the UI ➝

[Image: data pipeline overview]

New? We recommend reading about blocks and learning from a hands-on tutorial.

Ask us questions on Slack


🏔️ Core design principles

Every user experience and technical design decision adheres to these principles.

💻 Easy developer experience: Open-source engine that comes with a custom notebook UI for building data pipelines.
🚢 Engineering best practices built-in: Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts.
💳 Data is a first-class citizen: Designed from the ground up specifically for running data-intensive workflows.
🪐 Scaling is made simple: Analyze and process large datasets quickly for rapid iteration.

🛸 Core abstractions

These are the fundamental concepts that Mage uses to operate.

Project: Like a repository on GitHub; this is where you write all your code.
Pipeline: Contains references to all the blocks of code you want to run and the charts for visualizing data, and organizes the dependencies between blocks.
Block: A file with code that can be executed independently or within a pipeline.
Data product: Every block produces data after it has been executed. These are called data products in Mage.
Trigger: A set of instructions that determines when or how a pipeline should run.
Run: A record of one execution of a pipeline or block: when it started, its status, when it completed, any runtime variables used, etc.
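
To make the relationships between these concepts concrete, here is a conceptual sketch in plain Python. It is not how Mage is implemented: every class, field, and method name below is hypothetical, a block is reduced to a named function, and a trigger only carries runtime variables. It shows a pipeline ordering blocks by their dependencies, each block yielding a data product, and a run recording status, timestamps, and variables. It needs Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from graphlib import TopologicalSorter
from typing import Any, Callable, Optional


@dataclass
class Block:
    """A unit of code that can run on its own or inside a pipeline."""
    name: str
    func: Callable[..., Any]
    upstream: list[str] = field(default_factory=list)


@dataclass
class Run:
    """Record of one pipeline execution."""
    started_at: datetime
    status: str = "running"
    completed_at: Optional[datetime] = None
    variables: dict[str, Any] = field(default_factory=dict)
    data_products: dict[str, Any] = field(default_factory=dict)


@dataclass
class Pipeline:
    """Holds blocks and runs them in dependency order."""
    name: str
    blocks: dict[str, Block] = field(default_factory=dict)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def execute(self, **variables: Any) -> Run:
        run = Run(started_at=datetime.now(timezone.utc), variables=variables)
        # Map each block to the set of blocks it depends on.
        graph = {b.name: set(b.upstream) for b in self.blocks.values()}
        for name in TopologicalSorter(graph).static_order():
            block = self.blocks[name]
            inputs = [run.data_products[u] for u in block.upstream]
            # The value a block returns is its data product.
            run.data_products[name] = block.func(*inputs, **variables)
        run.status = "completed"
        run.completed_at = datetime.now(timezone.utc)
        return run


@dataclass
class Trigger:
    """Decides how a pipeline runs; here it only supplies runtime variables."""
    pipeline: Pipeline
    variables: dict[str, Any] = field(default_factory=dict)

    def fire(self) -> Run:
        return self.pipeline.execute(**self.variables)


pipeline = Pipeline("demo")
pipeline.add(Block("total", lambda rows, **kw: sum(rows), upstream=["double"]))
pipeline.add(Block("double", lambda rows, **kw: [r * kw["factor"] for r in rows], upstream=["load"]))
pipeline.add(Block("load", lambda **kw: [1, 2, 3, 4]))

run = Trigger(pipeline, variables={"factor": 3}).fire()
print(run.status, run.data_products["total"])  # prints "completed 30"
```

The blocks are added in reverse order on purpose: the pipeline, not the order of definition, decides what runs first.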

🙋‍♀️ Contributing and developing

Add features and instantly improve the experience for everyone.

Check out the contributing guide to set up your development environment and start building.


👨‍👩‍👧‍👦 Community

Individually, we’re a mage.

🧙 Mage

Magic is indistinguishable from advanced technology. A mage is someone who uses magic (aka advanced technology).

Together, we’re Magers!

🧙‍♂️🧙 Magers (/ˈmājər/)

A group of mages who help each other realize their full potential! Let’s hang out and chat together ➝

Hang out on Slack

For real-time news, fun memes, data engineering topics, and more, join us on ➝

Twitter
LinkedIn
GitHub
Slack

🤔 Frequently Asked Questions (FAQs)

Check out our FAQ page to find answers to some of our most frequently asked questions.


🪪 License

See the LICENSE file for licensing information.

Water mage casting spell



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mage-ai-0.7.82.tar.gz (6.3 MB view details)

Uploaded Source

Built Distribution

mage_ai-0.7.82-py3-none-any.whl (6.6 MB view details)

Uploaded Python 3

File details

Details for the file mage-ai-0.7.82.tar.gz.

File metadata

  • Download URL: mage-ai-0.7.82.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mage-ai-0.7.82.tar.gz
Algorithm     Hash digest
SHA256        5e88ac7292b51064b6da27574155738bcedff3e19ed1962cfbeaf68340538312
MD5           9f223be5e3a1ac3ccba861a9126e9cdf
BLAKE2b-256   6caecf6ad11fc32be67a78cfad33399eae7d358b6de41080b26dfbb8d2bae634

See more details on using hashes here.
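
As a minimal sketch of one way to use these digests, the snippet below computes the SHA256 of a downloaded file with Python's standard `hashlib` and compares it with the value listed above. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the commented usage assumes the source distribution was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for mage-ai-0.7.82.tar.gz.
EXPECTED_SHA256 = "5e88ac7292b51064b6da27574155738bcedff3e19ed1962cfbeaf68340538312"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the file was downloaded to the current directory):
#   ok = matches_published_digest("mage-ai-0.7.82.tar.gz", EXPECTED_SHA256)
#   print("digest matches" if ok else "digest mismatch, do not install")
```

Reading in chunks keeps memory use flat regardless of the file size.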

File details

Details for the file mage_ai-0.7.82-py3-none-any.whl.

File metadata

  • Download URL: mage_ai-0.7.82-py3-none-any.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mage_ai-0.7.82-py3-none-any.whl
Algorithm     Hash digest
SHA256        32928a4e8321c18015f3e87605bcb2e8105039c0e9cc701581d28bd82319baa9
MD5           7f8b1c3875535d49e8f124235cf611fd
BLAKE2b-256   d92e0e605ef61467764b3a778c7001f4b4aa699b6557fad8459a461339937d89

See more details on using hashes here.
