
Mage - An open-source data management platform that helps you clean data and prepare it for training AI/ML models

Project description

Intro

Mage is an open-source data management platform that helps you clean data and prepare it for training AI/ML models.

What does this do?

The current version of Mage includes a data cleaning UI tool that can run locally on your laptop or can be hosted in your own cloud environment.

Why should I use it?

Using a data cleaning tool enables you to quickly visualize data quality issues, easily fix them, and create repeatable data cleaning pipelines that can be used in production environments (e.g. online re-training and inference).

Table of contents

  1. Quick start
  2. Features
  3. Roadmap
  4. Contributing
  5. Community

Quick start

Install library

$ pip install mage-ai

Launch tool

Load your data, connect it to Mage, and launch the tool locally.

Run the following anywhere you can execute Python code (e.g. a terminal or a Jupyter notebook):

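import mage_ai
import pandas as pd

# Load the dataset into a pandas DataFrame.
df = pd.read_csv('/path_to_data')
# Connect the DataFrame to Mage under a dataset name.
mage_ai.connect_data(df, name='name_of_dataset')
# Start the data cleaning tool locally.
mage_ai.launch()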

Open http://localhost:5000 in your browser to access the tool locally.

To stop the tool, run this command: mage_ai.kill()
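When the tool is started from a script instead of an interactive session, it can be left running by accident. One possible pattern is a small context manager that always calls `kill()` on exit. This is only a sketch: it assumes `launch()` returns immediately and that `kill()` can be called afterwards, which the steps above suggest but do not state. The `mage_session` helper and its `mage` parameter are hypothetical names used for illustration and are not part of the library.

```python
from contextlib import contextmanager


@contextmanager
def mage_session(mage, df, name):
    """Connect df, launch the tool, and always call kill() on exit.

    `mage` is the imported mage_ai module, passed in as an argument so the
    helper is easy to test. Only connect_data, launch, and kill from the
    quick start are used; this helper itself is hypothetical.
    """
    mage.connect_data(df, name=name)
    mage.launch()
    try:
        yield
    finally:
        # Runs even if the body raises, so the tool does not keep running.
        mage.kill()
```

With `import mage_ai`, usage might look like `with mage_session(mage_ai, df, 'name_of_dataset'): input('Press Enter to stop the tool')`.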

Cleaning data

After building a data cleaning pipeline from the UI, you can clean your data anywhere you can execute Python code:

import mage_ai
import pandas as pd


df = pd.read_csv('/path_to_data')

# Option 1: Clean with pipeline uuid
df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')

# Option 2: Clean with pipeline config directory path
df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')

Demo video (2 min)

Mage quick start demo

More resources

Features

  1. Data visualizations
  2. Reports
  3. Cleaning actions
  4. Data cleaning suggestions

Data visualizations

Inspect your data using different charts (e.g. time series, bar chart, box plot, etc.).

Here’s a list of available charts.


Reports

Quickly diagnose data quality issues with summary reports.

Here’s a list of available reports.
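To give a rough idea of what a summary report can contain, here is a minimal sketch in plain Python. It is not how Mage builds its reports. The `summarize_quality` function, its output keys, and the choice to treat `None` and empty strings as missing are all assumptions made for illustration.

```python
def summarize_quality(rows):
    """Return per-column missing counts plus the number of duplicate rows.

    `rows` is a list of dicts that share the same keys. None and '' count
    as missing. Values are assumed to be hashable.
    """
    columns = list(rows[0].keys()) if rows else []
    missing = {col: 0 for col in columns}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ''):
                missing[col] += 1
        # A row is a duplicate if the same tuple of values appeared before.
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        'row_count': len(rows),
        'missing': missing,
        'duplicate_rows': duplicates,
    }
```

A report like this makes it easy to spot columns with many missing values or datasets with repeated rows before choosing cleaning actions.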


Cleaning actions

Easily add common cleaning functions to your pipeline with a few clicks. Cleaning actions include imputing missing values, reformatting strings, removing duplicates, and many more.

If a cleaning action you need doesn’t exist in the library, you can write and save custom cleaning functions in the UI.

Here’s a list of available cleaning actions.
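The three actions named above (imputing missing values, reformatting strings, removing duplicates) can be sketched in plain Python to show what each one does to the data. This is an illustration only, not Mage's implementation. All function names and the `city` and `age` columns are hypothetical, and mean imputation is just one of several possible imputation strategies.

```python
from statistics import mean


def impute_mean(rows, column):
    """Replace None in `column` with the mean of the non-missing values."""
    present = [row[column] for row in rows if row[column] is not None]
    fill = mean(present) if present else None
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def reformat_strings(rows, column):
    """Strip surrounding whitespace and lowercase string values in `column`."""
    return [
        {
            **row,
            column: row[column].strip().lower()
            if isinstance(row[column], str)
            else row[column],
        }
        for row in rows
    ]


def remove_duplicates(rows):
    """Drop rows that repeat an earlier row, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # assumes hashable values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def run_pipeline(rows):
    """Apply the three actions in a fixed order, like a small pipeline."""
    rows = reformat_strings(rows, 'city')
    rows = impute_mean(rows, 'age')
    return remove_duplicates(rows)
```

The order matters: reformatting strings first lets rows that differ only in casing or whitespace be recognized as duplicates in the last step.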


Data cleaning suggestions

The tool will automatically suggest different ways to clean your data and improve quality metrics.

Here’s a list of available suggestions.
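How the tool chooses its suggestions is not described here. As a purely hypothetical sketch, a rule-based approach could look like the following, where the `suggest_cleaning_actions` function, the `max_missing_ratio` threshold, and the wording of the suggestions are all invented for illustration.

```python
def suggest_cleaning_actions(rows, max_missing_ratio=0.5):
    """Return human-readable suggestions based on simple, made-up rules.

    `rows` is a list of dicts that share the same keys; None counts as
    missing. Values are assumed to be hashable.
    """
    suggestions = []
    if not rows:
        return suggestions
    total = len(rows)
    for col in rows[0]:
        missing = sum(1 for row in rows if row.get(col) is None)
        if missing / total > max_missing_ratio:
            # Mostly empty columns are unlikely to be worth imputing.
            suggestions.append(
                f"remove column '{col}' ({missing}/{total} values missing)"
            )
        elif missing:
            suggestions.append(f"impute missing values in '{col}'")
    unique_rows = {tuple(sorted(row.items())) for row in rows}
    if len(unique_rows) < total:
        suggestions.append('remove duplicate rows')
    return suggestions
```

Each suggestion maps to one of the cleaning actions described earlier, so accepting a suggestion would amount to adding that action to the pipeline.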


Roadmap

Major features that are in development or in the design phase:

  1. Encoding actions (e.g. one-hot encoding, label hasher, ordinal encoding, embeddings, etc.)
  2. Data quality monitoring and alerting
  3. Apply cleaning actions to columns and values that match a condition

Here’s a detailed list of 🪲 features and bugs that are in progress or upcoming.

Contributing

We welcome all contributions to Mage, from small UI enhancements to brand new cleaning actions. We love seeing community members level up and give people power-ups!

Check out the 🎁 contributing guide to get started by setting up your development environment and exploring the code base.

Got questions? Live chat with us in Slack.

Anything you contribute, the Mage team and community will maintain. We’re in it together!

Community

We love the community of Magers (/ˈmājər/), a group of mages who help each other realize their full potential!

To live chat with the Mage team and community, please join the free Mage Slack channel.

For real-time news and fun memes, check out the Mage Twitter.

To report bugs or add your awesome code for others to enjoy, visit GitHub.

License

See the LICENSE file for licensing information.

Project details



This version

0.0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mage-ai-0.0.1.tar.gz (2.9 MB view details)

Uploaded Source

Built Distribution

mage_ai-0.0.1-py3-none-any.whl (3.4 MB view details)

Uploaded Python 3

File details

Details for the file mage-ai-0.0.1.tar.gz.

File metadata

  • Download URL: mage-ai-0.0.1.tar.gz
  • Upload date:
  • Size: 2.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.2

File hashes

Hashes for mage-ai-0.0.1.tar.gz
Algorithm Hash digest
SHA256 0410db83e4b5faeb392fa5573f2063aa2506d1bd005ac16ea0a45caa3bea7e6b
MD5 e5c0a0b11d25dd9f5038374928ff1b90
BLAKE2b-256 48bbe7cf35da013c814408816a884e59c0bbc1b537621a851ca182f3fe47cdf6

See more details on using hashes here.

File details

Details for the file mage_ai-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mage_ai-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.2

File hashes

Hashes for mage_ai-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0f0f9ee263fa607e4098d3ea884c9247f394c3bc3b1273033bc1c6f0125836c3
MD5 ba10d45730ab57dbbb8358b30aa61c5d
BLAKE2b-256 ff164d43a83ab8fa0b80ba37b03e8d319806052bf295f28644241e61dadf496c

