Mage - An open-source data management platform that helps you clean data and prepare it for training AI/ML models
Intro
Mage is an open-source data management platform that helps you clean data and prepare it for training AI/ML models.
Join us on Slack
What does this do?
The current version of Mage includes a data cleaning UI tool that can run locally on your laptop or can be hosted in your own cloud environment.
Why should I use it?
Using a data cleaning tool enables you to quickly visualize data quality issues, easily fix them, and create repeatable data cleaning pipelines that can be used in production environments (e.g. online re-training and inference).
Quick start
- Try a demo of Mage in Google Colab.
- Try a hosted version of Mage
Install library
Install the most recent released version:
$ pip install mage-ai
Launch tool
Load your data, connect it to Mage, and launch the tool locally.
Anywhere you can execute Python code (e.g. a terminal or a Jupyter notebook), run the following:
import mage_ai
from mage_ai.server.sample_datasets import load_dataset
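# Load one of the sample datasets
df = load_dataset('titanic_survival.csv')
# Connect the data to Mage under a display name
mage_ai.connect_data(df, name='titanic dataset')
# Launch the tool locally
mage_ai.launch()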
Open http://localhost:5789 in your browser to access the tool locally.
To stop the tool, run this command: mage_ai.kill()
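Before opening the browser, it can help to confirm that something is listening on the port. The following is a minimal sketch using only the standard library. `is_tool_reachable` is a hypothetical helper, not part of `mage_ai`, and it only checks that the TCP port accepts connections, not that the process behind it is Mage.

```python
import socket


def is_tool_reachable(host="localhost", port=5789, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: the defaults mirror the host and port named
    in this README, nothing here calls into mage_ai.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `is_tool_reachable()` could be called after `mage_ai.launch()` to decide whether to open the browser yet.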
Custom host and port for tool
If you want to change the default host (localhost) and the default port (5789) that the tool runs on, you can set 2 separate environment variables:
$ export HOST=127.0.0.1
$ export PORT=1337
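The same two variables can be read from Python to work out which address to open. This is a sketch under stated assumptions: `tool_url` is a hypothetical helper, the fallbacks are the defaults named above, and how `mage_ai` itself parses `HOST` and `PORT` is not shown in this README.

```python
import os

DEFAULT_HOST = "localhost"
DEFAULT_PORT = 5789


def tool_url(env=None):
    """Build the tool's URL from HOST and PORT, falling back to the defaults.

    `env` is any mapping of variable names to strings; it defaults to
    the current process environment.
    """
    env = os.environ if env is None else env
    host = env.get("HOST", DEFAULT_HOST)
    port = int(env.get("PORT", DEFAULT_PORT))
    return f"http://{host}:{port}"


# With the two exports above in effect, this would be http://127.0.0.1:1337.
print(tool_url({"HOST": "127.0.0.1", "PORT": "1337"}))
```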
Using tool in Jupyter notebook cell
You can run the tool inside a Jupyter notebook cell iFrame by calling mage_ai.launch() within a single cell.
Optionally, you can use the following arguments to change the default host and port that the iFrame loads from:
mage_ai.launch(iframe_host='127.0.0.1', iframe_port=1337)
Cleaning data
After building a data cleaning pipeline from the UI, you can clean your data anywhere you can execute Python code:
import mage_ai
from mage_ai.server.sample_datasets import load_dataset
df = load_dataset('titanic_survival.csv')
# Option 1: Clean with pipeline uuid
df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')
# Option 2: Clean with pipeline config directory path
df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')
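For re-training or inference code, it may be convenient to bind the pipeline reference once and reuse the resulting function. The sketch below is one way to do that. `make_cleaner` is a hypothetical helper, and the rule that exactly one of the two keyword arguments must be given is an assumption of this sketch, not documented behavior of `mage_ai.clean`.

```python
def make_cleaner(clean_fn, *, pipeline_uuid=None, pipeline_config_path=None):
    """Return a one-argument function that cleans a DataFrame.

    `clean_fn` is the function that does the work (for example
    mage_ai.clean). Exactly one of the two pipeline references must be
    supplied; that restriction is this sketch's own assumption.
    """
    if (pipeline_uuid is None) == (pipeline_config_path is None):
        raise ValueError(
            "Pass exactly one of pipeline_uuid or pipeline_config_path"
        )
    if pipeline_uuid is not None:
        kwargs = {"pipeline_uuid": pipeline_uuid}
    else:
        kwargs = {"pipeline_config_path": pipeline_config_path}

    def cleaner(df):
        # Forward the data together with the bound pipeline reference.
        return clean_fn(df, **kwargs)

    return cleaner
```

Usage would look like `clean_titanic = make_cleaner(mage_ai.clean, pipeline_uuid='uuid_of_cleaning_pipeline')`, followed by `df_cleaned = clean_titanic(df)` wherever the data needs cleaning.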
Demo video (2 min)
More resources
Here is a 🗺️ step-by-step guide on how to use the tool.
Check out the 📚 tutorials to quickly become a master of magic.
Features
Data visualizations
Inspect your data using different charts (e.g. time series, bar chart, box plot, etc.).
Here’s a list of available charts.
Reports
Quickly diagnose data quality issues with summary reports.
Here’s a list of available reports.
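To give a feel for what a summary report measures, here is a small standard-library sketch that computes a missing-value ratio and a distinct-value count per column. It is an illustration only, not Mage's implementation; `summarize` and the list-of-dicts row format are assumptions made for the example.

```python
def summarize(rows):
    """Per-column missing-value ratio and distinct-value count.

    `rows` is a list of dicts; None and the empty string both count
    as missing in this sketch.
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        missing = len(values) - len(present)
        report[col] = {
            "missing_ratio": missing / len(rows) if rows else 0.0,
            "unique": len(set(present)),
        }
    return report


sample = [
    {"age": 22, "sex": "male"},
    {"age": None, "sex": "female"},
    {"age": 22, "sex": "male"},
    {"age": 30, "sex": ""},
]
report = summarize(sample)
```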
Cleaning actions
Easily add common cleaning functions to your pipeline with a few clicks. Cleaning actions include imputing missing values, reformatting strings, removing duplicates, and many more.
If a cleaning action you need doesn’t exist in the library, you can write and save custom cleaning functions in the UI.
Here’s a list of available cleaning actions.
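The three actions named above (imputing missing values, reformatting strings, removing duplicates) can be sketched as plain functions chained into a pipeline. This is an illustration of the idea using the standard library, not Mage's code; every function name here is hypothetical, and rows are represented as a list of dicts rather than a DataFrame.

```python
from functools import partial


def impute_missing(rows, column, fill_value):
    """Replace a None (or absent) value in `column` with `fill_value`."""
    return [
        {**row, column: fill_value} if row.get(column) is None else dict(row)
        for row in rows
    ]


def reformat_string(rows, column):
    """Strip whitespace and lowercase string values in `column`."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if isinstance(row.get(column), str):
            row[column] = row[column].strip().lower()
        cleaned.append(row)
    return cleaned


def remove_duplicates(rows):
    """Drop rows that exactly repeat an earlier row, keeping the first."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def run_pipeline(rows, steps):
    """Apply each cleaning step in order and return the result."""
    for step in steps:
        rows = step(rows)
    return rows


pipeline = [
    partial(impute_missing, column="age", fill_value=30),
    partial(reformat_string, column="name"),
    remove_duplicates,
]

raw = [
    {"name": " Alice ", "age": None},
    {"name": "alice", "age": 30},
    {"name": "alice", "age": 30},
]
cleaned = run_pipeline(raw, pipeline)
```

Because the pipeline is just an ordered list of steps, the same list can be reapplied to new data, which is the property that makes a cleaning pipeline repeatable.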
Data cleaning suggestions
The tool will automatically suggest different ways to clean your data and improve quality metrics.
Here’s a list of available suggestions.
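A suggestion can be thought of as a rule that inspects the data and proposes a cleaning action. The sketch below shows two such rules. The rules, the 50% threshold, and the name `suggest_actions` are all assumptions chosen for illustration; Mage's actual suggestion logic is not described in this README.

```python
def suggest_actions(rows, max_missing_ratio=0.5):
    """Return human-readable cleaning suggestions for a list of dict rows."""
    suggestions = []
    if not rows:
        return suggestions

    columns = sorted({key for row in rows for key in row})
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        ratio = missing / len(rows)
        if ratio > max_missing_ratio:
            # Mostly empty columns are proposed for removal.
            suggestions.append(f"remove column '{col}' ({ratio:.0%} missing)")
        elif missing:
            suggestions.append(f"impute missing values in '{col}'")

    # Exact repeats of a row suggest a de-duplication step.
    distinct = {tuple(sorted(row.items())) for row in rows}
    if len(distinct) < len(rows):
        suggestions.append("remove duplicate rows")
    return suggestions


sample = [
    {"age": 22, "cabin": None},
    {"age": None, "cabin": None},
    {"age": 22, "cabin": None},
    {"age": 30, "cabin": "C85"},
]
suggestions = suggest_actions(sample)
```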
Roadmap
Big features that are being worked on or are in the design phase:
- Encoding actions (e.g. one-hot encoding, label hasher, ordinal encoding, embeddings, etc.)
- Data quality monitoring and alerting
- Apply cleaning actions to columns and values that match a condition
Here’s a detailed list of 🪲 features and bugs that are in progress or upcoming.
Contributing
We welcome all contributions to Mage, from small UI enhancements to brand new cleaning actions. We love seeing community members level up and give people power-ups!
Check out the 🎁 contributing guide to get started by setting up your development environment and exploring the code base.
Got questions? Live chat with us in Slack
Anything you contribute, the Mage team and community will maintain. We’re in it together!
Community
We love the community of Magers (/ˈmājər/), a group of mages who help each other realize their full potential!
To live chat with the Mage team and community, please join the free Mage Slack channel.
For real-time news and fun memes, check out the Mage Twitter.
To report bugs or add your awesome code for others to enjoy, visit GitHub.
License
See the LICENSE file for licensing information.