Magemaker v0.1, by SlashML
Deploy open source AI models to AWS in minutes.
About Magemaker
Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line.
Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes.
Getting Started
Magemaker works with AWS. Azure and GCP support are coming soon!
To get a local copy up and running, follow these steps.
Prerequisites
- Python
- An AWS account
- Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free)
- Certain Hugging Face models (e.g. Llama2) require an access token (hf docs)
Configuration
Step 1: Set up AWS and SageMaker
To get started, you’ll need an AWS account, which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker.
We wrote up the steps in a Google Doc as well.
Installing the package
Step 1
pip install magemaker
Step 2: Running magemaker
Run it by simply doing the following:
magemaker
If this is your first time running this command, it will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region.
Once configured, it will create a .env file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one.
HUGGING_FACE_HUB_KEY="KeyValueHere"
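To confirm that the token line is written in a readable form, the file can be inspected with a few lines of Python. This is a minimal sketch using only the standard library. The helper name read_env_file is hypothetical, and the parser handles only simple KEY="value" lines; it is not how Magemaker itself loads the file.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines; a simplified stand-in, not Magemaker's own loader."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Calling read_env_file() in the project folder and checking for the HUGGING_FACE_HUB_KEY entry shows whether the token was saved under the expected name.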
Using Magemaker
Deploying models from dropdown
When you run the magemaker command, it gives you an interactive menu to deploy models. You can choose from a dropdown of models to deploy.
Deploying Hugging Face models
If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face, for example google-bert/bert-base-uncased. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model.
Deploying SageMaker models
If you are deploying a SageMaker model, select a framework and search for a model. If you are deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the CONFIG_DIR=.magemaker_config folder. You can modify the path to this folder by setting the CONFIG_DIR environment variable.
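To see which deployment files have been generated, the folder lookup described above can be sketched in Python. This assumes the generated files end in .yaml (as in bert-base-uncased.yaml) and that an unset CONFIG_DIR falls back to .magemaker_config; the helper name list_deployment_configs is hypothetical.

```python
import os
from pathlib import Path


def list_deployment_configs(default_dir=".magemaker_config"):
    """Return the YAML files in the config folder, honoring CONFIG_DIR if it is set."""
    config_dir = Path(os.environ.get("CONFIG_DIR", default_dir))
    if not config_dir.is_dir():
        return []
    # Sorted so the result is stable across runs and platforms.
    return sorted(config_dir.glob("*.yaml"))
```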
Deploying with a YAML file
We recommend deploying through a YAML file for reproducibility and IaC. From the CLI, you can deploy a model without going through all the menus. You can even integrate Magemaker with GitHub Actions to deploy on PR merge. Deploy via YAML files by passing the --deploy option with a local path, like so:
magemaker --deploy .magemaker_config/bert-base-uncased.yaml
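For a CI job that deploys several files in one run, the same command can be driven from a short Python script. This is a sketch under assumptions: the helper names deploy_command and deploy_all are hypothetical, and it assumes the magemaker command exits with a non-zero status when a deployment fails, which the source does not state.

```python
import subprocess


def deploy_command(config_path):
    """Build the argument list for `magemaker --deploy <config_path>`."""
    return ["magemaker", "--deploy", str(config_path)]


def deploy_all(config_paths, runner=subprocess.run):
    """Deploy each config in order, stopping at the first failure."""
    for path in config_paths:
        # check=True makes subprocess.run raise CalledProcessError
        # on a non-zero exit code (assumed to signal a failed deploy).
        runner(deploy_command(path), check=True)
```

Passing the runner as a parameter keeps the script easy to dry-run: supplying a function that only prints the command shows what would be executed without touching AWS.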
The following sample YAML file deploys the same Google BERT model mentioned above:
deployment: !Deployment
  destination: aws
  # Endpoint name matches model_id for querying atm.
  endpoint_name: test-bert-uncased
  instance_count: 1
  instance_type: ml.m5.xlarge

models:
- !Model
  id: google-bert/bert-base-uncased
  source: huggingface
The following YAML file deploys a Llama model from Hugging Face:
deployment: !Deployment
  destination: aws
  endpoint_name: test-llama2-7b
  instance_count: 1
  instance_type: ml.g5.12xlarge
  num_gpus: 4
  # quantization: bitsandbytes

models:
- !Model
  id: meta-llama/Meta-Llama-3-8B-Instruct
  source: huggingface
  predict:
    temperature: 0.9
    top_p: 0.9
    top_k: 20
    max_new_tokens: 250
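When several models share the same settings, a file like the samples above can be generated from a small template instead of being written by hand. The sketch below uses plain string formatting and covers only the basic fields. The function name render_deployment_yaml is hypothetical, and the two-space nesting under deployment and models is an assumption about the intended layout.

```python
def render_deployment_yaml(endpoint_name, model_id, instance_type="ml.m5.xlarge",
                           instance_count=1, source="huggingface"):
    """Render a minimal deployment file; the nesting is an assumed layout."""
    lines = [
        "deployment: !Deployment",
        "  destination: aws",
        f"  endpoint_name: {endpoint_name}",
        f"  instance_count: {instance_count}",
        f"  instance_type: {instance_type}",
        "",
        "models:",
        "- !Model",
        f"  id: {model_id}",
        f"  source: {source}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file in the config folder gives a path that can be passed to --deploy.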
If you’re using the ml.m5.xlarge instance type, here are some small Hugging Face models that work great:
Model: google-bert/bert-base-uncased
- Type: Fill Mask: tries to complete your sentence like Madlibs
- Query format: text string with [MASK] somewhere in it that you wish for the transformer to fill
Model: sentence-transformers/all-MiniLM-L6-v2
- Type: Feature extraction: turns text into a 384d vector embedding for semantic search / clustering
- Query format: "type out a sentence like this one."
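The query formats above can also be sent to a deployed endpoint from outside Magemaker. The sketch below uses the boto3 sagemaker-runtime client and assumes the endpoint accepts the {"inputs": ...} JSON body used by Hugging Face inference containers; whether a given Magemaker deployment expects exactly this body is an assumption. The function names are hypothetical.

```python
import json


def build_payload(text):
    """Encode a text query as a JSON body of the form {"inputs": text}."""
    return json.dumps({"inputs": text}).encode("utf-8")


def query_endpoint(endpoint_name, text, region="us-east-1"):
    """Send one text query to a deployed endpoint and return the decoded JSON reply."""
    import boto3  # imported lazily so build_payload works without AWS libraries

    client = boto3.client("sagemaker-runtime", region_name=region)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(text),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```

For the fill-mask model, a call such as query_endpoint("test-bert-uncased", "Paris is the [MASK] of France.") would use the endpoint name from the sample YAML file.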
Deactivating models
Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance.
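If you prefer to clean up from a script, endpoints can be listed and deleted with the boto3 SageMaker client. This is a minimal sketch, not Magemaker's own deletion logic; the helper names are hypothetical, and the client is passed in as a parameter so the functions can be exercised without AWS access.

```python
def list_endpoint_names(client):
    """Collect the names of all endpoints visible to the given SageMaker client."""
    names = []
    kwargs = {}
    while True:
        page = client.list_endpoints(**kwargs)
        names.extend(item["EndpointName"] for item in page["Endpoints"])
        token = page.get("NextToken")
        if not token:
            return names
        # Results are paginated; request the next page with the returned token.
        kwargs["NextToken"] = token


def delete_endpoints(client, names):
    """Request deletion of each named endpoint."""
    for name in names:
        client.delete_endpoint(EndpointName=name)

# Example wiring (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   delete_endpoints(client, ["test-bert-uncased"])
```

Note that deleting an endpoint this way does not remove the associated endpoint configuration or model resources in SageMaker.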
What we're working on next
- More robust error handling for various edge cases
- Verbose logging
- Enabling / disabling autoscaling
- Deployment to Azure and GCP
Known issues
- Querying within Magemaker currently only works with text-based models; it doesn’t work with multimodal, image generation, etc.
- Deleting a model is not instant; it may still show up briefly after it was queued for deletion
- Deploying the same model within the same minute will break
License
Distributed under the Apache 2.0 License. See LICENSE for more information.
Contact
You can reach us, faizan & jneid, at support@slashml.com.
We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.