Auto-deploy the Takeoff Server on AWS for LLM inference
Horizon Takeoff
Horizon Takeoff is a Python library for simplifying the cloud deployment of LLMs with TitanML's Takeoff Server on AWS, with a specific focus on EC2 and SageMaker. The deployment process is facilitated through an interactive Terminal User Interface (TUI) for streamlining the configuration of your cloud environment. To gain a deeper understanding of the features offered by the Takeoff Server, refer to TitanML's documentation.
With Horizon Takeoff, you can choose between two distinct workflows:
1. Terminal User Interface (TUI): This approach guides you step by step within the terminal. It automatically saves your cloud environment settings in a YAML file and handles cloud orchestration tasks such as pushing the Takeoff Server image to AWS's Elastic Container Registry (ECR), launching the instance, and configuring the Takeoff Server for LLM inference.
2. Python API: Alternatively, you can manually create the YAML config file according to your specific requirements and execute the orchestration and instance launch in Python. Further details can be found in the YAML Configuration section.
Requirements
1. AWS CLI installed and configured on local machine.
2. Docker installed.
3. Own an AWS account with the following configurations:
- Have an instance profile role with access to AmazonEC2ContainerRegistryReadOnly. This will allow access to Docker pulls from ECR within an instance.
- Own a security group allowing inbound traffic to port: 8000 (required for Takeoff Server community edition) and port: 3000 (required for Takeoff Server pro edition). This will expose the appropriate Docker endpoints for API calling depending on your server edition of choice.
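The inbound rule described in the last requirement can be sketched as follows. This is a minimal illustration, not part of Horizon Takeoff: the helper name `takeoff_ingress_rules` and the example CIDR are assumptions, and the commented-out boto3 call (`authorize_security_group_ingress`) needs AWS credentials and a real security group ID.

```python
def takeoff_ingress_rules(cidr: str, pro: bool = False) -> list[dict]:
    """Build an IpPermissions list for the Takeoff Server port.

    Port 8000 serves the community edition, port 3000 the pro edition.
    """
    port = 3000 if pro else 8000
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": "Takeoff Server API"}],
        }
    ]


# Example CIDR from the documentation range; replace with your own address.
rules = takeoff_ingress_rules("203.0.113.7/32")
print(rules)

# To apply the rule with boto3 (placeholder group ID, not a real one):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rules)
```

Restricting the CIDR to your own address rather than 0.0.0.0/0 keeps the inference endpoint from being reachable by anyone on the internet.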
Currently, only EC2 instance deployment on the Community edition server is stable. SageMaker deployment and the Takeoff Server Pro edition are under development.
Install
pip install horizon-takeoff
TUI Launch
Launch the TUI for configuring an EC2 instance with the community version of the Takeoff Server:
horizon-takeoff ec2 community
The TUI not only features a clean user interface (powered by the rich library) but also conducts pre-start checks for AWS CLI and Docker installations. Furthermore, it has access to your AWS profile data, allowing it to significantly expedite the configuration of your AWS cloud environment. This includes the ability to list your available AWS keys, security groups, and ARN roles.
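If you use the Python API instead of the TUI, a similar pre-start check can be done by hand. The sketch below only looks for the `aws` and `docker` executables on your PATH; the function name `missing_tools` is hypothetical, and how the TUI performs its own checks is not documented here.

```python
import shutil


def missing_tools(tools=("aws", "docker")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install before deploying:", ", ".join(missing))
else:
    print("AWS CLI and Docker found on PATH.")
```

Note that finding the executables does not prove that the AWS CLI is configured or that the Docker daemon is running.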
Staging
After you've finished the TUI workflow, a YAML configuration file is automatically stored in your working directory. This file triggers the staging process of your deployment, and you will receive a notification in the terminal when your instance launches.
Wait a few minutes as the instance downloads the LLM model and initiates the Docker container containing the Takeoff Server. To keep track of the progress and access your instance's initialization logs, you can SSH into your instance:
ssh -i ~/<pem.key> <user>@<public-ipv4-dns> # e.g. ssh -i ~/aws.pem ubuntu@ec2-44-205-255-59.compute-1.amazonaws.com
In your instance's terminal, run the following command to view your logs to confirm when your container is up and running:
cat /var/log/cloud-init-output.log
If you observe the Uvicorn URL endpoint being displayed, it signifies that your Docker container is operational and you are now ready to invoke API calls to the inference endpoint.
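The readiness check can also be scripted. The sketch below assumes the container prints Uvicorn's standard startup line ("Uvicorn running on ..."); the function name `container_ready` and the sample log line are illustrative, and the actual log contents on your instance may differ.

```python
def container_ready(log_text: str) -> bool:
    """Return True once Uvicorn's startup line appears in the log text."""
    return any("Uvicorn running on" in line for line in log_text.splitlines())


# Illustrative sample in Uvicorn's usual startup format.
sample_log = "INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n"
print(container_ready(sample_log))

# On the instance itself, the same check could read the cloud-init log:
# with open("/var/log/cloud-init-output.log") as log_file:
#     print(container_ready(log_file.read()))
```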
Calling the Endpoint
Once you've initialized the EC2Endpoint class, you can effortlessly invoke your LLM in the cloud with just a single line of code.
from horizon import EC2Endpoint
endpoint = EC2Endpoint()
generation = endpoint('List 3 things to do in London.')
print(generation)
You can pass generative arguments to the EC2Endpoint() class in order to shape your model's output and/or choose server edition and endpoint type:
pro: bool = False,
stream: bool = False,
sampling_topk: int = 1,
sampling_topp: float = 1.0,
sampling_temperature: float = 1.0,
repetition_penalty: int = 1,
no_repeat_ngram_size: int = 0,
For more information regarding the available decoding arguments, refer to TitanML's docs.
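One way to keep these arguments together is a small dataclass that is unpacked into the constructor. This is a sketch only: `DecodingArgs` is a hypothetical helper, not part of Horizon Takeoff, and the range checks reflect common sampling conventions rather than anything the library is known to enforce. The defaults are copied from the list above.

```python
from dataclasses import asdict, dataclass


@dataclass
class DecodingArgs:
    """Hypothetical container mirroring the EC2Endpoint keyword arguments."""

    pro: bool = False
    stream: bool = False
    sampling_topk: int = 1
    sampling_topp: float = 1.0
    sampling_temperature: float = 1.0
    repetition_penalty: int = 1
    no_repeat_ngram_size: int = 0

    def __post_init__(self):
        # Assumed sanity checks, not taken from the library.
        if self.sampling_topk < 1:
            raise ValueError("sampling_topk must be at least 1")
        if not 0.0 < self.sampling_topp <= 1.0:
            raise ValueError("sampling_topp must be in (0, 1]")
        if self.sampling_temperature <= 0:
            raise ValueError("sampling_temperature must be positive")


kwargs = asdict(DecodingArgs(sampling_topk=50, sampling_topp=0.9, sampling_temperature=0.7))
print(kwargs)

# With horizon-takeoff installed and an instance running:
# from horizon import EC2Endpoint
# endpoint = EC2Endpoint(**kwargs)
# print(endpoint('List 3 things to do in London.'))
```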
Deleting Instance
To delete your working instance via the terminal, run:
horizon-del
YAML Configuration
If you prefer to bypass the TUI, you can enter your YAML configuration manually. Make sure to add the following EC2-related variables and save them in an ec2_config.yaml file:
EC2:
  ami_id: ami-0c7217cdde317cfec  # Set the ID of the Amazon Machine Image (AMI) to use for EC2 instances.
  ecr_repo_name: takeoff  # Set the name of the ECR repository. If it doesn't exist it will be created.
  hardware: cpu  # Set the hardware type: 'cpu' or 'gpu'
  hf_model_name: tiiuae/falcon-7b-instruct  # Set the name of the Hugging Face model to use.
  instance_role_arn: arn:aws:iam::^^^:path  # Set the ARN of the IAM instance profile role.
  instance_type: c5.2xlarge  # Set the EC2 instance type.
  key_name: aws  # Set the name of the AWS key pair.
  region_name: us-east-1  # Set the AWS region name.
  security_group_ids:  # Set the security group ID(s).
    - sg-0fefe7b366b0c0843
  server_edition: community  # defaults to "community" ("pro" not available yet)
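Before launching, it can help to sanity-check a hand-written config. The sketch below works on an already-parsed dictionary (for example, the result of PyYAML's `yaml.safe_load`, which is an assumption about how you load the file). The function `config_problems` is hypothetical, and treating every key except `server_edition` as required is an assumption based on the example above, not a documented rule.

```python
REQUIRED_KEYS = {
    "ami_id", "ecr_repo_name", "hardware", "hf_model_name", "instance_role_arn",
    "instance_type", "key_name", "region_name", "security_group_ids",
}


def config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed ec2_config.yaml."""
    ec2 = config.get("EC2") or {}
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(ec2))]
    if ec2.get("hardware") not in (None, "cpu", "gpu"):
        problems.append("hardware must be 'cpu' or 'gpu'")
    if "security_group_ids" in ec2 and not isinstance(ec2["security_group_ids"], list):
        problems.append("security_group_ids must be a list")
    return problems


# Dictionary mirroring the YAML example above.
config = {
    "EC2": {
        "ami_id": "ami-0c7217cdde317cfec",
        "ecr_repo_name": "takeoff",
        "hardware": "cpu",
        "hf_model_name": "tiiuae/falcon-7b-instruct",
        "instance_role_arn": "arn:aws:iam::^^^:path",
        "instance_type": "c5.2xlarge",
        "key_name": "aws",
        "region_name": "us-east-1",
        "security_group_ids": ["sg-0fefe7b366b0c0843"],
        "server_edition": "community",
    }
}
print(config_problems(config))
```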
Launch in Python
Upon configuring the YAML file, instantiate the DockerHandler and TitanEC2 classes to handle Docker image flows and instance launch.
Docker
Load the YAML file into the DockerHandler. These commands will pull the Takeoff Docker image, tag it, and push it to ECR:
from horizon import DockerHandler, TitanEC2
docker = DockerHandler("ec2_config.yaml")
docker.pull_takeoff_image()
docker.push_takeoff_image()
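For orientation, images pushed to a private ECR repository are addressed with a URI built from the account ID, region, and repository name. The sketch below shows that standard ECR naming pattern; the helper `ecr_image_uri`, the example account ID, and the `latest` tag are assumptions, and the tag DockerHandler actually applies is not documented here.

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build a private ECR image URI from its standard components."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


# Placeholder account ID; region and repository match the YAML example.
print(ecr_image_uri("123456789012", "us-east-1", "takeoff"))
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/takeoff:latest
```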
Create Instance
Launch the EC2 instance:
titan = TitanEC2("ec2_config.yaml")
instance_id, meta_data = titan.create_instance()
print(meta_data)
When your instance is created, you will get a JSON output of the instance's metadata.
Revisit the Staging and Calling the Endpoint sections for API handling.
Delete Instance
Pass your Instance Id to the delete_instance method:
titan.delete_instance(instance_id)
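For short experiments, it can be convenient to guarantee cleanup even if your code raises an error. The sketch below wraps any object that exposes `create_instance()` and `delete_instance(instance_id)` as shown above; the `temporary_instance` context manager and the `FakeLauncher` stand-in are hypothetical and exist only so the example runs without AWS.

```python
from contextlib import contextmanager


@contextmanager
def temporary_instance(launcher):
    """Create an instance and always delete it when the block exits."""
    instance_id, meta_data = launcher.create_instance()
    try:
        yield instance_id, meta_data
    finally:
        launcher.delete_instance(instance_id)


class FakeLauncher:
    """Stand-in with the same two methods, used instead of TitanEC2 here."""

    def __init__(self):
        self.deleted = []

    def create_instance(self):
        return "i-0123456789abcdef0", {"State": "pending"}

    def delete_instance(self, instance_id):
        self.deleted.append(instance_id)


fake = FakeLauncher()
with temporary_instance(fake) as (instance_id, meta_data):
    print("running", instance_id)
print("deleted", fake.deleted)
```

With the real library, you would pass `TitanEC2("ec2_config.yaml")` in place of `FakeLauncher()`.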
File details
Details for the file horizon-takeoff-0.0.4.4.tar.gz.
File metadata
- Download URL: horizon-takeoff-0.0.4.4.tar.gz
- Upload date:
- Size: 23.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | b34ec6661da933881d5fcd93fc4daa606f5e69588eef6a8c3f9f30bde6610a74
MD5 | eb8d8662f58e97de4501e8769bfc3dec
BLAKE2b-256 | 095da4219c07ca306157dc50146ff8855d04bdaaaf26a1fba2ee9670ea30166f
File details
Details for the file horizon_takeoff-0.0.4.4-py3-none-any.whl.
File metadata
- Download URL: horizon_takeoff-0.0.4.4-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 01f9ed9f9173d54cfa689f72e7c1df6f61037af0ef96fb27f4e04c3ded87a814
MD5 | 6bc9102c078a86ddb30d5111f33e561a
BLAKE2b-256 | c93f7de24717cfa76a3208ca6f2ba017f2a7538f9d43419b59f9054eb8ffdc53