Auto-deploy the Takeoff Server on AWS for LLM inference

Horizon Takeoff

Horizon Takeoff is a Python library that simplifies cloud deployment of LLMs with TitanML's Takeoff Server on AWS, with a specific focus on EC2 and SageMaker. An interactive Terminal User Interface (TUI) walks you through the configuration of your cloud environment. To learn more about the features offered by the Takeoff Server, refer to TitanML's documentation.

Horizon Takeoff offers two workflows:

1. Terminal User Interface (TUI): This approach guides you step by step within the terminal. It automatically saves your cloud environment settings in a YAML file and handles cloud orchestration tasks: pushing the Takeoff Server image to AWS's Elastic Container Registry (ECR), launching the instance, and configuring the Takeoff Server for LLM inference.

2. Python API: Alternatively, you can manually create the YAML config file according to your specific requirements and run the orchestration and instance launch in Python. Further details can be found in the YAML Configuration section.

Requirements

1. AWS CLI installed and configured on your local machine.
2. Docker installed.
3. An AWS account with the following configuration:

  • An instance profile role with the AmazonEC2ContainerRegistryReadOnly policy attached. This allows Docker pulls from ECR within an instance.

  • A security group allowing inbound traffic to port 8000 (required for Takeoff Server community edition) and port 3000 (required for Takeoff Server pro edition). This exposes the appropriate Docker endpoints for API calls, depending on your server edition of choice.
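
If you plan to use the Python API rather than the TUI, it may help to confirm the first two requirements yourself. The following is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper and is not part of horizon-takeoff. It only checks that the `aws` and `docker` executables are on your PATH, not that the AWS CLI has been configured with credentials.

```python
import shutil

# Hypothetical helper, not part of horizon-takeoff: report which of the
# command-line tools named in the requirements are missing from PATH.
REQUIRED_TOOLS = ("aws", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("AWS CLI and Docker found on PATH.")
```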

Currently, only EC2 instance deployment with the community edition server is stable. SageMaker deployment and the Takeoff Server pro edition are under development.

Install


pip install horizon-takeoff

TUI Launch


Launch the TUI for configuring an EC2 instance with the community version of the Takeoff Server:

horizon-takeoff ec2 community

The TUI features a clean user interface (powered by the rich library) and runs pre-start checks for your AWS CLI and Docker installations. It also reads your AWS profile data so that it can list your available AWS keys, security groups, and ARN roles, which speeds up the configuration of your AWS cloud environment.

Staging


After you've finished the TUI workflow, a YAML configuration file is automatically saved in your working directory. This file drives the staging process of your deployment, and you will receive a notification in the terminal once your instance has launched.

Wait a few minutes as the instance downloads the LLM model and initiates the Docker container containing the Takeoff Server. To keep track of the progress and access your instance's initialization logs, you can SSH into your instance:

ssh -i ~/<pem.key> <user>@<public-ipv4-dns>  # e.g. ssh -i ~/aws.pem ubuntu@ec2-44-205-255-59.compute-1.amazonaws.com

In your instance's terminal, run the following command to view your logs to confirm when your container is up and running:

cat /var/log/cloud-init-output.log

If you observe the Uvicorn URL endpoint being displayed, it signifies that your Docker container is operational and you are now ready to invoke API calls to the inference endpoint.
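
Rather than reading the log by eye, you could scan it for the startup line. The sketch below assumes the container prints Uvicorn's standard startup message ("Uvicorn running on <url>"); `find_uvicorn_url` is a hypothetical helper, not part of horizon-takeoff, and the sample log line is illustrative rather than captured from a real instance.

```python
import re
from typing import Optional

# Assumption: the log contains Uvicorn's standard startup message.
UVICORN_READY = re.compile(r"Uvicorn running on (\S+)")


def find_uvicorn_url(log_text: str) -> Optional[str]:
    """Return the URL from the first Uvicorn startup line, or None."""
    match = UVICORN_READY.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative log line in Uvicorn's usual format.
    sample = "INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n"
    print(find_uvicorn_url(sample))
```

On the instance itself, you would pass in the contents of /var/log/cloud-init-output.log instead of the sample string. A return value of None means the startup line has not appeared yet.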

Calling the Endpoint


Once you've initialized the EC2Endpoint class, you can effortlessly invoke your LLM in the cloud with just a single line of code.

from horizon import EC2Endpoint

endpoint = EC2Endpoint()
generation = endpoint('List 3 things to do in London.')
print(generation)

You can pass generative arguments to the EC2Endpoint() class in order to shape your model's output and/or choose server edition and endpoint type:

pro: bool = False,
stream: bool = False,
sampling_topk: int = 1,
sampling_topp: float = 1.0,
sampling_temperature: float = 1.0,
repetition_penalty: int = 1,
no_repeat_ngram_size: int = 0,

For more information regarding the available decoding arguments, refer to TitanML's docs.
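
One way to keep these arguments in one place is a small container that mirrors the defaults listed above and rejects obviously invalid values before anything is sent to the server. This is a sketch only: `GenerationArgs` is a hypothetical class, not part of horizon-takeoff, and the range checks are assumptions based on how top-k, top-p, and temperature are conventionally defined. The server may accept or reject other values.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationArgs:
    """Hypothetical container mirroring the EC2Endpoint arguments above."""

    pro: bool = False
    stream: bool = False
    sampling_topk: int = 1
    sampling_topp: float = 1.0
    sampling_temperature: float = 1.0
    repetition_penalty: int = 1
    no_repeat_ngram_size: int = 0

    def __post_init__(self):
        # Assumed ranges; not taken from the Takeoff Server documentation.
        if self.sampling_topk < 1:
            raise ValueError("sampling_topk must be at least 1")
        if not 0.0 < self.sampling_topp <= 1.0:
            raise ValueError("sampling_topp must be in (0, 1]")
        if self.sampling_temperature <= 0.0:
            raise ValueError("sampling_temperature must be positive")


args = GenerationArgs(sampling_topk=50, sampling_topp=0.9, sampling_temperature=0.7)
kwargs = asdict(args)
# With horizon-takeoff installed and an instance running, these could be
# passed on as keyword arguments:
#   endpoint = EC2Endpoint(**kwargs)
print(kwargs)
```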

Deleting Instance


To delete your working instance via the terminal, run:

horizon-del

YAML Configuration


If you prefer to bypass the TUI, you can write your YAML configuration manually. Make sure to add the following EC2-related variables and save them in an ec2_config.yaml file:

EC2:
  ami_id: ami-0c7217cdde317cfec             # Set the ID of the Amazon Machine Image (AMI) to use for EC2 instances.
  ecr_repo_name: takeoff                    # Set the name of the ECR repository. If it doesn't exist it will be created.
  hardware: cpu                             # Set the hardware type: 'cpu' or 'gpu'
  hf_model_name: tiiuae/falcon-7b-instruct  # Set the name of the Hugging Face model to use.
  instance_role_arn: arn:aws:iam::^^^:path  # Set the ARN of the IAM instance profile role.
  instance_type: c5.2xlarge                 # Set the EC2 instance type.
  key_name: aws                             # Set the name of the AWS key pair.
  region_name: us-east-1                    # Set the AWS region name.
  security_group_ids:                       # Set the security group ID(s).
    - sg-0fefe7b366b0c0843
  server_edition: community                 # defaults to "community" ("pro" not available yet)                
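
Because a hand-written file is easy to get wrong, a quick structural check before launching may save a failed deployment. The sketch below works on an already-loaded dictionary and assumes that every key shown above except server_edition is required; that assumption and the helper `check_ec2_config` are illustrative and not part of horizon-takeoff.

```python
# Assumption: all keys from the example config except server_edition are required.
REQUIRED_EC2_KEYS = {
    "ami_id", "ecr_repo_name", "hardware", "hf_model_name",
    "instance_role_arn", "instance_type", "key_name",
    "region_name", "security_group_ids",
}


def check_ec2_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    ec2 = config.get("EC2")
    if not isinstance(ec2, dict):
        return ["missing top-level 'EC2' mapping"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_EC2_KEYS - set(ec2))]
    if "hardware" in ec2 and ec2["hardware"] not in ("cpu", "gpu"):
        problems.append("hardware must be 'cpu' or 'gpu'")
    if "security_group_ids" in ec2 and not isinstance(ec2["security_group_ids"], list):
        problems.append("security_group_ids must be a list")
    return problems


# With PyYAML installed, the dictionary could be loaded like this:
#   import yaml
#   with open("ec2_config.yaml") as f:
#       config = yaml.safe_load(f)
config = {"EC2": {"hardware": "tpu", "security_group_ids": "sg-123"}}
for problem in check_ec2_config(config):
    print(problem)
```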

Launch in Python


Upon configuring the YAML file, instantiate the DockerHandler and TitanEC2 classes to handle Docker image flows and instance launch.

Docker

Load the YAML file into the DockerHandler. These commands will pull the Takeoff Docker image, tag it, and push it to ECR:

from horizon import DockerHandler, TitanEC2

docker = DockerHandler("ec2_config.yaml")

docker.pull_takeoff_image()
docker.push_takeoff_image()

Create Instance

Launch the EC2 instance:

titan = TitanEC2("ec2_config.yaml")
instance_id, meta_data = titan.create_instance()
print(meta_data)

When your instance is created, you will get a JSON output of the instance's metadata.

Revisit the Staging and Calling the Endpoint sections for API handling.

Delete Instance

Pass your instance ID to the delete_instance method:

titan.delete_instance(instance_id)
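
If you launch instances from scripts, it is easy to leave one running after an error. One possible pattern is a context manager that always calls delete_instance on exit. In the sketch below, `launched_instance` is a hypothetical wrapper and `FakeTitan` is a stand-in so the example runs without AWS access; only the create_instance and delete_instance calls come from the usage shown above.

```python
from contextlib import contextmanager


@contextmanager
def launched_instance(titan):
    """Create an instance and delete it on exit, even if an error occurs."""
    instance_id, meta_data = titan.create_instance()
    try:
        yield instance_id, meta_data
    finally:
        titan.delete_instance(instance_id)


class FakeTitan:
    """Stand-in for TitanEC2 so the sketch runs without AWS access."""

    def __init__(self):
        self.deleted = []

    def create_instance(self):
        # Made-up ID and metadata, for illustration only.
        return "i-0123456789abcdef0", {"state": "pending"}

    def delete_instance(self, instance_id):
        self.deleted.append(instance_id)


titan = FakeTitan()
with launched_instance(titan) as (instance_id, meta_data):
    print(instance_id, meta_data)
print(titan.deleted)
```

With the real library, you would pass a TitanEC2("ec2_config.yaml") object in place of the stand-in.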

