Skip to main content

OpenMined PyGrid CLI for Infrastructure and cloud deployment

Project description

Deployment using PyGrid CLI

OpenMined PyGrid CLI provide the necessary support for Infrastructure and Deployment management of various PyGrid apps (Domains and Networks) to the main cloud providers (AWS, GCP, Azure) through the usage of Terraform and TerraScript integrated within the CLI.

Installing Terraform

Check Instructions here:

Cloud Providers

PyGrid CLI provides full support for the following cloud providers

  • Amazon Web Services (AWS)

  • Google Cloud Platform (GCP)

  • Microsoft Cloud Computing Services (Azure)

Getting Cloud Credentials

AWS credentials

You need to obtain your AWS IAM user credentials (which has programmatic access enabled) and add them to the following files in the home directory.

In ~/.aws/credentials.json

{
    "aws_access_key_id": "XXXXXXXXXXXXXXXXXXXX",
    "aws_secret_access_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}

& in ~/.aws/credentials

[default]
aws_access_key_id=XXXXXXXXXXXXXX
aws_secret_access_key=XXXXXXXXXXXXXXXXXXXX

GCP credentials

You will need to use a GCP service account to provide authentication and obtain required credentials

Instructions for adding credentials

Or through using GCloud CLI and loggining to your GCP provider through this command:

gcloud auth application-default login

AZURE credentials

You need these keys

subscription_id = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
client_id = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
client_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
tenant_id = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

and provide them to PyGrid CLI as AZURE credentials

Installing the CLI

Using pip (recommended)

pip install pygrid-cli

Manual install

To get started, install the CLI first through thess commands:

Clone PyGrid Repo:

git clone https://github.com/OpenMined/PyGrid.git

Then install the python package through pip

pip install -e .

CLI Instructions

Using the PyGrid CLI is very simple through this command pygrid deploy then following the instruction to successfully deploying a specific pygrid app, described in the commands below

List of CLI Commands

Deploy a Domain to AWS

pygrid deploy --provider aws --app domain

Deploy a Network to Azure

pygrid deploy --provider azure --app network

CLI Instructions Example

$ pygrid deploy                                                                                                                                           [7h 19m]
Welcome to OpenMined PyGrid CLI
Cloud Provider:  (AWS, GCP, AZURE) [AWS]: AWS
PyGrid App:  (Domain, Network) [Domain]: Domain
Please enter path to your  aws credentials json file [/Users/amrmkayid/.aws/credentials.json]:
Will you need to support Websockets? [y/N]: N
How many servers do you wish to deploy? (All are managed under the load balancer) [1]: 1
#1: PyGrid Domain ID: domain-demo
#1: Port number of the socket.io server [5000]:
? Please select your desired AWS region  eu-west-2
? Please select at least two availability zones. (Not sure? Select the first two)  done (3 selections)
? Please select an AWS instance category.  General Purpose Instances
? Please select your desired AWS instance.  Instance: t2.xlarge           Memory: 16.0 GB               CPUs: 4
? Please set a username for your Database  amrmkayid
? Enter a password for your Database (length > 8)  **********
? Enter the password again  **********
Your current configration are:

{
  "pygrid_root_path": "/Users/amrmkayid/.pygrid/cli",
  "output_file": "/Users/amrmkayid/.pygrid/cli/config_2021-03-30_011924.json",
  "provider": "aws",
  "app": {
    "name": "domain",
    "count": 1
  },
  "root_dir": "/Users/amrmkayid/.pygrid/apps/aws/domain",
  "serverless": false,
  "websockets": false,
  "apps": [
    {
      "id": "domain-demo",
      "port": 5000
    }
  ],
  "vpc": {
    "region": "eu-west-2",
    "av_zones": [
      "eu-west-2a",
      "eu-west-2b",
      "eu-west-2c"
    ],
    "instance_type": {
      "InstanceType": "t2.xlarge"
    }
  }
}

Continue? [y/N]:

Deployment without using the CLI

The CLI allows a good amount of flexibility to meet your deployment requirements. But if you need to do a custom/manual deployment, follow these steps.

  • Deploy a VPC, domain server, subnets, route tables, network gateways, database and connect them altogether.

After deploying the domain server, you would need to SSH into it and perform these tasks.

  • Install Terraform.
  • Clone PyGrid.
  • Set up environment variables (Eg. for amazon DATABASE_URL, CLOUD_PROVIDER, REGION, VPC_ID, PUBLIC_SUBNET_ID, PRIVATE_SUBNET_ID, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • Start the domain/network app using the run.sh file (found in respective app folders).
nohup ./run.sh --port 5000 --host 0.0.0.0

For better understanding of steps involved in setting up the domain, you can refer to our post-deployment scripts in the write_domain_exec_script function of the respective cloud provider classes (AWS_Serverfull, Azure, GCP) found in apps/domain/src/main/core/infrastructure/providers/.

Serverless deployment

Serverless deployment has risen in popularity in recent years for a variety of reasons:

  • Ease of deployment: You no longer need to be concerned about process management or configuring a network to accept appropriate incoming connections.
  • Auto-scaling: Serverless functions scale implicitly because of the fact that they are not running on a continuous process, and are instead “instantiated on-demand”.
  • Inexpensive: Because you’re not paying for a continuously running process, you’re only paying when the function is called, which is significantly cheaper than paying for a constantly running server.

As such, PyGrid Domains and Networks, which are simply message forwarding services and API’s with an attached SQL database, are great candidates for serverless deployment. Workers, on the other hand, due to their specific hardware requirements necessary for computation, should always be deployed in a “serverfull” (traditional) configuration. While this may sound expensive, if a Worker is deployed to a cloud provider then the data scientist has the ability to turn on and off the instance (the Worker) at their discretion.

Note: The PyGrid team wrote an initial serverless deployment setup for Domains and Networks. But, currently this functionality does not exist in a completed form. The current constraints are actually due to the fact that PySyft has a hard dependency on PyTorch at the moment. Currently it’s not possible to utilize only the class structure required for parsing messages from PySyft without having to install PyTorch as well. Serverless functions often come with file size constraints, making this impossible or unrealistic. It’s the PyGrid team’s hope that this dependency is later marked as optional, allowing the team to simply utilize the PySyft class structure inside of PyGrid, without depending on any deep learning framework functionality.

If serverless deployment becomes a reality for Domains and Networks, then it is our suggestion that you deploy Domains and Networks in a serverless configuration. This is the optimal method of deployment given cost, scaling concerns, and maintenance effort. Using this method, it could mean that a data owner would be able to host their private data in a PyGrid serverless deployment, and never need to be concerned with ongoing scale or maintenance of their deployment - truly a “set it and forget it” situation.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pygrid-cli-0.5.0rc1.tar.gz (29.6 kB view hashes)

Uploaded Source

Built Distribution

pygrid_cli-0.5.0rc1-py3-none-any.whl (36.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page