
A CLI/SDK that automatically uploads pip packages and directories to AWS EFS for use in AWS Lambda


🚀 efsync2 - Open-Source MLOps tool for running serverless machine learning


efsync2 is an updated fork of Philipp Schmid's efsync tool. I noticed errors that rendered the code non-functional for deployment, as described in this issue. Fortunately, Chi W Pak wrote a fix and opened a Pull Request for it. Unfortunately, as of this writing, the issue and PR have been outstanding for over six months. I wanted to share deployable serverless machine learning inference running purely on Lambda, and wanted to include functioning packages and code in the instructions. So I implemented Chi's fix in a fork of Philipp's code, and am releasing it as its own package for future use: efsync2!

I attribute the vast majority of the work to Philipp, and I have kept the license the same. I wanted to keep as much functionality as possible the same, with function calls identical to his, in case anyone was looking to

efsync2 is a CLI/SDK tool which automatically syncs files and dependencies to AWS EFS. The CLI is easy to use: you only need access to an AWS account and an AWS EFS filesystem that is up and running. Philipp wrote an article on the original efsync here, which currently covers the same function calls and functionality.

I recommend starting with the Quick Start. efsync2 enables you to install dependencies with the AWS Lambda runtime directly into your EFS filesystem and use them in your AWS Lambda function. You can combine this with either syncing files from S3 or uploading them with SCP. You can also sync files from S3 and upload with SCP without installing pip dependencies.

There are several examples for many use cases.

Installation and Basic Usage:

#Install via pip3:
pip3 install efsync2

#Sync your pip packages or files to AWS EFS:
efsync2 -cf efsync.yaml


🏃🏻‍♂️ Quick Start

Example in Google Colab.

  1. Install via pip3
pip3 install efsync2
  2. Sync your pip dependencies or files to AWS EFS

usage with the cli

efsync2 -cf efsync.yaml

or with python

from efsync2 import efsync

efsync('efsync.yaml')

⚙️ Configurations

There are 4 different ways to use efsync2 in your project. You can create a yaml configuration and use the SDK, create a python dict and use the SDK, create a yaml configuration and use the CLI, or use the CLI with parameters. Below you can find examples for each of these, followed by configuration examples for the different use cases.

Note: If you sync files with scp from a local directory (e.g. model/bert) to efs (my_efs_model), efsync2 will sync the model to my_efs_model/bert. This happens because scp uploads the files recursively.
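The path rule in the note above can be written down as a small helper. This is only a sketch of the behaviour the note describes, not code taken from efsync2: the function name `expected_efs_path` and the mount point `/mnt/efs` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def expected_efs_path(efs_mount, file_dir_on_ec2, file_dir):
    """Return where an scp-uploaded directory should end up on EFS.

    Hypothetical helper: scp copies the directory itself, so the last
    component of the local path is kept below the target directory.
    """
    local_name = PurePosixPath(file_dir).name
    return str(PurePosixPath(efs_mount) / file_dir_on_ec2 / local_name)


# "/mnt/efs" is an assumed mount point; use the one configured for your Lambda.
print(expected_efs_path("/mnt/efs", "my_efs_model", "model/bert"))
```

With the values from the note this gives a path ending in my_efs_model/bert, which is the location your Lambda code would need to read from.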

Configuration with yaml file efsync.yaml

#standard configuration
efs_filesystem_id: fs-2adfas123 # aws efs filesystem id (mount point)
subnet_Id: subnet-xxx # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# pip dependencies configurations
efs_pip_dir: lib # pip directory on ec2
python_version: 3.8 # python version used for installing pip dependencies -> should match the lambda runtime used afterwards
requirements: requirements.txt # path + file to requirements.txt which holds the installable pip dependencies

# s3 config
s3_bucket: my-bucket-with-files # s3 bucket name from which the files should be downloaded
s3_keyprefix: models/bert # s3 keyprefix for the files
file_dir_on_ec2: ml # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir

# upload files with scp to efs
file_dir: local_dir # extra local directory for file upload like ML models

from efsync2 import efsync

efsync('efsync.yaml')

Configuration with CLI Parameters

efsync2  --efs_filesystem_id  fs-2adfas123 \
        --subnet_Id subnet-xxx \
        --ec2_key_name efsync2-asd913fjgq3 \
        --clean_efs all \
        --aws_profile efsync2 \
        --aws_region yo-region-1 \
        --efs_pip_dir lib \
        --python_version 3.8 \
        --requirements requirements.txt \
        --s3_bucket my-bucket-with-files \
        --s3_keyprefix models/bert \
        --file_dir local_dir \
        --file_dir_on_ec2 ml

Configuration with CLI and yaml

efsync2 -cf efsync.yaml

Configuration with python dictionary

config = {
  'efs_filesystem_id': 'fs-2adfas123', # aws efs filesystem id (mount point)
  'subnet_Id': 'subnet-xxx', # subnet in which the efs is running
  'ec2_key_name':'efsync2-asd913fjgq3',  # required key name for starting the ec2 instance
  'clean_efs': 'all', # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
  'aws_profile': 'efsync2', # aws iam profile with the required permissions, configured in .aws/credentials
  'aws_region': 'eu-central-1', # the aws region where the efs is running
  'efs_pip_dir': 'lib',  # pip directory on ec2
  'python_version': 3.8,  # python version used for installing pip dependencies -> should match the lambda runtime used afterwards
  'requirements': 'requirements.txt', # path + file to requirements.txt which holds the installable pip dependencies
  'file_dir': 'local_dir', # extra local directory for file upload like ML models
  'file_dir_on_ec2': 'ml', # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir
  's3_bucket': 'my-bucket-with-files', # s3 bucket name from which the files should be downloaded
  's3_keyprefix': 'models/bert' # s3 keyprefix for the files
}

from efsync2 import efsync

efsync(config)
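When the configuration is built in code, it can help to check it before starting a sync. The sketch below assumes that the three settings listed without a default in the CLI parameter table (efs_filesystem_id, subnet_Id, ec2_key_name) are the ones that always have to be set; `REQUIRED_KEYS` and `missing_keys` are hypothetical names and are not part of efsync2.

```python
# Assumption: these are the settings that have no default value.
REQUIRED_KEYS = ("efs_filesystem_id", "subnet_Id", "ec2_key_name")


def missing_keys(config):
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


config = {
    "efs_filesystem_id": "fs-2adfas123",
    "subnet_Id": "subnet-xxx",
    "ec2_key_name": "efsync2-asd913fjgq3",
    "requirements": "requirements.txt",
}

problems = missing_keys(config)
if problems:
    print("missing configuration keys:", ", ".join(problems))
else:
    print("configuration looks complete")
    # from efsync2 import efsync
    # efsync(config)
```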

✍🏻 Use case configuration with yaml examples

Only installing Pip dependencies

#standard configuration
efs_filesystem_id: fs-2adfas123 # aws efs filesystem id (mount point)
subnet_Id: subnet-xxx # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# pip dependencies configurations
efs_pip_dir: lib # pip directory on ec2
python_version: 3.8 # python version used for installing pip dependencies -> should match the lambda runtime used afterwards
requirements: requirements.txt # path + file to requirements.txt which holds the installable pip dependencies
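
Once a sync with a configuration like the one above has finished, a Lambda function that has the EFS filesystem mounted can make the packages importable by extending `sys.path`. This is a minimal sketch and not part of efsync2. It assumes the filesystem is mounted at `/mnt/efs` (Lambda requires the local mount path to start with /mnt/) and that the packages sit directly in the `efs_pip_dir` directory (`lib`). `add_efs_packages` and `handler` are illustrative names.

```python
import sys

EFS_MOUNT = "/mnt/efs"  # assumption: local mount path configured on the Lambda
PIP_DIR = "lib"         # matches efs_pip_dir in the configuration


def add_efs_packages(mount=EFS_MOUNT, pip_dir=PIP_DIR):
    """Append the EFS pip directory to sys.path once and return it."""
    path = f"{mount}/{pip_dir}"
    if path not in sys.path:
        sys.path.append(path)
    return path


# Run at import time so packages on EFS can be imported below this line.
add_efs_packages()


def handler(event, context):
    # Import EFS-hosted packages here or at module level after the call above.
    return {"statusCode": 200, "body": "ok"}
```

The Lambda runtime should use the same Python version as `python_version` in the configuration, since the packages were installed for that version.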

Installing Pip dependencies and syncing files from s3 to efs

#standard configuration
efs_filesystem_id: fs-2226b27a # aws efs filesystem id (mount point)
subnet_Id: subnet-17f97a7d # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# pip dependencies configurations
efs_pip_dir: lib # pip directory on ec2
python_version: 3.8 # python version used for installing pip dependencies -> should match the lambda runtime used afterwards
requirements: requirements.txt # path + file to requirements.txt which holds the installable pip dependencies

# s3 config
s3_bucket: efsync2-test-bucket # s3 bucket name from which the files should be downloaded
s3_keyprefix: distilbert # s3 keyprefix for the files
file_dir_on_ec2: ml # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir

Only syncing files from s3 to efs

#standard configuration
efs_filesystem_id: fs-2226b27a # aws efs filesystem id (mount point)
subnet_Id: subnet-17f97a7d # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# s3 config
s3_bucket: efsync2-test-bucket # s3 bucket name from which the files should be downloaded
s3_keyprefix: distilbert # s3 keyprefix for the files
file_dir_on_ec2: ml # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir

Installing Pip dependencies and uploading local files with scp to efs

Note: If you sync a file with scp from a local directory (e.g. model/bert) to efs (my_efs_model), efsync2 will sync the model to my_efs_model/bert. This is due to scp's recursive copy functionality.

For example, if you set the destination path on your efs to foldername and you upload foldername, the true path will be your/efs/mount/foldername/foldername when you access your models.

If in doubt, I recommend double-checking with os.walk. I am considering modifying this functionality in the future.
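
One way to do the os.walk check is to list every file below the mount point from inside the Lambda function (or any machine that has the EFS mounted). This is a small sketch: `list_tree` is a hypothetical helper name and `/mnt/efs` is an assumed mount path.

```python
import os


def list_tree(root):
    """Return every file path under root, relative to root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# "/mnt/efs" is an assumption; use the mount path configured for your function.
for relative_path in list_tree("/mnt/efs"):
    print(relative_path)
```

If the listing shows foldername/foldername/..., the doubled directory described above is in effect and your model path needs both levels.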

#standard configuration
efs_filesystem_id: fs-2226b27a # aws efs filesystem id (mount point)
subnet_Id: subnet-17f97a7d # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# pip dependencies configurations
efs_pip_dir: lib # pip directory on ec2
python_version: 3.8 # python version used for installing pip dependencies -> should match the lambda runtime used afterwards
requirements: requirements.txt # path + file to requirements.txt which holds the installable pip dependencies

# upload files with scp to efs
file_dir: local_dir # extra local directory for file upload like ML models
file_dir_on_ec2: ml # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir

Only uploading local files with scp to efs

Note: If you sync a file with scp from a local directory (e.g. model/bert) to efs (my_efs_model), efsync2 will sync the model to my_efs_model/bert. This is due to scp's recursive copy functionality.

For example, if you set the destination path on your efs to foldername and you upload foldername, the true path will be your/efs/mount/foldername/foldername when you access your models.

If in doubt, I recommend double-checking with os.walk. I am considering modifying this functionality in the future.

#standard configuration
efs_filesystem_id: fs-2226b27a # aws efs filesystem id (mount point)
subnet_Id: subnet-17f97a7d # subnet in which the efs is running
ec2_key_name: efsync2-asd913fjgq3 # required key name for starting the ec2 instance
clean_efs: all # defines whether the efs should be cleaned up before uploading; values: 'all', 'pip', 'file'
# aws profile configuration
aws_profile: efsync2 # aws iam profile with the required permissions, configured in .aws/credentials
aws_region: eu-central-1 # the aws region where the efs is running

# upload files with scp to efs
file_dir: local_dir # extra local directory for file upload like ML models
file_dir_on_ec2: ml # name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir

🏗 Examples

There are several Jupyter notebooks with examples: installing pip dependencies only, installing pip dependencies and syncing files from s3 to efs, downloading only files from s3, installing pip dependencies and uploading local files with scp, and only uploading files with scp. All examples can be run in a Google Colab notebook.

simplest usage:

from efsync2 import efsync

efsync('efsync.yaml')

CLI Parameters

| cli_short | cli_long | default | description |
| --- | --- | --- | --- |
| -h | --help | - | displays all commands |
| -r | --requirements | requirements.txt | path of your requirements.txt |
| -cf | --config_file | - | path of your efsync.yaml |
| -py | --python_version | 3.8 | Python version used to install dependencies |
| -epd | --efs_pip_dir | lib | directory where the pip dependencies will be installed on efs |
| -efi | --efs_filesystem_id | - | File System ID of the EFS filesystem |
| -ce | --clean_efs | - | defines whether the EFS should be cleaned up before uploading; values: 'all', 'pip', 'file' |
| -fd | --file_dir | tmp | directory where all other files will be placed |
| -fdoe | --file_dir_on_ec2 | tmp | name of the directory where your files from <file_dir> will be uploaded; if you use scp it will be /file_dir |
| -ap | --aws_profile | efsync | name of the AWS profile to use |
| -ar | --aws_region | eu-central-1 | aws region where the efs is running |
| -sbd | --subnet_Id | - | subnet id of the efs |
| -ekn | --ec2_key_name | - | temporary key name for the ec2 instance |
| -s3b | --s3_bucket | - | s3 bucket name from which the files will be downloaded |
| -s3k | --s3_keyprefix | - | s3 keyprefix of the directory in s3. Files will be downloaded recursively |
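
If you keep your settings in a Python dict but want to run the CLI, for example from a CI script, the long flags above can be generated from the dict. This is only a sketch: `build_cli_command` is a hypothetical helper, and it assumes every key in the dict is one of the long option names listed in the table.

```python
import shlex


def build_cli_command(config):
    """Turn a config dict into an efsync2 argument list using long flags."""
    args = ["efsync2"]
    for key, value in config.items():
        # Each key is assumed to match a long option name, e.g. --clean_efs
        args += [f"--{key}", str(value)]
    return args


config = {
    "efs_filesystem_id": "fs-2adfas123",
    "subnet_Id": "subnet-xxx",
    "ec2_key_name": "efsync2-asd913fjgq3",
    "requirements": "requirements.txt",
}

# Print a shell-safe command line; pass the list to subprocess.run to execute it.
print(shlex.join(build_cli_command(config)))
```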

🔗 Connect with me

Personal Website Twitter Medium LinkedIn

🏥 Contributing

If you want to contribute, be sure to review the contribution guidelines.

📃 License

A copy of the License is provided in the LICENSE file in this repository.

