No project description provided
Project description
Superglue
A CLI tool for developing, testing and deploying AWS glue jobs
VERSION 0.15.0
Superglue makes the development, troubleshooting and deployment of AWS glue jobs simple. It also makes creating a shared glue codebase easy and testable
Getting Started
pip install superglue
Create a Superglue Project
Superglue is intended to be used for a "project" or "collection" of AWS glue jobs. To create a superglue project run
superglue init
Project Structure
jobs -- all of your glue jobs live here
modules -- shared glue code lives here
notebooks -- jupyter notebooks live here
tests -- where your tests for jobs and modules go
tools -- holds the superglue makefile can be used for other scripts and snippets
makefile -- used as a master makefile. Includes tools/makefile
.env -- the environment vars needed to make superglue tick
.gitignore -- a templated .gitignore to get you started
The Superglue Job
A superglue job is a directory which contains your job's config.yml
file, as well as a main.py
entrypoint, jarfile
dependencies,
other python scripts, and deployment info about the job.
superglue new job --name <your job name>
Job Directory Structure
jobs
my_job
jars -- put your job java dependencies here (optional)
py -- put your python dependencies here (optional)
config.yml -- holds the aws glue job execution configuration
overrides.yml -- used to create multiple instances of your job (optional)
deployment.yml -- holds the job deployment configuration
main.py -- the main entry point for the glue job
.version -- file used to track diff between local and s3
Developing Your Job
Your job can now be developed using your desired IDE. At anytime you can check the packaging status
of your job by running superglue status
Locking Your Job
Before a job can be deployed, it must be locked by superglue. Once the job is locked, it will be assigned a version number which will auto-increment each time the job is locked. This number is more or less arbitrary, however it is used to keep previous versions of your job available in AWS S3.
superglue lock
Config.yml
The config.yml
file is where the base parameters for your glue job configuration will live. Any parameter that can be passed to the
boto3 glue client can be included in this config file.
Environment variable expansion is also supported. So you can do things like
ScriptLocation: s3://some-bucket-${SOME_ENV_VAR}-path
Where the value for ${SOME_ENV_VAR}
will be taken from your machine's environment.
Overrides.yml
This file is intended to make instances of your glue job with overridden properties possible and is used in the following way.
Values in config.yml
will be overridden by whatever you put in here, with the config.yml
used as the base. In this sense you can think
of this as an inheritance relationship where overrides.yml
contains the children of config.yml
overrides:
- Name: heavy_workers
NumberOfWorkers: 20
- Name: light_workers
NumberOfWorkers: 2
In the above example, 2 jobs will be created using the base parameters found in config.yml
and the overridden values of
Name:
and NumberOfWorkers:
will be used to create the deployment.yml
file.
Deploying a Superglue Job
Once you are satisfied with your glue job, you can then deploy it on AWS. Just run
superglue deploy
This will upload your code to S3 to the configured locations, as well as create the glue job definition in AWS Glue.
Superglue Modules (Shared Glue Codebase)
AWS glue allows importing python modules that have been uploaded to s3, zipped in the appropriate structure, and have been added to the --extra-py-files
argument.
Manually managing these dependencies is difficult and error-prone. superglue
allows you to create, package,
and include shared code in your glue jobs in the following way.
Create a new module
All code which is shareable across superglue jobs lives in the /modules
directory.
superglue new module --name <your module name>
Module Directory Structure
modules
my_module_code -- the parent directory for your module
my_module_name -- your code lives here. This is the zipfile's root directory
__init__.py -- required for the zip archive
.version -- used by superglue to track changes
my_module_name.zip -- zip archive used by the glue job itself
Locking a Superglue Module
Superglue modules must be locked before they can be packaged and deployed. Once locked, they will be assigned a version number which will be auto incremented. This number is arbitrary, however allows for keeping multiple versions of your modules in S3 to allow jobs which may use a previous version to still function.
superglue lock
Packaging a Superglue Module
Superglue modules need to be packaged into a zip archive before they can be used by AWS glue. To do this run
superglue package
Using a Superglue Module in a Glue Job
To include a module in your superglue job, simply add the module name to the superglue_modules
section in
your job's config.yml
file along with the version number you want to use.
superglue_module:
module_name:
version_number: <integer value version number>
This code will now be importable in your glue job.
from my_module import foo_bar # in this case, my_module is the name of the zipfile archive
The Superglue Makefile
After running superglue init
2 makefiles were created. One in the root directory makefile
and one in
tools/makefile
The makefile
in your projects root directory is intended to be used to add whatever custom automation
to your project that you like, and not clutter up the superglue makefile itself. By default, it will include
the superglue makefile at tools/makefile
From the project's root directory, you can run make help
AWS Glue Docker Image (Official AWS)
For local development and testing of your superglue jobs and modules, you can pull the official AWS glue docker image from docker hub by running
make pull
IDE Autocomplete
To allow your IDE to make auto complete suggestions, we need to include the AWS glue source code in project venv. To do this just run
make glue
This will clone the aws-glue-libs
project from github into the /tools
directory, copy the awsglue
source code into
your venv
and remove the repo once completed. Your IDE autocomplete engine should automatically index this
and make suggestions for you during local development
Superglue Testing
All superglue tests can be found in the /tests
directory. These tests get mapped into the docker container and run
by using the make test
command. All tests are run in the docker container using pytest
All import paths are included on the PYTHONPATH
automatically. All superglue jobs and modules should be
directly importable.
To tests superglue modules, they must first be packaged. Before running make test
you must run superglue package
This will ensure that the actual zip archive is being tested, which is what your glue job will actually be using.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file superglue-0.15.0.tar.gz
.
File metadata
- Download URL: superglue-0.15.0.tar.gz
- Upload date:
- Size: 21.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.28.2 requests-toolbelt/0.10.1 urllib3/1.26.14 tqdm/4.64.1 importlib-metadata/6.0.0 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 325a51a9891193e64b69550b85ae7049f026987c3755267f961b783f0ed5f99f |
|
MD5 | 613108a3a2ba3a37fc8f30f99788ff79 |
|
BLAKE2b-256 | 183f8923c01ff29c278b0c864675f1f47bc3533726bbcaccbbb487b806ef4721 |
File details
Details for the file superglue-0.15.0-py3-none-any.whl
.
File metadata
- Download URL: superglue-0.15.0-py3-none-any.whl
- Upload date:
- Size: 25.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.28.2 requests-toolbelt/0.10.1 urllib3/1.26.14 tqdm/4.64.1 importlib-metadata/6.0.0 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0948e95e0965d445c87276a81efc83f183bd2063fe8fdfa7769054d1d66d42b3 |
|
MD5 | 94283bd982f7f037a83accee01e20264 |
|
BLAKE2b-256 | 5eb16f394a338e8798c4f976fb2ecbfc01a75052b87e0423c13c15f09413f712 |