Glue DynamicFrame syntax closer to DataFrames
EasyGlue
This project aims to make the usage of AWS Glue's DynamicFrame more similar to that of Apache Spark's DataFrame, so that it's easier to remember and read. Let's take a simple S3 read of a JSON dataset for instance:
In Spark, this would be a DataFrame S3 read:
spark.read.json('s3://test_path/')
In Glue, this would be a DynamicFrame S3 read:
glue.create_dynamic_frame.from_options(connection_type='s3', connection_options={'paths': ['s3://test_path/']}, format='json', transformation_ctx='datasource0')
As you can see, the two syntaxes are quite different, and the Glue version is far more verbose. With EasyGlue, the DynamicFrame read becomes much closer to the Spark one:
glue.read().json('s3://test_path/')
Currently-supported options
The project currently supports:
- Reading from S3 in any of the supported formats
- Reading from Data Catalog tables
- Reading from JDBC sources
- Reading from RDDs
- Reading from DynamoDB
Usage
You can either use the pre-built PyPI package or build the project into a wheel file yourself:
Using from PyPI
To use EasyGlue in your projects, add the following job parameter to your Glue job:
key: --additional-python-modules
value: easyglue
Then add an import easyglue statement at the beginning of your job's code. That's it.
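If you create jobs programmatically instead of through the console, the same key/value pair can go into the job's default arguments. The following is a minimal sketch using boto3's create_job call; the job name, role ARN, script location and Glue version are hypothetical placeholders, not values taken from this project.

```python
def easyglue_job_definition(name, role_arn, script_location):
    """Build the keyword arguments for a Glue job that installs EasyGlue.

    All three parameters are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Assumed version; --additional-python-modules needs Glue 2.0 or later.
        "GlueVersion": "3.0",
        # The key/value pair described above.
        "DefaultArguments": {"--additional-python-modules": "easyglue"},
    }


def create_job(definition):
    """Send the definition to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the definition can be built offline

    return boto3.client("glue").create_job(**definition)


definition = easyglue_job_definition(
    name="example-easyglue-job",                          # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
    script_location="s3://example-bucket/scripts/job.py",  # hypothetical
)
```

Calling create_job(definition) would register the job; the sketch only builds the definition so it can be inspected without AWS access.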
Building to a wheel file
If you prefer to build from source and pass the module as a wheel file, do the following:
- Download the source code:
git clone https://github.com/albertquiroga/EasyGlue.git
- Go into the project's directory, and build it into a wheel file:
python setup.py build bdist_wheel
- A new dist directory will have been created; inside it you'll find the built wheel file. Upload this to S3 and add it as a library to your Glue ETL Job
- In your ETL Job code, simply add an import easyglue line at the top
How does this work?
This project uses class extension methods to simply add methods to the GlueContext class:
- In Python this is not supported as neatly as in other programming languages, but it's doable through setattr.
- In Scala, this is directly supported.
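To illustrate the setattr approach in Python, here is a minimal, self-contained sketch. It is not EasyGlue's actual source: FakeGlueContext is a hypothetical stand-in for GlueContext so the example runs without AWS Glue, and the Reader class and its from_options target are assumptions made for illustration.

```python
class FakeGlueContext:
    """Hypothetical stand-in for GlueContext; it only echoes its arguments."""

    def from_options(self, connection_type, connection_options, format):
        # The real Glue call would return a DynamicFrame.
        return {
            "connection_type": connection_type,
            "connection_options": connection_options,
            "format": format,
        }


class Reader:
    """Spark-like reader that translates short calls into verbose ones."""

    def __init__(self, glue_context):
        self._glue = glue_context

    def json(self, path):
        # Expand the short form into the full set of read options.
        return self._glue.from_options(
            connection_type="s3",
            connection_options={"paths": [path]},
            format="json",
        )


def read(self):
    """Function that becomes a method once attached to the class."""
    return Reader(self)


# Attach the function to the existing class; every instance now has .read().
setattr(FakeGlueContext, "read", read)

glue = FakeGlueContext()
result = glue.read().json("s3://test_path/")
```

Because the function is attached to the class rather than to an instance, importing the module that performs the setattr call is enough to make the new method available everywhere, which is consistent with the single import easyglue line described in the usage section.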
Roadmap
- Writes
- Automatic transformation_ctx handling
- Scala support