
Build Python interfaces to the AWS Glue ETL library for use as a local dependency.

Project description

awsglue

The awsglue Python package contains the Python portion of the AWS Glue library. This library extends PySpark to support serverless ETL on AWS.

Note that this package must be used in conjunction with the AWS Glue service and is not executable independently. Many of the classes and methods use the Py4J library to interface with code that is available on the Glue platform. This repository can be used as a reference and aid for writing Glue scripts.

While scripts using this library can only be run on the AWS Glue service, it is possible to import this library locally. This can be helpful, for instance, to provide auto-completion in an IDE. To import the library successfully, you need to install PySpark, which can be done using pip:

  pip install pyspark
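One way to confirm that both packages resolve in the local environment (for example, the interpreter your IDE uses) is a standard-library lookup that locates each top-level module without executing it. This is a minimal sketch: the helper name `check_local_glue_deps` is hypothetical, and it only checks that the modules can be found. It does not mean Glue jobs can run locally.

```python
import importlib.util


def check_local_glue_deps(modules=("pyspark", "awsglue")):
    """Return a mapping of module name -> True if the module can be found.

    Hypothetical helper, not part of awsglue. importlib.util.find_spec
    locates a top-level module without importing it, so this is safe to
    call even when PySpark or awsglue is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_local_glue_deps().items():
        print(f"{name}: {'found' if found else 'missing'}")
```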

Content

This package contains Python interfaces to the key data structures and methods used in AWS Glue. The following are some important modules. More information can be found in the public documentation.

GlueContext

The file context.py contains the GlueContext class. GlueContext extends PySpark's SQLContext class to provide Glue-specific operations. Most Glue programs will start by instantiating a GlueContext and using it to construct a DynamicFrame.
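A typical Glue script opens along the lines of the sketch below, which creates a GlueContext from the current SparkContext and reads a table from the AWS Glue Data Catalog as a DynamicFrame. It assumes it runs as a Glue job; the function name `load_catalog_table` is hypothetical, and the database and table names are placeholders for a table you have already registered. The imports sit inside the function so the file can still be imported locally.

```python
def load_catalog_table(database, table_name):
    """Build a GlueContext and read a Data Catalog table as a DynamicFrame.

    Hypothetical helper name. Imports are deferred so this module can be
    imported locally (for example, by an IDE); calling it needs a Glue runtime.
    """
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # Reuse the job's SparkContext if one already exists.
    sc = SparkContext.getOrCreate()
    glue_context = GlueContext(sc)

    # create_dynamic_frame is a reader; from_catalog looks the table up
    # in the AWS Glue Data Catalog and returns a DynamicFrame.
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )
```

Calling this outside the Glue service is expected to fail, because GlueContext reaches Glue's platform code through Py4J.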

DynamicFrame

The DynamicFrame, defined in dynamicframe.py, is the core data structure used in Glue scripts. DynamicFrames are similar to Spark SQL's DataFrames in that they represent distributed collections of data records, but DynamicFrames provide more flexible handling of data sets with inconsistent schemas. By representing records in a self-describing way, they can be used without specifying a schema up front or requiring a costly schema inference step.
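To make "inconsistent schemas" concrete, the plain-Python sketch below (not the awsglue API) treats each record as self-describing and collects, for every field, the set of types actually observed. A field that appears with more than one type, or only in some records, is the case a single fixed schema cannot express without a decision up front. The helper `observed_field_types` and the sample records are hypothetical.

```python
from collections import defaultdict


def observed_field_types(records):
    """Map each field name to the sorted type names seen across records.

    Hypothetical illustration, not part of awsglue. Each record is a dict
    that describes itself: no schema is declared beforehand, and records
    may omit fields or use different types for the same field.
    """
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


records = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": "9.99"},          # same field, different type
    {"id": 3, "tags": ["sale", "new"]},  # field absent from other records
]

print(observed_field_types(records))
# {'id': ['int'], 'price': ['float', 'str'], 'tags': ['list']}
```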

DynamicFrames support many operations, but it is also possible to convert them to DataFrames using the toDF method to make use of existing Spark SQL operations.
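A round trip through a DataFrame might look like the following sketch: it converts with `toDF`, applies the Spark SQL operation `dropDuplicates`, and converts back with `DynamicFrame.fromDF` so that Glue transforms and writers can be used afterwards. It assumes `dynamic_frame` and `glue_context` come from a running Glue job; the function name and the default `name` value are hypothetical.

```python
def dedupe_with_spark_sql(dynamic_frame, glue_context, name="deduped"):
    """Drop duplicate rows using a Spark SQL DataFrame operation.

    Hypothetical helper name. Requires a Glue runtime; the import is
    deferred so the module can still be imported locally.
    """
    from awsglue.dynamicframe import DynamicFrame

    # DynamicFrame -> DataFrame, to use an existing Spark SQL operation.
    df = dynamic_frame.toDF()
    deduped = df.dropDuplicates()

    # DataFrame -> DynamicFrame, to continue with Glue transforms and writers.
    return DynamicFrame.fromDF(deduped, glue_context, name)
```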

Transforms

The transforms directory contains a variety of operations that can be performed on DynamicFrames. These include simple operations, such as DropFields, as well as more complex transformations like Relationalize, which flattens a nested data set into a collection of tables that can be loaded into a relational database. Once imported, transforms can be invoked using the following syntax:

    TransformClass.apply(args...)
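As a concrete instance of that syntax, the sketch below calls DropFields directly on the class, with no transform instance created first. It assumes a DynamicFrame from a running Glue job; the function name `drop_fields` is hypothetical, and the field paths in the usage note are placeholders.

```python
def drop_fields(dynamic_frame, paths):
    """Remove the given fields from a DynamicFrame with the DropFields transform.

    Hypothetical helper name. Requires a Glue runtime; the import is
    deferred so the module can still be imported locally.
    """
    from awsglue.transforms import DropFields

    # TransformClass.apply(...) is invoked on the class itself.
    return DropFields.apply(frame=dynamic_frame, paths=list(paths))
```

For example, `drop_fields(frame, ["internal_id", "audit.updated_by"])` would drop a top-level field and a nested one, assuming nested fields are addressed with dot-separated paths.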

Additional Resources

  • The aws-glue-samples repository contains sample scripts that make use of the awsglue library and can be submitted directly to the AWS Glue service.

  • The public Glue Documentation contains information about the AWS Glue service as well as additional information about the Python library.



Download files

Both distributions below are pure Python (the wheel is tagged py2.py3-none-any), so either can be used on any platform.

Source Distribution

awsglue-local-0.9.1.tar.gz (29.8 kB)

Built Distribution


awsglue_local-0.9.1-py2.py3-none-any.whl (48.1 kB)

File details

Details for the file awsglue-local-0.9.1.tar.gz.

File metadata

  • Download URL: awsglue-local-0.9.1.tar.gz
  • Size: 29.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.9 CPython/2.7.18 Darwin/19.5.0

File hashes

Hashes for awsglue-local-0.9.1.tar.gz

  Algorithm     Hash digest
  SHA256        09d6e1db8adecec1591fcf03cea10a573808fd03b322eeabe92d934e4ed9f0be
  MD5           e183a3e8715449cafe1da2a968647d38
  BLAKE2b-256   64789d5bcd3203ebaa10f7b734955da51583225e009a91f3183bc82ba6736430
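A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; the helper name `sha256_matches` and the constant name `SDIST_SHA256` are hypothetical, and the digest value is the one listed for the source distribution.

```python
import hashlib

# Published SHA256 for awsglue-local-0.9.1.tar.gz (hypothetical constant name).
SDIST_SHA256 = "09d6e1db8adecec1591fcf03cea10a573808fd03b322eeabe92d934e4ed9f0be"


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Return True if the SHA256 digest of the file at `path` equals `expected_hex`.

    Hypothetical helper. Reads the file in chunks so large archives need
    not fit in memory; the comparison ignores case and surrounding whitespace.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches("awsglue-local-0.9.1.tar.gz", SDIST_SHA256)` should return True for an unmodified download in the current directory.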


File details

Details for the file awsglue_local-0.9.1-py2.py3-none-any.whl.

File metadata

  • Download URL: awsglue_local-0.9.1-py2.py3-none-any.whl
  • Size: 48.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.9 CPython/2.7.18 Darwin/19.5.0

File hashes

Hashes for awsglue_local-0.9.1-py2.py3-none-any.whl

  Algorithm     Hash digest
  SHA256        0b48b8cf2fd94884d1b987e61df884e9d0cc9d81dc605b6fe32f0ec4ea87af25
  MD5           fd2a81bea2cc20a0c6a607d37828fe9e
  BLAKE2b-256   018ff9e5c801e2bc257cc8f856f90e1a1a923cbaa30b884e58452856f0cab957

