DataFrames on AWS.

Project description

AWS Data Wrangler

Use Cases

PySpark

| FROM | TO | Features |
|------|----|----------|
| PySpark DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes; Append/Overwrite/Upsert modes |
| PySpark DataFrame | Glue Catalog | Register Parquet or CSV DataFrame on Glue Catalog |
| Nested PySpark DataFrame | Flat PySpark DataFrames | Flatten structs and break up arrays in child tables |
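
To make "flatten structs and break up arrays in child tables" concrete, here is a minimal sketch in plain Python rather than PySpark. It is not the library's implementation: the function names, the `row_id`/`parent_row_id`/`pos` columns, and the `<root>_<field>` child-table naming are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def _flatten_struct(value: Dict[str, Any], prefix: str, row: Row,
                    arrays: Dict[str, List[Any]]) -> None:
    """Copy scalar fields into `row`; set arrays aside for child tables."""
    for key, item in value.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(item, dict):
            # Struct: recurse, joining field names with an underscore.
            _flatten_struct(item, name, row, arrays)
        elif isinstance(item, list):
            # Array: handled later as a separate child table.
            arrays[name] = item
        else:
            row[name] = item


def flatten(records: List[Dict[str, Any]], root: str = "root") -> Dict[str, List[Row]]:
    """Return {table_name: rows}: one flat parent table plus one table per array."""
    tables: Dict[str, List[Row]] = {root: []}
    for parent_id, record in enumerate(records):
        row: Row = {"row_id": parent_id}  # hypothetical surrogate key
        arrays: Dict[str, List[Any]] = {}
        _flatten_struct(record, "", row, arrays)
        tables[root].append(row)
        for name, items in arrays.items():
            child = tables.setdefault(f"{root}_{name}", [])
            for pos, item in enumerate(items):
                child_row: Row = {"parent_row_id": parent_id, "pos": pos}
                if isinstance(item, dict):
                    # Arrays nested inside array elements are dropped in this sketch.
                    _flatten_struct(item, "", child_row, {})
                else:
                    child_row["value"] = item
                child.append(child_row)
    return tables


records = [
    {"name": "ana", "address": {"city": "Lisbon", "zip": "1000"}, "tags": ["a", "b"]},
    {"name": "bo", "address": {"city": "Oslo", "zip": "0150"}, "tags": []},
]
tables = flatten(records, root="users")
```

Here `tables["users"]` holds one flat row per input record, and `tables["users_tags"]` holds one row per array element, linked back to its parent through `parent_row_id`.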

Pandas

| FROM | TO | Features |
|------|----|----------|
| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes, KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Fixed-width formatted, Partitions, Parallelism, KMS Encryption, Multiple files |
| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines: ctas_approach=False -> batching, for memory-restricted environments; ctas_approach=True -> blazing fast, parallelism and enhanced data types |
| Pandas DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes; Append/Overwrite/Upsert modes |
| Amazon Redshift | Pandas DataFrame | Blazing fast using parallel Parquet on S3 behind the scenes |
| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL; blazing fast using parallel CSV on S3 behind the scenes; Append/Overwrite modes |
| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL; blazing fast using parallel CSV on S3 behind the scenes |
| CloudWatch Logs Insights | Pandas DataFrame | Query results |
| Glue Catalog | Pandas DataFrame | List and get Tables details. Good fit with Jupyter Notebooks. |
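
The `ctas_approach=True` engine is described only by its benefits. One plausible reading is that the query is wrapped in an Athena CTAS (CREATE TABLE AS SELECT) statement, so that Athena writes the result as Parquet files on S3 that can then be read in parallel. The sketch below only builds such a statement and does not call AWS; `build_ctas_query`, its parameters, and the temporary table naming are hypothetical and not part of awswrangler.

```python
import uuid
from typing import Optional


def build_ctas_query(sql: str, database: str, s3_output: str,
                     table: Optional[str] = None) -> str:
    """Wrap a SELECT in a CTAS statement that writes Parquet under s3_output."""
    # A random suffix keeps concurrent temporary tables from colliding.
    name = table or f"temp_table_{uuid.uuid4().hex}"
    # CTAS takes a bare query, so drop whitespace and any trailing semicolon.
    select = sql.strip().rstrip(";")
    location = s3_output.rstrip("/") + f"/{name}/"
    return (
        f"CREATE TABLE {database}.{name}\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')\n"
        f"AS {select}"
    )


# Hypothetical database, bucket, and table names.
query = build_ctas_query(
    "SELECT * FROM sales;", "analytics", "s3://my-bucket/tmp", table="t1"
)
print(query)
```

Under this reading, the trade-off in the table follows naturally: without CTAS the result is a single CSV output that can be read in batches with a small memory footprint, while the CTAS path produces typed, splittable Parquet files.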

General

| Feature | Details |
|---------|---------|
| List S3 objects | e.g. wr.s3.list_objects("s3://...") |
| Delete S3 objects | Parallel |
| Delete listed S3 objects | Parallel |
| Delete NOT listed S3 objects | Parallel |
| Copy listed S3 objects | Parallel |
| Get the size of S3 objects | Parallel |
| Get CloudWatch Logs Insights query results | |
| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
| Create EMR cluster | "For humans" |
| Terminate EMR cluster | "For humans" |
| Get EMR cluster state | "For humans" |
| Submit EMR step(s) | "For humans" |
| Get EMR step state | "For humans" |
| Query Athena to receive Python primitives | Returns Iterable[Dict[str, Any]] |
| Load and Unzip SageMaker jobs outputs | |
| Dump Amazon Redshift as Parquet files on S3 | |
| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |


Download files

Source Distribution

awswrangler-0.3.1.tar.gz (62.2 kB, uploaded as Source)

Built Distributions

awswrangler-0.3.1-py3.7.egg (154.2 kB, uploaded as Egg)

awswrangler-0.3.1-glue-none-any.whl (69.4 kB, uploaded as glue)

File details

Details for the file awswrangler-0.3.1.tar.gz.

File metadata

  • Download URL: awswrangler-0.3.1.tar.gz
  • Size: 62.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.5

File hashes

Hashes for awswrangler-0.3.1.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | bd37fb3a1e705bd2907ef020dbe7ed03e99c89145c649813148692240375f523 |
| MD5 | 40635e3f3f4e2444f4bb361ccb954d5c |
| BLAKE2b-256 | 2a4d1190903ca6ad6801f383b32715b02eac5da77da6648ce92f850caf6f5df6 |

File details

Details for the file awswrangler-0.3.1-py3.7.egg.

File metadata

  • Download URL: awswrangler-0.3.1-py3.7.egg
  • Size: 154.2 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.5

File hashes

Hashes for awswrangler-0.3.1-py3.7.egg

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 8df7393782daf117afd93ef7eaf4df6b5f999592ecbffa3b5e07c90d16a49417 |
| MD5 | 364931d9abed1162ef72f004b617e93f |
| BLAKE2b-256 | e80ba1fddee3844f90b12a1a70bd2d244957bc8345c2e1e9ebeb3143c2f1bba4 |

File details

Details for the file awswrangler-0.3.1-glue-none-any.whl.

File metadata

  • Download URL: awswrangler-0.3.1-glue-none-any.whl
  • Size: 69.4 kB
  • Tags: glue
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.5

File hashes

Hashes for awswrangler-0.3.1-glue-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 663b03e3eeff5abb25212f24c5a10cbc3e2b2d7354407d44781b51e3d8c9f2c1 |
| MD5 | 0beecc88b73fd39cf2f6739e05530778 |
| BLAKE2b-256 | 6281550072d74ed2c98287e0fbb8df5901a0d5c71e6f8e99d6b1f97089482da6 |
