AWS Data Wrangler

DataFrames on AWS

Use Cases

PySpark

FROM | TO | Features
PySpark DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes; Append/Overwrite/Upsert modes
PySpark DataFrame | Glue Catalog | Register a Parquet or CSV DataFrame in the Glue Catalog
Nested PySpark DataFrame | Flat PySpark DataFrames | Flatten structs and break up arrays into child tables
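The last PySpark row turns nested data into flat tables: struct fields become columns, and each array becomes a child table linked back to its parent row. The code below is a minimal pure-Python sketch of that idea on lists of dicts. It is not the library's Spark implementation. The function names `flatten_struct` and `explode`, the `_` column-naming scheme and the `position` column are hypothetical choices made for illustration. Arrays nested inside array elements are left as-is.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten_struct(obj: Row, prefix: str = "") -> Row:
    """Flatten nested dicts (structs) into one level, joining names with '_'."""
    flat: Row = {}
    for field, value in obj.items():
        col = f"{prefix}{field}"
        if isinstance(value, dict):
            flat.update(flatten_struct(value, f"{col}_"))  # recurse into the struct
        else:
            flat[col] = value
    return flat


def explode(records: List[Row], name: str, key: str) -> Dict[str, List[Row]]:
    """Return {table_name: rows}: a flat parent table plus one child table per array.

    `key` must be a top-level scalar field; child rows carry it to reference the parent.
    """
    tables: Dict[str, List[Row]] = {name: []}
    for record in records:
        flat = flatten_struct(record)
        parent: Row = {}
        for col, value in flat.items():
            if isinstance(value, list):
                child = tables.setdefault(f"{name}_{col}", [])
                for pos, item in enumerate(value):
                    row: Row = {key: flat[key], "position": pos}
                    if isinstance(item, dict):
                        row.update(flatten_struct(item, f"{col}_"))  # array of structs
                    else:
                        row[col] = item  # array of scalars
                    child.append(row)
            else:
                parent[col] = value
        tables[name].append(parent)
    return tables


orders = [{
    "id": 1,
    "customer": {"name": "Ana", "city": "Lisbon"},
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}]
tables = explode(orders, name="orders", key="id")
print(tables["orders"])
print(tables["orders_items"])
```

Here the struct `customer` becomes the columns `customer_name` and `customer_city` in `orders`. The array `items` becomes a separate `orders_items` table whose rows carry `id` and their array `position`.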

Pandas

FROM | TO | Features
Pandas DataFrame | Amazon S3 | Parquet, CSV, partitions, parallelism, Overwrite/Append/Partitions-Upsert modes, KMS encryption, Glue metadata (Athena, Spectrum, Spark, Hive, Presto)
Amazon S3 | Pandas DataFrame | Parquet (pushdown filters), CSV, fixed-width formatted files, partitions, parallelism, KMS encryption, multiple files
Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, encryption, and two different engines: ctas_approach=False (batching, suited to memory-restricted environments) and ctas_approach=True (blazing fast, parallelism and enhanced data types)
Pandas DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes; Append/Overwrite/Upsert modes
Amazon Redshift | Pandas DataFrame | Blazing fast using parallel Parquet on S3 behind the scenes
Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL; blazing fast using parallel CSV on S3 behind the scenes; Append/Overwrite modes
Amazon Aurora | Pandas DataFrame | Supported engines: MySQL; blazing fast using parallel CSV on S3 behind the scenes
CloudWatch Logs Insights | Pandas DataFrame | Query results
Glue Catalog | Pandas DataFrame | List tables and get table details; a good fit for Jupyter notebooks
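The table lists an Upsert mode for Redshift but does not explain how it works. A common way to upsert into a warehouse without a native MERGE uses a staging table:

1. Load the new rows into a temporary table.
2. Delete the target rows whose keys match.
3. Insert everything from staging.

The code below sketches that pattern with Python's built-in `sqlite3` so it runs anywhere. It is an assumption that this mirrors what the library does against Redshift. On Redshift, the load step would typically be a COPY from S3, and the delete would usually be written as `DELETE FROM target USING staging WHERE ...`. The names `upsert` and `staging` are hypothetical.

```python
import sqlite3
from typing import Sequence, Tuple


def upsert(conn: sqlite3.Connection, table: str, key: str, rows: Sequence[Tuple]) -> None:
    """Merge `rows` into `table`: rows whose `key` already exists replace the old ones,
    and the rest are appended. Assumes keys in `rows` are unique. `table` and `key` are
    interpolated into SQL, so they must be trusted identifiers, never user input."""
    with conn:  # commit on success, roll back the pending DML on error
        conn.execute("DROP TABLE IF EXISTS staging")
        # 1. Empty staging table with the same columns (and column order) as the target.
        conn.execute(f"CREATE TEMP TABLE staging AS SELECT * FROM {table} WHERE 0")
        ncols = len(conn.execute(f"PRAGMA table_info({table})").fetchall())
        placeholders = ", ".join("?" * ncols)
        # 2. Load the incoming rows into staging.
        conn.executemany(f"INSERT INTO staging VALUES ({placeholders})", rows)
        # 3. Delete target rows that are about to be replaced.
        conn.execute(f"DELETE FROM {table} WHERE {key} IN (SELECT {key} FROM staging)")
        # 4. Insert every staged row, then clean up.
        conn.execute(f"INSERT INTO {table} SELECT * FROM staging")
        conn.execute("DROP TABLE staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ana"), (2, "Bruno")])
conn.commit()
upsert(conn, "users", "id", [(2, "Beatriz"), (3, "Carla")])
print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
# [(1, 'Ana'), (2, 'Beatriz'), (3, 'Carla')]
```

Row 2 is replaced and row 3 is appended, while row 1 is untouched. Append mode would skip step 3, and Overwrite would empty the target instead.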

General

Feature | Details
List S3 objects | e.g. wr.s3.list_objects("s3://...")
Delete S3 objects | Parallel
Delete listed S3 objects | Parallel
Delete NOT listed S3 objects | Parallel
Copy listed S3 objects | Parallel
Get the size of S3 objects | Parallel
Get CloudWatch Logs Insights query results |
Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE"
Create EMR cluster | "For humans"
Terminate EMR cluster | "For humans"
Get EMR cluster state | "For humans"
Submit EMR step(s) | "For humans"
Get EMR step state | "For humans"
Query Athena to receive Python primitives | Returns Iterable[Dict[str, Any]]
Load and unzip SageMaker job outputs |
Dump Amazon Redshift as Parquet files on S3 |
Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine
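Several rows above describe parallel S3 deletions. Deleting the objects that are NOT listed comes down to two steps:

1. Take the set difference between the existing keys and the keys to keep.
2. Send the result as concurrent, batched delete requests.

The S3 DeleteObjects API accepts at most 1,000 keys per request. The code below is a hedged sketch of that logic, not the library's code. The names `delete_not_listed`, `chunks` and `delete_batch` are hypothetical. The delete function is injected so the sketch runs without AWS credentials.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence

MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects limit per request


def chunks(keys: Sequence[str], size: int) -> List[List[str]]:
    """Split `keys` into consecutive batches of at most `size` keys."""
    return [list(keys[i:i + size]) for i in range(0, len(keys), size)]


def delete_not_listed(
    existing: Iterable[str],
    keep: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    max_workers: int = 8,
) -> List[str]:
    """Delete every key in `existing` that is not in `keep`, in parallel batches.

    Returns the sorted list of keys that were passed to `delete_batch`.
    """
    keep_set = set(keep)
    to_delete = sorted(k for k in existing if k not in keep_set)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results so an exception raised in a worker is re-raised here.
        list(pool.map(delete_batch, chunks(to_delete, MAX_KEYS_PER_REQUEST)))
    return to_delete


# Fake "S3": record each batch instead of calling AWS.
deleted_batches: List[List[str]] = []
existing = [f"data/part-{i:05d}.parquet" for i in range(2500)]
keep = existing[:10]
removed = delete_not_listed(existing, keep, deleted_batches.append)
print(len(removed), sorted(len(b) for b in deleted_batches))
# 2490 [490, 1000, 1000]
```

Against real S3, `delete_batch` could wrap boto3's `client.delete_objects(Bucket=..., Delete={"Objects": [{"Key": k} for k in batch]})`. Batching keeps the request count low (three requests for 2,490 keys here), and the thread pool overlaps their network latency.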

Download files

Source Distribution

awswrangler-0.3.2.tar.gz (61.7 kB)

Built Distributions

awswrangler-0.3.2-py3.6.egg (155.5 kB)

awswrangler-0.3.2-glue-none-any.whl (70.1 kB)

File details

Details for the file awswrangler-0.3.2.tar.gz.

File metadata

  • Download URL: awswrangler-0.3.2.tar.gz
  • Upload date:
  • Size: 61.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for awswrangler-0.3.2.tar.gz
Algorithm Hash digest
SHA256 8a35506dd39225343ad610f3d665297f1e1778124a3e469c9422ca006ea31663
MD5 de9f5d5dddd901fc5070f1cf7a7c6ca0
BLAKE2b-256 e999b3ba9811e1a5f346da484f2dff40924613ec481df5d463e30bc3fd71096e

File details

Details for the file awswrangler-0.3.2-py3.6.egg.

File metadata

  • Download URL: awswrangler-0.3.2-py3.6.egg
  • Upload date:
  • Size: 155.5 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for awswrangler-0.3.2-py3.6.egg
Algorithm Hash digest
SHA256 486b8963911e6b53ae3e5000665acb07a61ee526a98316bedcc2d33e1ff2fecc
MD5 775581ed84b18880a49bad471f24fd5e
BLAKE2b-256 22621fff7eb2420daa86c8485c3bbefc7e8933d76d9712f2ebfd28573022e7ec

File details

Details for the file awswrangler-0.3.2-glue-none-any.whl.

File metadata

  • Download URL: awswrangler-0.3.2-glue-none-any.whl
  • Upload date:
  • Size: 70.1 kB
  • Tags: glue
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for awswrangler-0.3.2-glue-none-any.whl
Algorithm Hash digest
SHA256 928ad7cf8848aa423d6b4c5408fcbfbbe0242a88ecfd1d4383324039172c6348
MD5 4a40fe7b3e8b0732894f04be1754e39d
BLAKE2b-256 499128e1a01a37dfdc865d50008771454bc7bdad168531b5418095f6c67e74b0
