DataFrames on AWS.
Use Cases
PySpark
| FROM | TO | Features |
|---|---|---|
| PySpark DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes. Append/Overwrite/Upsert modes |
| PySpark DataFrame | Glue Catalog | Register Parquet or CSV DataFrame on Glue Catalog |
| Nested PySpark DataFrame | Flat PySpark DataFrames | Flatten structs and break up arrays in child tables |
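The flattening row above can be read as: struct fields become columns of the parent table, and each array moves to a child table whose rows point back to the parent. Below is a minimal pure-Python sketch of that idea on plain dictionaries rather than Spark DataFrames. The function name `flatten_record`, the `_` separator, and the `id`, `parent_id`, `pos`, and `value` columns are illustrative assumptions, not the library's API or naming.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten_record(record: Row, record_id: int, table: str = "root") -> Tuple[Row, Dict[str, List[Row]]]:
    """Flatten one nested record into a parent row plus child-table rows."""
    parent: Row = {"id": record_id}
    children: Dict[str, List[Row]] = {}

    def walk(value: Dict[str, Any], prefix: str) -> None:
        for key, item in value.items():
            name = f"{prefix}{key}"
            if isinstance(item, dict):
                # Struct: promote its fields to columns of the parent row.
                walk(item, prefix=f"{name}_")
            elif isinstance(item, list):
                # Array: one child row per element, linked back to the parent.
                rows = children.setdefault(f"{table}_{name}", [])
                for pos, element in enumerate(item):
                    rows.append({"parent_id": record_id, "pos": pos, "value": element})
            else:
                # Scalar: keep it as a column of the parent row.
                parent[name] = item

    walk(record, prefix="")
    return parent, children


# Hypothetical nested record: one struct (with a nested struct) and one array.
order = {"customer": {"name": "Ana", "address": {"city": "Lisbon"}}, "items": ["pen", "ink"]}
parent_row, child_tables = flatten_record(order, record_id=1, table="orders")
print(parent_row)
print(child_tables)
```

The sketch does not recurse into arrays of structs; each array element is stored as-is in the `value` column of the child table.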
Pandas
| FROM | TO | Features |
|---|---|---|
| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes, KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Fixed-width formatted, Partitions, Parallelism, KMS Encryption, Multiple files |
| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines: ctas_approach=False (batching, suited to memory-restricted environments) and ctas_approach=True (blazing fast, with parallelism and enhanced data types) |
| Pandas DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes. Append/Overwrite/Upsert modes |
| Amazon Redshift | Pandas DataFrame | Blazing fast using parallel parquet on S3 behind the scenes |
| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL. Blazing fast using parallel CSV on S3 behind the scenes. Append/Overwrite modes |
| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL. Blazing fast using parallel CSV on S3 behind the scenes |
| CloudWatch Logs Insights | Pandas DataFrame | Query results |
| Glue Catalog | Pandas DataFrame | List and get Tables details. Good fit with Jupyter Notebooks. |
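The three write modes listed for Pandas DataFrame to Amazon S3 differ in what happens to data that is already stored. The sketch below models a partitioned dataset as a dictionary and assumes that Partitions-Upsert means "replace only the partitions present in the incoming data, leave the others untouched". The function `write` and the mode strings are labels invented for this illustration, not the library's arguments.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
Dataset = Dict[str, List[Row]]  # partition value -> rows stored under it


def write(dataset: Dataset, rows: List[Row], partition_col: str, mode: str) -> Dataset:
    """Return the dataset as it would look after writing `rows` with `mode`."""
    # Group the incoming rows by their partition value.
    incoming: Dataset = defaultdict(list)
    for row in rows:
        incoming[str(row[partition_col])].append(row)

    if mode == "overwrite":
        # Drop everything that was stored and keep only the new data.
        return dict(incoming)

    result = {part: list(existing) for part, existing in dataset.items()}
    if mode == "append":
        # Keep existing rows and add the new ones to their partitions.
        for part, new_rows in incoming.items():
            result.setdefault(part, []).extend(new_rows)
        return result
    if mode == "partitions_upsert":
        # Replace only the partitions that appear in the new data.
        result.update(incoming)
        return result
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical dataset partitioned by day, plus two incoming rows.
stored = {
    "2020-01-01": [{"day": "2020-01-01", "v": 1}],
    "2020-01-02": [{"day": "2020-01-02", "v": 2}],
}
new_rows = [{"day": "2020-01-02", "v": 20}, {"day": "2020-01-03", "v": 30}]

summary = {}
for mode in ("overwrite", "append", "partitions_upsert"):
    result = write(stored, new_rows, partition_col="day", mode=mode)
    summary[mode] = {part: [r["v"] for r in rs] for part, rs in sorted(result.items())}
    print(mode, summary[mode])
```

Under these assumptions, only the partitions-upsert run both keeps the untouched `2020-01-01` partition and discards the old contents of `2020-01-02`.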
General
| Feature | Details |
|---|---|
| List S3 objects | e.g. wr.s3.list_objects("s3://...") |
| Delete S3 objects | Parallel |
| Delete listed S3 objects | Parallel |
| Delete NOT listed S3 objects | Parallel |
| Copy listed S3 objects | Parallel |
| Get the size of S3 objects | Parallel |
| Get CloudWatch Logs Insights query results | |
| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
| Create EMR cluster | "For humans" |
| Terminate EMR cluster | "For humans" |
| Get EMR cluster state | "For humans" |
| Submit EMR step(s) | "For humans" |
| Get EMR step state | "For humans" |
| Query Athena to receive python primitives | Returns Iterable[Dict[str, Any]] |
| Load and Unzip SageMaker jobs outputs | |
| Dump Amazon Redshift as Parquet files on S3 | |
| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
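"Delete NOT listed S3 objects" in the table above amounts to a set difference followed by parallel deletes. The following sketch shows that logic with the standard library only. The helper `delete_not_listed`, its parameters, and the bucket paths are hypothetical, and the per-object delete is passed in as a callable so the example runs without AWS access; it is not how the library implements the feature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_not_listed(
    existing: Iterable[str],
    keep: Iterable[str],
    delete_one: Callable[[str], None],
    max_workers: int = 4,
) -> List[str]:
    """Delete every object in `existing` that is absent from `keep`."""
    keep_set = set(keep)
    to_delete = sorted(path for path in set(existing) if path not in keep_set)
    # Run the per-object deletions concurrently. list() waits for all of
    # them and re-raises the first exception, if any.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(delete_one, to_delete))
    return to_delete


# Stand-in for a bucket: a set of object paths (hypothetical names).
bucket = {"s3://bucket/a.csv", "s3://bucket/b.csv", "s3://bucket/c.csv"}
deleted = delete_not_listed(
    existing=list(bucket),
    keep=["s3://bucket/b.csv"],
    delete_one=bucket.discard,
)
print(deleted)
print(bucket)
```

Against a real bucket, `existing` could come from a listing call such as the wr.s3.list_objects("s3://...") example in the table, and `delete_one` would wrap an actual delete request.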