DataFrames on AWS.
Resources
Use Cases
PySpark
| FROM | TO | Features |
|---|---|---|
| PySpark DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes. Append/Overwrite/Upsert modes |
| PySpark DataFrame | Glue Catalog | Register Parquet or CSV DataFrame on Glue Catalog |
| Nested PySpark DataFrame | Flat PySpark DataFrames | Flatten structs and break up arrays into child tables |
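The last row of the table above (flattening structs and moving arrays to child tables) can be illustrated with a minimal plain-Python sketch. This is not the library's PySpark implementation; the function names, the `root` table name, and the `pos`/`value` columns are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten_row(row: Row, prefix: str = "") -> Tuple[Row, Dict[str, List[Any]]]:
    """Turn nested dicts into prefixed flat columns and set lists aside."""
    flat: Row = {}
    arrays: Dict[str, List[Any]] = {}
    for key, value in row.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A struct becomes several columns named parent_child.
            sub_flat, sub_arrays = flatten_row(value, prefix=f"{name}_")
            flat.update(sub_flat)
            arrays.update(sub_arrays)
        elif isinstance(value, list):
            # An array is kept aside so it can go to a child table.
            arrays[name] = value
        else:
            flat[name] = value
    return flat, arrays


def flatten_table(rows: List[Row], key: str) -> Dict[str, List[Row]]:
    """Return a flat 'root' table plus one child table per array column.

    `key` is the (hypothetical) column that links child rows to their parent.
    """
    tables: Dict[str, List[Row]] = {"root": []}
    for row in rows:
        flat, arrays = flatten_row(row)
        tables["root"].append(flat)
        for name, items in arrays.items():
            child = tables.setdefault(f"root_{name}", [])
            for pos, item in enumerate(items):
                child.append({key: flat[key], "pos": pos, "value": item})
    return tables
```

Each array element becomes one row in its child table, carrying the parent key and its position, so the original nesting can be rebuilt with a join.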
Pandas
| FROM | TO | Features |
|---|---|---|
| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes, KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Fixed-width formatted, Partitions, Parallelism, KMS Encryption, Multiple files |
| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines: ctas_approach=False (batching, for memory-restricted environments) and ctas_approach=True (blazing fast, parallelism and enhanced data types) |
| Pandas DataFrame | Amazon Redshift | Blazing fast using parallel Parquet on S3 behind the scenes. Append/Overwrite/Upsert modes |
| Amazon Redshift | Pandas DataFrame | Blazing fast using parallel Parquet on S3 behind the scenes |
| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL. Blazing fast using parallel CSV on S3 behind the scenes. Append/Overwrite modes |
| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL. Blazing fast using parallel CSV on S3 behind the scenes |
| CloudWatch Logs Insights | Pandas DataFrame | Query results |
| Glue Catalog | Pandas DataFrame | List and get Tables details. Good fit with Jupyter Notebooks. |
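The Overwrite/Append/Partitions-Upsert modes listed for writes to Amazon S3 can be compared with a small plain-Python model. This sketch assumes that "Partitions-Upsert" replaces every partition present in the incoming data and leaves the other partitions untouched; the mode strings and function names are hypothetical and are not the library's API.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
Dataset = Dict[Any, List[Row]]  # partition value -> rows stored in it


def group_by_partition(rows: List[Row], partition_col: str) -> Dataset:
    """Split rows into partitions keyed by the partition column value."""
    grouped: Dataset = defaultdict(list)
    for row in rows:
        grouped[row[partition_col]].append(row)
    return dict(grouped)


def write(dataset: Dataset, rows: List[Row], partition_col: str, mode: str) -> Dataset:
    """Return the dataset as it would look after the write (input is not mutated)."""
    incoming = group_by_partition(rows, partition_col)
    if mode == "overwrite":
        # Everything that existed before is dropped.
        return incoming
    result: Dataset = {part: list(old) for part, old in dataset.items()}
    if mode == "append":
        for part, new_rows in incoming.items():
            result.setdefault(part, []).extend(new_rows)
    elif mode == "partitions_upsert":
        # Assumed semantics: touched partitions are replaced whole.
        for part, new_rows in incoming.items():
            result[part] = new_rows
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result
```

Under this assumption, re-running a load for a single day or year rewrites only that partition, which keeps repeated loads idempotent without rewriting the whole dataset.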
General
| Feature | Details |
|---|---|
| List S3 objects | e.g. wr.s3.list_objects("s3://...") |
| Delete S3 objects | Parallel |
| Delete listed S3 objects | Parallel |
| Delete NOT listed S3 objects | Parallel |
| Copy listed S3 objects | Parallel |
| Get the size of S3 objects | Parallel |
| Get CloudWatch Logs Insights query results | |
| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
| Create EMR cluster | "For humans" |
| Terminate EMR cluster | "For humans" |
| Get EMR cluster state | "For humans" |
| Submit EMR step(s) | "For humans" |
| Get EMR step state | "For humans" |
| Query Athena to receive Python primitives | Returns Iterable[Dict[str, Any]] |
| Load and Unzip SageMaker jobs outputs | |
| Dump Amazon Redshift as Parquet files on S3 | |
| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
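The "Delete NOT listed S3 objects" entry in the table above amounts to a set difference followed by batched deletes. The sketch below shows only that selection and batching logic in plain Python; it makes no AWS calls, the function names and example paths are hypothetical, and the batch size of 1000 reflects the per-request key limit of the S3 DeleteObjects operation.

```python
from typing import Iterable, Iterator, List


def not_listed(existing: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Return the existing object paths that are absent from the keep list."""
    keep_set = set(keep)
    return [path for path in existing if path not in keep_set]


def batches(paths: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield chunks small enough for one bulk delete request each."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


# Hypothetical paths: everything except part-1 would be scheduled for deletion.
existing = [f"s3://bucket/prefix/part-{i}.parquet" for i in range(3)]
doomed = not_listed(existing, keep=["s3://bucket/prefix/part-1.parquet"])
work = list(batches(doomed))  # each chunk could be handed to a parallel worker
```

Because the chunks are independent, they can be dispatched to a thread or process pool, which is one way the "Parallel" note in the table could be realized.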
Download files
Source Distribution
awswrangler-0.3.2.tar.gz (61.7 kB)
Built Distributions
awswrangler-0.3.2-py3.6.egg (155.5 kB)
File details
Details for the file awswrangler-0.3.2.tar.gz.
File metadata
- Download URL: awswrangler-0.3.2.tar.gz
- Upload date:
- Size: 61.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a35506dd39225343ad610f3d665297f1e1778124a3e469c9422ca006ea31663 |
| MD5 | de9f5d5dddd901fc5070f1cf7a7c6ca0 |
| BLAKE2b-256 | e999b3ba9811e1a5f346da484f2dff40924613ec481df5d463e30bc3fd71096e |
File details
Details for the file awswrangler-0.3.2-py3.6.egg.
File metadata
- Download URL: awswrangler-0.3.2-py3.6.egg
- Upload date:
- Size: 155.5 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 486b8963911e6b53ae3e5000665acb07a61ee526a98316bedcc2d33e1ff2fecc |
| MD5 | 775581ed84b18880a49bad471f24fd5e |
| BLAKE2b-256 | 22621fff7eb2420daa86c8485c3bbefc7e8933d76d9712f2ebfd28573022e7ec |
File details
Details for the file awswrangler-0.3.2-glue-none-any.whl.
File metadata
- Download URL: awswrangler-0.3.2-glue-none-any.whl
- Upload date:
- Size: 70.1 kB
- Tags: glue
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 928ad7cf8848aa423d6b4c5408fcbfbbe0242a88ecfd1d4383324039172c6348 |
| MD5 | 4a40fe7b3e8b0732894f04be1754e39d |
| BLAKE2b-256 | 499128e1a01a37dfdc865d50008771454bc7bdad168531b5418095f6c67e74b0 |