Pandas AWS - AWS made easy for data scientists
Pandas AWS makes it easy to read a pandas.DataFrame from, and write one to, AWS services.
Working with S3
First, create an S3 client for later use and define a bucket name:
from pandas_aws import get_client
s3 = get_client('s3')
MY_BUCKET = 'pandas-aws-bucket'
Example 1: get a DataFrame from a parquet file stored in S3
from pandas_aws.s3 import get_df
df_from_parquet_file = get_df(s3, MY_BUCKET, 'my_parquet_file_path', format='parquet')
Example 2: get a DataFrame from multiple CSV files (with the same schema) stored in S3
from pandas_aws.s3 import get_df_from_keys
df_from_list = get_df_from_keys(s3, MY_BUCKET, prefix='my-folder', suffix='.csv')
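To illustrate how a prefix and a suffix together narrow down a set of object keys, here is a pure-Python sketch. It is an assumption about the selection logic, not the code pandas-aws actually runs; filter_keys and the example keys are hypothetical names used only for illustration. In S3 itself, prefix filtering can be done server-side when listing objects, while suffix filtering has to happen on the client.

```python
from typing import Iterable, List


def filter_keys(keys: Iterable[str], prefix: str = "", suffix: str = "") -> List[str]:
    """Return the keys that start with `prefix` and end with `suffix`, sorted."""
    return sorted(k for k in keys if k.startswith(prefix) and k.endswith(suffix))


# Hypothetical object keys, for illustration only.
example_keys = [
    "my-folder/2020-01.csv",
    "my-folder/2020-02.csv",
    "my-folder/notes.txt",
    "other-folder/2020-01.csv",
]
selected = filter_keys(example_keys, prefix="my-folder", suffix=".csv")
print(selected)  # ['my-folder/2020-01.csv', 'my-folder/2020-02.csv']
```

Only the two CSV files under my-folder are kept; the text file and the CSV in another folder are dropped.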
Example 3: put a DataFrame into S3 using an xlsx (Excel) file format
from pandas_aws.s3 import put_df
put_df(s3, my_dataframe, MY_BUCKET, 'target_file_path', format='xlsx')
Example 4: put a DataFrame into S3 using a multipart upload
from pandas_aws.s3 import put_df
put_df(s3, my_dataframe, MY_BUCKET, 'target_file_path', format='csv', compression='gzip', parts=8)
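The following standard-library sketch shows what "gzip-compressed CSV split into parts" can look like before the parts are sent. It is a minimal illustration under stated assumptions: how put_df actually serialises and splits the data is not documented here, and rows_to_gzip_csv and split_into_parts are hypothetical helpers, with plain tuples standing in for a DataFrame.

```python
import csv
import gzip
import io
from typing import List, Sequence


def rows_to_gzip_csv(header: Sequence[str], rows: Sequence[Sequence[object]]) -> bytes:
    """Serialise rows to CSV text, then gzip the UTF-8 bytes."""
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def split_into_parts(payload: bytes, parts: int) -> List[bytes]:
    """Split payload into at most `parts` contiguous, non-empty chunks."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    size = max(1, -(-len(payload) // parts))  # ceiling division
    return [payload[i:i + size] for i in range(0, len(payload), size)]


# Stand-in data for a DataFrame with two columns.
payload = rows_to_gzip_csv(["id", "value"], [(i, i * i) for i in range(1000)])
chunks = split_into_parts(payload, parts=8)

# Joining the chunks in order must give back the original payload.
assert b"".join(chunks) == payload
print(len(chunks), "chunks,", len(payload), "bytes in total")
```

Keep in mind that S3 requires every part of a multipart upload except the last to be at least 5 MiB, so splitting into parts mainly pays off for large DataFrames.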
Installing pandas-aws
Pip installation
You can install the package with pip:
pip install pandas-aws
Contributing to pandas-aws
Git clone
We use the develop branch as the release branch, so clone the repository and check out develop to get the latest version in development.
git clone git@github.com:FlorentPajot/pandas-aws.git
cd pandas-aws && git checkout develop
Preparing your environment
Pandas AWS uses poetry to manage dependencies, so poetry must be installed first:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
Create a separate Python environment, for example using pyenv along with pyenv-virtualenv and Python 3.7.7:
pyenv install 3.7.7
pyenv virtualenv 3.7.7 pandas-aws
pyenv activate pandas-aws
Check your environment using:
which python
# should show something like .pyenv/shims/python
python -V
# should show Python 3.7.7 (or whichever version you selected)
pip list
# should show almost nothing besides pip and setuptools
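The same checks can also be run from inside Python, which is handy when the shell resolves a different interpreter than expected. This is a small sketch using only the standard library; describe_environment is a hypothetical helper name, not part of pandas-aws.

```python
import sys


def describe_environment() -> dict:
    """Collect the interpreter details that the shell commands above print."""
    # Inside a venv, sys.base_prefix differs from sys.prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = base != sys.prefix or hasattr(sys, "real_prefix")
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "in_virtualenv": in_virtualenv,
    }


info = describe_environment()
for name, value in info.items():
    print(f"{name}: {value}")
```

If in_virtualenv is False, the pyenv virtualenv is probably not activated in the current shell.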
In case you encounter a problem, check the pyenv documentation.
Then install the dependencies with poetry from the root of the cloned repository:
poetry install
Guidelines
Todo
Requires
The project needs the following dependencies:
- libpq-dev (psycopg2 dependency)
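As a quick sanity check before installing, the sketch below asks the system loader whether a libpq shared library is visible. It only detects the runtime library, not the development headers that libpq-dev also provides, so treat a positive result as a hint rather than a guarantee; find_libpq is a hypothetical helper name.

```python
import ctypes.util
from typing import Optional


def find_libpq() -> Optional[str]:
    """Return the name or path the loader reports for libpq, or None."""
    return ctypes.util.find_library("pq")


libpq = find_libpq()
if libpq is None:
    print("libpq not found; install libpq-dev (or your platform's equivalent)")
else:
    print(f"libpq found: {libpq}")
```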