TableauTransformer
ETL tooling for preparing tableau seed data
Description
This library was built to make data wrangling for Tableau easier. Tableau can be very particular about the data it reads, and preparing data to fit the shape required by different graphs can be time-consuming. TableauTransformer helps clear both of these hurdles.
Dependencies
- pip, Python 3.6, pandas, NumPy
Getting Started
pip install tableautransformer
import tableautransformer as tbt
tbt is a collection of module-level functions rather than methods on an object, so every call takes the form "tbt.function_name()".
Function Docs
Here you can find a list of all functions within the library, a description of what they do, and their inputs.
Basic_Table
basic_table(read_path, read_type='csv', sheet_name=None, columns_to_keep=None, columns_rename=None,
filters=None, group_by=None, aggregate_columns=None, pre_agg_math_columns=None,
post_agg_math_columns=None, remove_NAN=True, remove_NAN_col='all')
Description
basic_table is the foundation of the tbt library: it condenses roughly 20 lines of commonly repeated code into a single input-heavy function. The function reads a file into a dataframe, cleans up the data, and performs commonly used table operations.
Inputs
read_path: string
The path to the file you wish to read. The only mandatory input.
read_type: 'csv' or 'excel'
Default is 'csv'. If read_type is 'excel', sheet_name must be provided.
sheet_name: string
The name of the tab you wish to read in.
columns_to_keep: list of strings
['colA','colB','colC'] This selection is applied immediately after the data is read in: every column named in the list remains in the dataframe and all others are dropped.
columns_rename: list of strings
['colA','colB','colC'] Renaming occurs after the file is read in and columns_to_keep has been applied. All other column-related inputs should use the new names set by the rename.
filters: list of 3-element tuples
[('col_name','operand','value')] Accepts one or more filters. Each filter is a 3-element tuple: the first element is the column name, the second is the comparison operator (the 'operand' slot), and the third is the value to compare against. The column name and operand must be strings, while the value can be numeric (or a string if the operand is '==').
group_by
aggregate_columns
pre_agg_math_columns
post_agg_math_columns
remove_NAN
remove_NAN_col
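The read_type, sheet_name, and columns_to_keep inputs described above can be pictured with a plain pandas sketch. This is not the library's source code: read_and_select is a hypothetical helper written only for illustration, and the sample data is made up. It assumes that a missing sheet_name for an Excel read should raise an error, which the README implies but does not state.

```python
import io

import pandas as pd


def read_and_select(source, read_type="csv", sheet_name=None, columns_to_keep=None):
    """Hypothetical helper: read a file, then keep only the listed columns."""
    if read_type == "csv":
        df = pd.read_csv(source)
    elif read_type == "excel":
        # Assumption: the README says sheet_name "must have a value" for excel.
        if sheet_name is None:
            raise ValueError("sheet_name is required when read_type is 'excel'")
        df = pd.read_excel(source, sheet_name=sheet_name)
    else:
        raise ValueError("read_type must be 'csv' or 'excel'")

    # Column selection happens immediately after the read; others are dropped.
    if columns_to_keep is not None:
        df = df[columns_to_keep]
    return df


# Made-up in-memory CSV standing in for a file on disk.
csv_text = "colA,colB,colC,colD\n1,2,3,4\n5,6,7,8\n"
table = read_and_select(io.StringIO(csv_text), columns_to_keep=["colA", "colC"])
print(table)
```

Passing a path string instead of the in-memory buffer works the same way, since pandas readers accept both.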
Example
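As a rough illustration of the filters input, the sketch below applies a list of 3-element tuples to a dataframe in plain pandas. It is an approximation, not the library's implementation: apply_filters and the OPS table are hypothetical names, the README only confirms the '==' operand (the others are assumed), and combining multiple filters with AND is also an assumption.

```python
import operator

import pandas as pd

# Assumed operand strings; only '==' is confirmed by the README.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def apply_filters(df, filters):
    """Hypothetical helper: keep rows satisfying every (column, operand, value)."""
    for col_name, operand, value in filters:
        mask = OPS[operand](df[col_name], value)  # element-wise comparison
        df = df[mask]
    return df


# Made-up sample data.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "north"],
    "amount": [120, 80, 300, 45],
})

# String value with '==', numeric value with a comparison operand.
filtered = apply_filters(sales, [("region", "==", "east"), ("amount", ">", 150)])
print(filtered)

# A corresponding library call might look like this (file name is hypothetical):
# tbt.basic_table("sales.csv", filters=[("region", "==", "east"), ("amount", ">", 150)])
```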
Bucket
bucket(df, column, bucket_col_name, intervals)
Description
Inputs
Example
Is_In
is_in(df, target_col, isin_list)
Description
Inputs
Example
Cast
cast(df, target_col, value)
Description
Inputs
Example
Date_Format
date_format(df, target_col, date_format)
Description
Inputs
Example
Authors
Contributor names and contact info
- Josh Teixeira | jteixeira@cppib.com
Version History
- 0.0.17
- README documentation added
- 0.0.16
- bucket function added
- 0.0.1
- Initial beta release
License
This project is licensed under the MIT License - see the LICENSE.md file for details