Python SDK for Trifacta
Lets users integrate their Python-centric environment with Trifacta.
Getting Started
Installation
- Install trifacta using pip:
  pip install trifacta
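To confirm that the install succeeded before opening a notebook, a check along the following lines may help. This is a minimal sketch using only the standard library's importlib.metadata. The helper name installed_version is illustrative and not part of the SDK, and the sketch assumes the distribution is registered under the name trifacta, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "trifacta" is the distribution name used in the pip command above.
    version = installed_version("trifacta")
    if version is None:
        print("trifacta is not installed; run: pip install trifacta")
    else:
        print(f"trifacta {version} is installed")
```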
How to use
Configuration and Prerequisites
Enable access to your Trifacta workspace
- Click on Generate new token to create a new token.
- Copy the token by clicking on Copy token to clipboard before closing the modal.
- Keep this token somewhere safe and accessible, as it is required in the steps below.
Configure the trifacta package
Python SDK for Trifacta requires a small amount of configuration before it can be used to interact with a Trifacta environment.
- Create a new configuration file in your home directory and name it .trifacta.py.conf.
- Open the file in an editor and add the following configuration to it:
  [CONFIGURATION]
  username = <username_for_trifacta_account>  # example: test-user@gmail.com
  endpoint = <uri_for_your_trifacta_workspace>  # example: https://test-workspace.saas-latest-dev.trifacta.net
  token = <copied_token_from_steps_above>
- Save the file.
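The file can also be written and checked from Python. The sketch below assumes the SDK reads the file as a standard INI file with a single [CONFIGURATION] section, which the layout above suggests but the source does not state. The helper names write_config and missing_keys are hypothetical and not part of the SDK. Interpolation is disabled so that a token containing a % character is stored verbatim.

```python
import configparser
from pathlib import Path

# The three keys listed in the configuration steps above.
REQUIRED_KEYS = ("username", "endpoint", "token")


def write_config(path: Path, username: str, endpoint: str, token: str) -> None:
    """Write a [CONFIGURATION] section containing the three required keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser["CONFIGURATION"] = {
        "username": username,
        "endpoint": endpoint,
        "token": token,
    }
    with path.open("w") as handle:
        parser.write(handle)


def missing_keys(path: Path) -> list:
    """Return the required keys that are absent or empty in the file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently skipped
    if not parser.has_section("CONFIGURATION"):
        return list(REQUIRED_KEYS)
    section = parser["CONFIGURATION"]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]


if __name__ == "__main__":
    # Only inspects the file; nothing is written to the home directory here.
    config_path = Path.home() / ".trifacta.py.conf"
    print("Missing keys:", missing_keys(config_path))
```

An empty list from missing_keys means all three values are present. It does not verify that the token or endpoint are valid for your workspace.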
Upload and flow generation
- Create a new Python 3 notebook and import the trifacta module:
  import trifacta as tf
  You now have a handle for interacting with your Trifacta workspace.
- Next, try to wrangle/transform a CSV dataset using Trifacta:
  import pandas as pd
  df = pd.read_csv(<path_to_csv_dataset>)
  wf = tf.wrangle(df)
  The wrangle function uploads a dataset to Trifacta and creates a flow for it, which can then be used to wrangle/transform the dataset from Trifacta's user interface. It also returns a handle for the created flow, with which you can perform other operations on your dataset.
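Because the upload sends the whole DataFrame to the workspace, it can be worth checking the CSV locally first. The following is a minimal sketch of such a guard. load_and_wrangle is a hypothetical helper, not part of the SDK, and it takes the upload function as a parameter, so it assumes only that the function accepts a DataFrame, as tf.wrangle does in the step above.

```python
from pathlib import Path
from typing import Any, Callable

import pandas as pd


def load_and_wrangle(csv_path: str, wrangle: Callable[[pd.DataFrame], Any]) -> Any:
    """Read a CSV into a DataFrame, sanity-check it, and pass it to `wrangle`."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"No CSV file at {path}")
    df = pd.read_csv(path)
    if df.empty:
        # A header-only file yields an empty DataFrame; there is nothing to upload.
        raise ValueError(f"{path} contains no rows; nothing to upload")
    return wrangle(df)
```

In a notebook this would be called as wf = load_and_wrangle(<path_to_csv_dataset>, tf.wrangle), with tf imported as described above.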
Launch Trifacta in the browser
- Once the upload completes, execute the statement below to open Trifacta in a browser window:
  wf.open()
- In the Trifacta window, navigate to the flow created for you. Create a recipe to prepare your dataset by applying transformations in Trifacta's Transformer UI. Once you are done with data preparation, go back to the notebook window.
Pandas code generation
- To use the get_pandas() functionality, the Wrangle to Python Conversion setting must be enabled by the administrator of your Trifacta workspace through the Workspace Admin Settings page.
- Get pandas code for the transform recipe created in Trifacta, so that you can use it to transform your pandas DataFrame:
  column_names = df.columns.to_list()
  wf.get_pandas(column_names, add_to_next_cell=True)
  get_pandas translates Trifacta's transform recipe into pandas code. Setting add_to_next_cell to True ensures that the generated code is added to the next cell of the notebook.
- Execute the generated code in the next cell, then in a new cell run the following to transform the DataFrame using the generated pandas code:
  wrangled_df = run_transforms(df)
  wrangled_df
  This returns the cleansed/transformed pandas DataFrame.
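Judging by how it is called above, the generated code defines a run_transforms function that takes a DataFrame and returns the transformed one. The body below is a hand-written, hypothetical stand-in, not actual get_pandas output, and the column names and transformations are invented for illustration. It only shows the general shape such a function might take and how the call fits together.

```python
import pandas as pd


def run_transforms(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for generated code: trim names, drop rows without an amount."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["name"] = out["name"].str.strip()
    out = out.dropna(subset=["amount"])
    return out.reset_index(drop=True)


# Invented sample data standing in for the uploaded CSV.
df = pd.DataFrame(
    {"name": [" Ada ", "Grace", "Linus "], "amount": [10.0, None, 7.5]}
)
wrangled_df = run_transforms(df)
```

The real generated function depends entirely on the recipe built in the Transformer UI.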
Data Profiling
The SDK offers data profiling features for Trifacta's flow.
- summary(): gives a table of summary statistics per column
- dqBars(): provides the valid/invalid/missing ratio per column
- colTypes(): simply lists the induced data type for each column
- barsDfList(): gives a list of DataFrames, one per column, representing a bar chart for that column
- pdfProfile(): produces a snazzy PDF report with all the statistics
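For a rough local cross-check of two of these figures, plain pandas can compute a per-column missing ratio and list inferred dtypes. This is a sketch under stated assumptions: it is not how the SDK computes its profile, pandas dtypes are not the same as Trifacta's induced data types, and the helper names missing_ratio and inferred_types are hypothetical.

```python
import pandas as pd


def missing_ratio(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing (NaN/None) values in each column."""
    return df.isna().mean()


def inferred_types(df: pd.DataFrame) -> dict:
    """Map each column name to the dtype name pandas inferred for it."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = pd.DataFrame({"a": [1, None, 3, 4], "b": ["x", "y", None, None]})
    print(missing_ratio(sample))
    print(inferred_types(sample))
```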
Built Distribution
trifacta-8.3.0-py3-none-any.whl (36.3 kB)
File details
Details for the file trifacta-8.3.0-py3-none-any.whl.
File metadata
- Download URL: trifacta-8.3.0-py3-none-any.whl
- Upload date:
- Size: 36.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/54.2.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | c544ab7efb8380db7af420fae0c67a23fc93823a8495210d8ba074b65a93fe43
MD5 | 612538c83ab80eb0e2f6426c99edfd31
BLAKE2b-256 | 64a2975d552d544be5e57e2d0daa3c57a96c40b97a1fdffcf823b9e36ef9c81f