Python SDK for Trifacta

Lets users integrate their Python-centric environment with Trifacta.

Getting Started

Installation

  • Install trifacta using pip.
    pip install trifacta
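
    To confirm that the package is visible to the interpreter your notebook uses, a small standard-library check can help. This is only a sketch: installed_version is a hypothetical helper name, not part of the SDK, and it relies on importlib.metadata (Python 3.8+).

```python
from importlib import metadata


def installed_version(package="trifacta"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```

    If this prints None, the package was probably installed into a different environment than the one running the notebook.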
    

How to use

Configuration and Prerequisites

Enable access to your Trifacta workspace

  • Click Generate new token to create a new token. Before closing the modal, click Copy token to clipboard to copy it.
  • Keep the token somewhere safe and accessible; it is required in the steps below.

Configure trifacta package

The Python SDK for Trifacta requires a small amount of configuration before it can interact with a Trifacta environment.

  • Create a new configuration file named .trifacta.py.conf in your home directory.
  • Open the file in an editor and add the following configuration:
    [CONFIGURATION]
    username = <username_for_trifacta_account>  # example: test-user@gmail.com
    endpoint = <uri_for_your_trifacta_workspace>  # example: https://test-workspace.saas-latest-dev.trifacta.net
    token = <copied_token_from_steps_above>
    
  • Save the file.
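
The file above uses INI syntax, so it can also be written and checked with Python's standard configparser module. The following is a minimal sketch under stated assumptions: write_trifacta_config and missing_config_keys are hypothetical helper names (not part of the SDK), how the SDK itself parses the file is not shown in this document, and restricting the file permissions is an added precaution rather than a documented requirement.

```python
import configparser
from pathlib import Path

# Location described above; the SDK's own lookup logic is not shown here.
DEFAULT_CONF_PATH = Path.home() / ".trifacta.py.conf"
REQUIRED_KEYS = ("username", "endpoint", "token")


def write_trifacta_config(username, endpoint, token, path=DEFAULT_CONF_PATH):
    """Write a [CONFIGURATION] section containing the three keys listed above."""
    # interpolation=None keeps any '%' characters in the token literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["CONFIGURATION"] = {
        "username": username,
        "endpoint": endpoint,
        "token": token,
    }
    path = Path(path)
    with path.open("w") as handle:
        parser.write(handle)
    # Assumption: owner-only permissions are a sensible precaution for a token file.
    path.chmod(0o600)
    return path


def missing_config_keys(path=DEFAULT_CONF_PATH):
    """Return the required keys that are absent or empty in the file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently skipped
    if not parser.has_section("CONFIGURATION"):
        return list(REQUIRED_KEYS)
    section = parser["CONFIGURATION"]
    return [key for key in REQUIRED_KEYS if not section.get(key, fallback="").strip()]
```

Calling missing_config_keys() before importing the SDK gives a quick way to spot a misspelled key or an empty token; an empty list means all three keys are present.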

Upload and flow generation

  • Create a new Python 3 notebook and import the trifacta module.
    import trifacta as tf
    
    You now have a handle for interacting with your Trifacta workspace.
  • Next, wrangle/transform a CSV dataset using Trifacta.
    import pandas as pd
    df = pd.read_csv("<path_to_csv_dataset>")  # replace the placeholder with the path to your CSV file
    wf = tf.wrangle(df)
    
    The wrangle function uploads a dataset to Trifacta and creates a flow for it, which you can then use to wrangle/transform the dataset from Trifacta's user interface. It also returns a handle to the created flow, with which you can perform other operations on your dataset.

Launching Trifacta in the browser

  • Once the upload completes, execute the statement below to open Trifacta in a browser window.
    wf.open()
    
  • In the Trifacta window, navigate to the flow created for you. Create a recipe to prepare your dataset by applying transformations in Trifacta's Transformer UI. When you are done with data preparation, return to the notebook window.

Pandas code generation

  • To use get_pandas(), the Wrangle to Python Conversion setting must be enabled by the administrator of your Trifacta workspace on the Workspace Admin Settings page.
  • Get the pandas code for the transform recipe created in Trifacta so that you can use it to transform your pandas DataFrame.
    column_names = df.columns.to_list()
    wf.get_pandas(column_names, add_to_next_cell=True)
    
    get_pandas translates Trifacta's transform recipe into pandas code; setting add_to_next_cell to True adds the generated code to the next cell of the notebook.
  • Execute the generated code in the next cell, then, in a new cell, run the following to transform the DataFrame using the generated pandas code.
    wrangled_df = run_transforms(df)
    wrangled_df
    
    This returns the cleansed/transformed pandas DataFrame.

Data Profiling

The SDK offers data profiling features for a Trifacta flow.

  • summary() - gives a table of summary statistics per column
  • dqBars() - provides the valid/invalid/missing ratio per column
  • colTypes() - lists the inferred data type for each column
  • barsDfList() - gives a list of DataFrames, one per column, each representing a bar chart for that column
  • pdfProfile() - produces a snazzy PDF report with all the statistics
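
As a rough sketch of how several of these calls could be gathered in one notebook cell, the helper below invokes each named method on a flow handle and collects the results. It rests on assumptions: collect_profile is a hypothetical helper name, the profiling functions are assumed to be methods of the flow handle returned by wrangle (wf in the earlier steps), and they are assumed to take no arguments, as the list above writes them.

```python
# Method names come from the list above; calling them without arguments is an assumption.
PROFILE_METHODS = ("summary", "dqBars", "colTypes")


def collect_profile(flow, methods=PROFILE_METHODS):
    """Call each named profiling method on a flow handle and return the results by name."""
    results = {}
    for name in methods:
        method = getattr(flow, name, None)
        # Skip names the handle does not provide instead of raising.
        if callable(method):
            results[name] = method()
    return results


# Example (requires a configured workspace and the wf handle from the steps above):
# profile = collect_profile(wf)
# profile["summary"]
```

Skipping missing methods keeps the cell usable if a particular SDK version exposes only some of the profiling functions.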

Built Distribution

trifacta-3.0.0-py3-none-any.whl (31.3 kB)