
Trifacta client


trifacta

A Trifacta client that makes it easy to integrate Trifacta into your production and data science workflows.

Usage Scenarios

  • Jupyter: Invoke Trifacta jobs from a Jupyter notebook and pass data back and forth between Jupyter and Trifacta.
  • Other Notebooks: Integrate Trifacta with Azure Databricks, Zeppelin, or any other notebook-style interface that supports Python.
  • Scripts: Automate Trifacta jobs and their input/output using Python scripts that can be run from the command line or called from an external scheduler.
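As a rough illustration of the scripting scenario, the sketch below wraps the client calls shown later in this README (`run_job` and `get_file_contents`) in a small command-line program. The function names, the argument names and the `TRIFACTA_PASSWORD` environment variable are hypothetical choices made for this example, not part of the library. The sketch assumes that `run_job` returns a truthy value on success, as the sample output in this README suggests.

```python
import argparse
import os


def build_parser():
    """Command-line options for the sketch (all option names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Run a Trifacta job and save its output locally."
    )
    parser.add_argument("--url", required=True, help="Base URL of the Trifacta instance")
    parser.add_argument("--user", required=True, help="Trifacta username")
    parser.add_argument("--dataset-id", type=int, required=True, help="Wrangled dataset ID")
    parser.add_argument("--remote-path", required=True, help="Output path on the Trifacta side")
    parser.add_argument("--local-path", required=True, help="Where to write the downloaded text")
    return parser


def run_and_download(client, dataset_id, remote_path, local_path):
    """Run the job, then write the job output to a local file."""
    # Assumption: run_job returns a truthy value when the job completes.
    if not client.run_job(dataset_id):
        raise RuntimeError(f"Trifacta job for dataset {dataset_id} did not complete")
    contents = client.get_file_contents(remote_path)
    with open(local_path, "w") as f:
        f.write(contents)
    return local_path


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported here so the helpers above can be used without the package installed.
    import trifacta

    # Hypothetical convention: read the password from the environment
    # rather than passing it on the command line.
    password = os.environ["TRIFACTA_PASSWORD"]
    client = trifacta.Client(args.url, args.user, password)
    print(run_and_download(client, args.dataset_id, args.remote_path, args.local_path))
```

To use this as a script, call `main()` under an `if __name__ == "__main__":` guard; an external scheduler can then invoke the file like any other command.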

Functionality

This library makes it simple to do the following:

  1. Connect to a Trifacta instance
  2. Run a job
  3. Download results to a pandas DataFrame, or download them as text/CSV
  4. Upload files to Trifacta

Note that file uploads and downloads are performed using HttpFS and require port 14000 to be open on the Trifacta server.
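Before attempting an upload or download, it can help to confirm that the port is reachable from your machine. The following is a minimal sketch using only the Python standard library; `httpfs_port_open` is a hypothetical helper written for this example and is not part of the trifacta package. It only checks that a TCP connection can be opened, not that HttpFS itself is healthy.

```python
import socket
from urllib.parse import urlparse


def httpfs_port_open(base_url, port=14000, timeout=3.0):
    """Return True if a TCP connection to the server on `port` succeeds."""
    host = urlparse(base_url).hostname
    if host is None:
        raise ValueError(f"could not find a host name in {base_url!r}")
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `httpfs_port_open('https://partnerdemo.trifacta.net')` returning `False` suggests that a firewall rule or security group is blocking port 14000.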

# Install the package first if needed (in a notebook: !pip install trifacta)
import trifacta

# Step 1: Connect to Trifacta by providing the URL, username and password
t = trifacta.Client('https://partnerdemo.trifacta.net', 'userid@mydomain.com', 'mypassword')

Get the wrangled dataset ID from the URL in the Trifacta UI.

Make sure that you have run the job manually at least once.

Note the output path, and be sure to set the output to "replace".
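If you copy the URL from the browser often, a small helper can pull the ID out for you. This README does not show the exact URL layout, so the sketch below assumes only that the wrangled dataset ID is the last all-digit segment of the URL path. The helper name and the example URL are hypothetical; check the result against your own instance.

```python
import re
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Return the last all-digit path segment of `url` as an int."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: the wrangled dataset ID is the final numeric path segment.
    numeric = [s for s in segments if re.fullmatch(r"[0-9]+", s)]
    if not numeric:
        raise ValueError(f"no numeric ID found in URL path: {url!r}")
    return int(numeric[-1])


# Hypothetical URL, used only to illustrate the call.
dataset_id = dataset_id_from_url("https://trifacta.example.com/data/14478")
```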

Run Job

# Step 2: Run the job
t.run_job(14478)
About to run job
{'jobgroupId': 3926, 'jobIds': [7513, 7514], 'reason': 'JobStarted', 'sessionId': 'b9d327f0-8e19-11e8-8feb-9fabf204e996'}
2018-07-22 18:43:01.427594 InProgress
2018-07-22 18:43:06.791576 Complete
True
# Step 3a: Get a pandas DataFrame with the results
df = t.get_dataframe('/trifacta/queryResults/demo@trifacta.com/demo_output.csv')
df
    Neighborhood  HouseStyle  row_count  sum_LotArea
0   NAmes         1Story      159        1589811
1   CollgCr       1Story      91         841644
2   Gilbert       2Story      60         668112
3   Timber        1Story      23         554694
4   CollgCr       2Story      53         546602
5   NridgHt       1Story      51         537687
6   Sawyer        1Story      53         528438
7   Edwards       1Story      53         511296
8   NoRidge       2Story      33         485691
9   NWAmes        1Story      35         403813
10  ClearCr       1Story      11         395797
11  Mitchel       1Story      32         394436
12  Somerst       1Story      37         350820
13  NWAmes        2Story      29         348885
14  Somerst       2Story      49         323495
15  NridgHt       2Story      26         300685
16  OldTown       2Story      32         274465
17  SawyerW       1Story      28         271008
18  OldTown       1.5Fin      33         267283
19  ClearCr       1.5Fin      6          266593
20  Crawfor       1Story      19         260639
21  SawyerW       2Story      25         255102
22  NAmes         2Story      22         249793
23  OldTown       1Story      33         240257
24  Edwards       1.5Fin      22         228970
25  Crawfor       2Story      20         222029
26  NAmes         SLvl        21         221177
27  Edwards       2Story      14         185799
28  Timber        1.5Fin      2          178418
29  BrkSide       1.5Fin      25         172233
... ...           ...         ...        ...
66  CollgCr       SLvl        3          30135
67  BrDale        2Story      16         28816
68  Veenker       SLvl        2          25757
69  NoRidge       1.5Fin      2          25398
70  SawyerW       SFoyer      3          25267
71  CollgCr       SFoyer      3          24491
72  MeadowV       2Story      8          19611
73  NPkVill       1Story      4          17942
74  Veenker       2Story      1          17542
75  NAmes         1.5Unf      2          16827
76  SWISU         1Story      2          14692
77  OldTown       SFoyer      2          14179
78  NWAmes        1.5Fin      1          13837
79  SawyerW       SLvl        1          12800
80  IDOTRR        1.5Unf      2          12449
81  SawyerW       1.5Fin      1          12327
82  Gilbert       1.5Fin      1          12134
83  BrkSide       2.5Unf      1          11888
84  Crawfor       2.5Fin      1          11526
85  NPkVill       2Story      5          11465
86  NWAmes        SFoyer      1          10625
87  Crawfor       1.5Unf      1          10594
88  OldTown       1.5Unf      2          9888
89  MeadowV       SFoyer      6          9853
90  SawyerW       1.5Unf      1          9000
91  MeadowV       1Story      2          8448
92  IDOTRR        2.5Unf      1          7200
93  Crawfor       2.5Unf      1          7128
94  Blueste       2Story      2          3250
95  MeadowV       SLvl        1          1596

96 rows × 4 columns

# Step 3b: Download results as text/csv
file_contents = t.get_file_contents('/trifacta/queryResults/demo@trifacta.com/demo_output.csv')
with open('demo_output.csv', 'w') as f:
    f.write(file_contents)
# Show the first few rows of the CSV file
!head demo_output.csv
"Neighborhood","HouseStyle","row_count","sum_LotArea"
"NAmes","1Story","159","1589811"
"CollgCr","1Story","91","841644"
"Gilbert","2Story","60","668112"
"Timber","1Story","23","554694"
"CollgCr","2Story","53","546602"
"NridgHt","1Story","51","537687"
"Sawyer","1Story","53","528438"
"Edwards","1Story","53","511296"
"NoRidge","2Story","33","485691"
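Every field in the downloaded text is quoted, so numeric columns arrive as strings. If you want to work with the text directly rather than through pandas, the sketch below parses it with the standard `csv` module and converts the two count columns to integers. `parse_results` is a hypothetical helper; the sample string simply repeats the first rows of the output shown above.

```python
import csv
import io


def parse_results(file_contents, int_columns=("row_count", "sum_LotArea")):
    """Parse CSV text into a list of dicts, converting `int_columns` to int."""
    reader = csv.DictReader(io.StringIO(file_contents))
    rows = []
    for row in reader:
        for column in int_columns:
            row[column] = int(row[column])
        rows.append(row)
    return rows


# Sample taken from the first lines of the CSV output in this README.
sample = (
    '"Neighborhood","HouseStyle","row_count","sum_LotArea"\n'
    '"NAmes","1Story","159","1589811"\n'
    '"CollgCr","1Story","91","841644"\n'
)
rows = parse_results(sample)
```

In the notebook above, `parse_results(file_contents)` would apply the same parsing to the text returned by `get_file_contents`.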
# Step 4: Upload files to Trifacta
t.put_file_contents('/trifacta/uploads/demo_output.csv', file_contents)
True
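The upload call takes the remote path first and the file contents second. To upload a file that already exists on disk, you could wrap the call as in the sketch below. `upload_local_file` is a hypothetical helper, not part of the library. It assumes that `put_file_contents` returns a truthy value on success, as in the output above, and it reuses the `/trifacta/uploads` directory from that example as its default destination.

```python
import os
import posixpath


def upload_local_file(client, local_path, remote_dir="/trifacta/uploads"):
    """Upload a local text file and return the remote path it was written to."""
    # Remote paths use forward slashes regardless of the local platform.
    remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
    # newline="" keeps the file's line endings exactly as they are on disk.
    with open(local_path, "r", newline="") as f:
        contents = f.read()
    # Assumption: a truthy return value means the upload succeeded.
    if not client.put_file_contents(remote_path, contents):
        raise RuntimeError(f"upload to {remote_path} failed")
    return remote_path
```

With the client from Step 1, `upload_local_file(t, 'demo_output.csv')` would write to `/trifacta/uploads/demo_output.csv`.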

