
Connect to your tabular model and perform operations programmatically


PyTabular


What is it?

PyTabular (python-tabular on PyPI) is a Python package for programmatic execution on your tabular models! This is possible thanks to Pythonnet and Microsoft's .NET APIs for Azure Analysis Services. Currently, this build is tested and working on Windows only; help is needed to expand it to other operating systems. The required DLL files should be included with the package when you import it. See Documentation Here.

PyTabular is still considered alpha while I'm working on building out the proper tests and testing environments, so I can ensure some kind of stability in features. Please send bugs my way, preferably in the Issues section on GitHub. I want to harden this project so many can use it easily. I currently run local pytest for Python 3.6 to 3.10 against a local AAS and a Gen2 model.

Getting Started

See the PyPI project for available versions.

python3 -m pip install python-tabular

In your Python environment, import pytabular and call the main Tabular class. The only parameter needed is a solid connection string.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)

I'm a big fan of logging. If you don't want any, just get the logger and disable it.

import pytabular
pytabular.logger.disabled = True
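
If silencing everything is more than you want, raising the log level is a gentler option. The sketch below assumes that pytabular.logger is a standard logging.Logger (the .disabled attribute used above suggests so, but that is an assumption). The logger name "example_library" is a hypothetical stand-in so the snippet runs without PyTabular installed.

```python
import logging

# Hypothetical stand-in for pytabular.logger, so this runs anywhere.
logger = logging.getLogger("example_library")

# Keep warnings and errors, drop INFO and DEBUG messages.
logger.setLevel(logging.WARNING)
info_enabled = logger.isEnabledFor(logging.INFO)
error_enabled = logger.isEnabledFor(logging.ERROR)

# Or silence the logger completely, as in the snippet above.
logger.disabled = True
```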

You can query your models with the Query method from your Tabular class. DAX queries need the full DAX syntax (see the EVALUATE example), and the method returns a pandas DataFrame. To return a single value, simply wrap your query in curly brackets: the method will take that single-cell table and return just the individual value. You can also query your DMVs. See below for examples. See PyTabular Docs for Query.

#Run basic queries
DAX_QUERY = "EVALUATE TOPN(100, 'Table1')"
model.Query(DAX_QUERY) #returns pd.DataFrame()

#or...
DMV_QUERY = "select * from $SYSTEM.DISCOVER_TRACE_EVENT_CATEGORIES"
model.Query(DMV_QUERY) #returns pd.DataFrame()

#or...
SINGLE_VALUE_QUERY_EX = "EVALUATE {1}"
model.Query(SINGLE_VALUE_QUERY_EX) #returns 1

#or...
FILE_PATH = 'C:\\FILEPATHEXAMPLE\\file.dax' #or file.txt
model.Query(FILE_PATH) #Will return same logic as above, single values if possible else will return pd.DataFrame()
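
The single-value behavior described above can be pictured with plain lists of rows. This is only a sketch of the idea, not PyTabular's actual implementation; unwrap_single_value is a hypothetical helper name, and the result is modelled as a list of rows rather than a DataFrame so the snippet has no dependencies.

```python
from typing import Any, Sequence


def unwrap_single_value(rows: Sequence[Sequence[Any]]) -> Any:
    """Return the lone cell of a 1x1 result, otherwise the rows unchanged."""
    if len(rows) == 1 and len(rows[0]) == 1:
        return rows[0][0]
    return rows


# A single-cell result, like the one from "EVALUATE {1}", collapses to its value.
single = unwrap_single_value([[1]])

# Anything larger is passed through untouched.
table = unwrap_single_value([[1, "a"], [2, "b"]])
```

With a pandas DataFrame the equivalent test would be df.shape == (1, 1).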

You can also explore your tables, partitions, and columns via the attributes of your Tabular class.

#Explore tables...
dir(model.Tables['Table Name'])

#Explore columns & partitions
dir(model.Tables['Table Name'].Partitions['Partition Name'])

#Only a few features right now, but check out the built in methods.
model.Tables['Table Name'].Refresh(Tracing = True)
#or
model.Tables['Table Name'].Partitions['Partition Name'].Refresh(Tracing = True)
#or
model.Tables['Table Name'].Partitions['Partition Name'].Last_Refresh()
#or
model.Tables['Table Name'].Row_Count()
#or
model.Tables['Table Name'].Columns['Column Name'].Distinct_Count()

Use the Refresh method to handle refreshes on your model. It is synchronous and should be flexible enough to handle a variety of inputs. See PyTabular Docs for Refreshing Tables and Partitions. The most basic way to refresh is to pass the table name as a string. The method will search for the table and raise an exception if it is unable to find it. For partitions you will need a key-value combination, for example {'Table1':'Partition1'}. You can also use the key-value pair to iterate through a group of partitions, for example {'Table1':['Partition1','Partition2']}. Rather than providing a string, you can also pass the actual class. See below for those examples. You can access the classes from the built-in attributes self.Tables and self.Partitions, or explore the .NET classes yourself in self.Model.Tables.

#You have a few options when refreshing. 
model.Refresh('Table Name')

#or...
model.Refresh(['Table1','Table2','Table3'])

#or...
model.Refresh(<Table Class>)

#or...
model.Refresh(<Partition Class>)

#or...
model.Refresh({'Table Name':'Partition Name'})

#or any kind of weird combination like
model.Refresh([{<Table Class>:<Partition Class>,'Table Name':['Partition1','Partition2']},'Table Name','Table Name2'])

#You can even run through the Tables & Partition Attributes
model.Tables['Table Name'].Refresh()

#or
model.Tables['Table Name'].Partitions['Partition Name'].Refresh()

#Default Tracing happens automatically, but can be removed by -- 
model.Refresh(['Table1','Table2'], Tracing = None)
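
To make the accepted input shapes concrete, here is a minimal sketch of how strings, dictionaries, and nested lists could be flattened into (table, partition) pairs. It is not how PyTabular does it internally; normalize_refresh_targets is a hypothetical helper, and it only handles names as strings, not the Table or Partition classes.

```python
from typing import Any, List, Optional, Tuple


def normalize_refresh_targets(target: Any) -> List[Tuple[str, Optional[str]]]:
    """Flatten a refresh input into (table, partition) pairs.

    A partition of None means "refresh the whole table".
    """
    if isinstance(target, str):
        return [(target, None)]
    if isinstance(target, dict):
        pairs: List[Tuple[str, Optional[str]]] = []
        for table, partitions in target.items():
            # A single partition name is treated like a one-item list.
            if isinstance(partitions, str):
                partitions = [partitions]
            pairs.extend((table, partition) for partition in partitions)
        return pairs
    if isinstance(target, (list, tuple)):
        pairs = []
        for item in target:
            pairs.extend(normalize_refresh_targets(item))
        return pairs
    raise TypeError(f"Unsupported refresh target: {target!r}")


mixed = normalize_refresh_targets(
    [{"Table1": ["Partition1", "Partition2"]}, "Table2"]
)
```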

It's not uncommon to need to run through some checks on specific tables, partitions, columns, etc.

#Get Row Count from model
model.Tables['Table Name'].Row_Count()

#Get Last Refresh time from a table
model.Tables['Table Name'].Last_Refresh()

#Get Distinct Count or Values from a Column
model.Tables['Table Name'].Columns['Column Name'].Distinct_Count()
#or
model.Tables['Table Name'].Columns['Column Name'].Values()

Use Cases

If blank table, then refresh table.

This uses the Find_Zero_Rows method on model.Tables and the Refresh method from the Tabular class.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)
tables = model.Tables.Find_Zero_Rows()
if len(tables) > 0:
    model.Refresh(tables)

Sneak in a refresh.

This will use the method Is_Process and the method Refresh from the Tabular class. It will check the DMV to see if any currently running jobs are classified as processing.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)
if model.Is_Process():
    #do what you want if there is a refresh happening
    pass
else:
    model.Refresh(TABLES_OR_PARTITIONS_TO_REFRESH)
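
One thing you might do while a refresh is happening is wait and try again. The sketch below polls Is_Process and refreshes once the model is idle. refresh_when_idle and its parameters are hypothetical names, and FakeModel is a stand-in so the snippet runs without a real connection; only Is_Process and Refresh come from the example above.

```python
import time
from typing import Any, Callable, List


def refresh_when_idle(model: Any, targets: Any, max_attempts: int = 5,
                      wait_seconds: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Refresh once Is_Process() reports idle; return False if it never does."""
    for attempt in range(max_attempts):
        if not model.Is_Process():
            model.Refresh(targets)
            return True
        # Only wait if another attempt is coming.
        if attempt < max_attempts - 1:
            sleep(wait_seconds)
    return False


class FakeModel:
    """Hypothetical stand-in: reports processing for the first `busy_polls` checks."""

    def __init__(self, busy_polls: int) -> None:
        self.busy_polls = busy_polls
        self.refreshed: List[Any] = []

    def Is_Process(self) -> bool:
        self.busy_polls -= 1
        return self.busy_polls >= 0

    def Refresh(self, targets: Any) -> None:
        self.refreshed.append(targets)


demo = FakeModel(busy_polls=2)
# The sleep is replaced with a no-op so the demo finishes instantly.
done = refresh_when_idle(demo, "Table1", sleep=lambda seconds: None)
```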

Show refresh times in model.

This uses the Last_Refresh method on model.Tables and the Create_Table method from the Tabular class. It searches through the model for all tables and partitions and pulls the 'RefreshedTime' property from each. The results are returned as a pandas DataFrame, which is then converted into an M expression used for a new table.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)
df = model.Tables.Last_Refresh()
model.Create_Table(df, 'Refresh Times')

If BPA Violation, then revert deployment.

This uses a few things: first the BPA class, then the TE2 class, and it finishes with the Analyze_BPA method. I did not want to reinvent the wheel given the amazing work done with Tabular Editor and its BPA capabilities.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)
TE2 = pytabular.Tabular_Editor() #Feel free to input your TE2 file path, or this will download it for you.
BPA = pytabular.BPA() #Feel free to input your own BPA file, or this will download it for you from: https://raw.githubusercontent.com/microsoft/Analysis-Services/master/BestPracticeRules/BPARules.json
results = model.Analyze_BPA(TE2.EXE,BPA.Location)

if len(results) > 0:
    #Revert deployment here!
    pass

Loop through and query DAX files

Let's say you have multiple DAX queries you would like to store and run through as checks. The Query method on the Tabular class can also take file paths. It can really be any file type, as it's just checking os.path.isfile(), but I would suggest .dax or .txt. It will read the file and use that as the new Query_str argument.

import pytabular
model = pytabular.Tabular(CONNECTION_STR)
LIST_OF_FILE_PATHS = ['C:\\FilePath\\file1.dax','C:\\FilePath\\file1.txt','C:\\FilePath\\file2.dax','C:\\FilePath\\file2.txt']
for file_path in LIST_OF_FILE_PATHS:
    model.Query(file_path)
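
Rather than typing the list by hand, you could build it from a folder. A minimal sketch, assuming your queries live in one directory; find_query_files is a hypothetical helper and is not part of PyTabular. The demo writes a few throwaway files to a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_query_files(folder: str,
                     suffixes: Tuple[str, ...] = (".dax", ".txt")) -> List[str]:
    """Return sorted paths of files in `folder` with one of the given suffixes."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.dax", "a.txt", "notes.md"):
        Path(tmp, name).write_text("EVALUATE {1}", encoding="utf-8")
    # notes.md is skipped because its suffix is not in the list.
    found = [Path(path).name for path in find_query_files(tmp)]
```

The resulting list can be passed to the same for loop shown above.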

Advanced Refreshing with Pre and Post Checks

Maybe you are introducing new logic to a fact table, and you need to ensure that a measure checking last month's values never changes. To do that you can take advantage of the Refresh_Check and Refresh_Check_Collection classes (sorry, I know the documentation stinks right now). Using those, you can build out something that first checks the result of the measure, then refreshes, then checks the result of the measure after the refresh, and lastly performs your desired check. In this case, the check is that the pre value matches the post value. If your pre does not equal your post after refreshing, it will fail and give an assertion error in your logging.

from pytabular import Tabular
from pytabular.refresh import Refresh_Check, Refresh_Check_Collection

model = Tabular(CONNECTION_STR)

# This is our custom check that we want to run after refresh.
# Does the pre-refresh value match the post-refresh value?
def sum_of_sales_assertion(pre, post):
    return pre == post

# This is where we put it all together into the `Refresh_Check` class. Give it a name, give it a query to run, and give it the assertion you want to make.
sum_of_last_month_sales = Refresh_Check(
    'Last Month Sales',
    lambda: model.Query("EVALUATE {[Last Month Sales]}"),
    sum_of_sales_assertion
)

# Here we are adding it to a `Refresh_Check_Collection` because you can have more than one `Refresh_Check` to run.
all_refresh_check = Refresh_Check_Collection([sum_of_last_month_sales])

model.Refresh(
    'Fact Table Name',
    refresh_checks = all_refresh_check
)
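
The pre/post pattern itself is small enough to sketch in plain Python. This is not the Refresh_Check implementation, just the same idea under simple assumptions: refresh_with_check is a hypothetical function, and the measure and refresh are stand-in callables instead of a real model.

```python
from typing import Any, Callable, Tuple


def refresh_with_check(name: str,
                       measure: Callable[[], Any],
                       refresh: Callable[[], None],
                       assertion: Callable[[Any, Any], bool]) -> Tuple[Any, Any]:
    """Measure, refresh, measure again, then apply the assertion to both values."""
    pre = measure()
    refresh()
    post = measure()
    if not assertion(pre, post):
        raise AssertionError(f"{name}: pre={pre!r}, post={post!r}")
    return pre, post


# Hypothetical stand-ins for a model query and a refresh.
state = {"last_month_sales": 100}
result = refresh_with_check(
    "Last Month Sales",
    measure=lambda: state["last_month_sales"],
    refresh=lambda: None,  # a refresh that leaves last month untouched
    assertion=lambda pre, post: pre == post,
)
```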

Query as Another User

There are plenty of tools that allow you to query as an 'Effective User', inheriting their security when querying. This is an extremely valuable concept built natively into the .NET APIs. My only gripe is that those tools were all UI-based. PyTabular allows you to programmatically connect as an effective user and query in Python. You could easily loop through all your users to run tests on their security.

import pytabular as p

#Connect to your model like usual...
model = p.Tabular(CONNECTION_STR)

#This will be the query I run...
query_str = '''
EVALUATE
SUMMARIZE(
    'Product Dimension',
    'Product Dimension'[Product Name],
    "Total Product Sales", [Total Sales]
)
'''
#This will be the user I want to query as...
user_email = 'user1@company.com'

#Base line, to query as the user connecting to the model.
model.Query(query_str)

#Option 1, Connect via connection class...
user1 = p.Connection(model.Server, Effective_User = user_email)
user1.Query(query_str)

#Option 2, Just add Effective_User
model.Query(query_str, Effective_User = user_email)

#PyTabular will do its best to handle multiple accounts...
#So you won't have to reconnect on every query
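
Looping through users to test their security could look roughly like this. query_as_users is a hypothetical helper built on the Effective_User argument shown in Option 2, and FakeModel is a stand-in so the snippet runs without a real connection.

```python
from typing import Any, Dict, Iterable, Optional


def query_as_users(model: Any, query_str: str,
                   users: Iterable[str]) -> Dict[str, Any]:
    """Run the same query once per user and collect the results by user."""
    return {
        user: model.Query(query_str, Effective_User=user)
        for user in users
    }


class FakeModel:
    """Hypothetical stand-in that echoes who ran the query."""

    def Query(self, query_str: str, Effective_User: Optional[str] = None) -> str:
        return f"rows visible to {Effective_User}"


results = query_as_users(
    FakeModel(),
    "EVALUATE {1}",
    ["user1@company.com", "user2@company.com"],
)
```

With a real model, each value would be whatever Query returns for that user, which you could then compare against the rows you expect them to see.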

Refresh Related Tables

Ever need to refresh the tables related to a fact table? It should now be a lot easier.

import pytabular as p

#Connect to model
model = p.Tabular(CONNECTION_STR)

#Get related tables
tables = model.Tables[TABLE_NAME].Related()

#Now just refresh like usual...
tables.Refresh()

Contributing

See CONTRIBUTING.md

