Library for manipulating ThoughtSpot Modeling Language (TML) files
ThoughtSpot TML
a Python package for working with ThoughtSpot Modeling Language (TML) files programmatically
Installation | Example | Migration to v2.0.0 | Reference | Notes | Contributing
🚨 If your examples or scripts are built on thoughtspot_tml==1.3.0, see our Migration to v2.0.0 guide. 🚨
Features
- Supports: Connections, Tables, Views, SQLViews, Worksheets, Answers, Liveboards
- Deep attribute access
- Roundtripping from text or file
- Utilities for disambiguation workflows
This package will not perform validation of the constructed TML files or interact with your ThoughtSpot cluster!
Please leverage the ThoughtSpot REST API for this purpose.
Installation
thoughtspot_tml requires at least Python 3.7, preferably Python 3.9 and above.
Installation is as simple as:
pip install thoughtspot-tml
Upgrade library after ThoughtSpot is upgraded
If the .load() method raises errors on a TML file exported from ThoughtSpot without modification, please upgrade to the latest thoughtspot_tml version.
New attributes may have been added to the TML specification that are not present in the older version of the thoughtspot_tml library you have installed.
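Before upgrading, it can help to confirm which version is installed in the active environment. The sketch below relies only on the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical and is not part of thoughtspot_tml.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("thoughtspot-tml")
print(found if found is not None else "thoughtspot-tml is not installed")
```

If the reported version is behind the latest release, `pip install --upgrade thoughtspot-tml` brings it up to date.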
A Basic Example
This example creates a command-line tool for changing the prefix in the names of the Table objects that a Worksheet object connects to.
# worksheet_remapping.py
from thoughtspot_tml import Worksheet
import argparse
import pathlib
def filepath(fp: str) -> pathlib.Path:
    """
    Converts a string to a pathlib.Path.
    """
    path = pathlib.Path(fp)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"path {fp!r} does not exist")
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"path must be a file, got {fp!r}")
    return path
def main():
    # Create a command line application
    #   - argument for a WORKSHEET.worksheet.tml
    #   - options for the "before" and "after" table naming conventions
    parser = argparse.ArgumentParser()
    parser.add_argument("worksheet_tml", help="a worksheet.tml to remap", type=filepath)
    parser.add_argument("-s", "--src-prefix", metavar="SRC", default="DEV_", type=str, help="(default: %(default)s)")
    parser.add_argument("-d", "--dst-prefix", metavar="DST", default="TEST_", type=str, help="(default: %(default)s)")
    # Parse CLI input
    args = parser.parse_args()
    # Read from file
    tml = Worksheet.load(args.worksheet_tml)
    # Replace instances of DEV_ with TEST_
    for table in tml.worksheet.tables:
        table.name = table.name.replace(args.src_prefix, args.dst_prefix)
    # Save to file
    tml.dump(args.worksheet_tml)
if __name__ == '__main__':
    raise SystemExit(main())
>>> python worksheet_remapping.py -h
usage: [-h] [-s SRC] [-d DST] worksheet_tml
positional arguments:
worksheet_tml a worksheet.tml to remap
options:
-h, --help show this help message and exit
-s SRC, --src-prefix SRC (default: DEV_)
-d DST, --dst-prefix DST (default: TEST_)
A more complex version of this example, as well as additional examples, can be found in the /examples directory in this repository.
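One detail of the basic example is worth noting: str.replace swaps every occurrence of the source text, not only a leading one. If only true prefixes should change, a stricter helper could look like the sketch below. The name swap_prefix and the sample table name are hypothetical and not part of thoughtspot_tml.

```python
def swap_prefix(name: str, src: str, dst: str) -> str:
    """Replace src with dst only when name starts with src."""
    if src and name.startswith(src):
        return dst + name[len(src):]
    return name


# str.replace touches both occurrences; swap_prefix touches only the leading one
print("DEV_SALES_DEV_COPY".replace("DEV_", "TEST_"))       # TEST_SALES_TEST_COPY
print(swap_prefix("DEV_SALES_DEV_COPY", "DEV_", "TEST_"))  # TEST_SALES_DEV_COPY
```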
thoughtspot_tml Reference
TML Objects | Deserialization | Serialization | SpotApp | Utilities
TML Objects
from thoughtspot_tml import Table, View, SQLView, Worksheet
from thoughtspot_tml import Answer, Liveboard
# aliases
from thoughtspot_tml import ThoughtSpotView # View
from thoughtspot_tml import SavedAnswer # Answer
from thoughtspot_tml import SystemTable # Table
Each TML object has a top-level attribute for the globally unique identifier, or GUID, as well as the document form of the object it represents. This identically mirrors the TML specification you can find in the ThoughtSpot documentation. In addition, the name attribute of the TML document itself has been pulled into the top-level namespace.
@dataclass
class Worksheet(TML):
    """
    Representation of a ThoughtSpot Worksheet TML.
    """
    guid: GUID
    worksheet: WorksheetEDocProto
    @property
    def name(self) -> str:
        return self.worksheet.name
The full, composable TML specification can be found in _scriptability.py. Each piece of the spec is a Python dataclasses.dataclass field. The internal _scriptability.py module is code generated from ThoughtSpot's internal architecture, which allows thoughtspot_tml to offer the deep attribute access experience in Python.
@dataclass
class Table(TML):
    """
    Representation of a ThoughtSpot Table TML.
    """
    guid: GUID
    table: LogicalTableEDocProto
    @property
    def name(self) -> str:
        return self.table.name
For example, interesting parts of the Table TML spec are exposed as attributes, which can in turn expose attributes of their own. This allows common patterns to be expressed natively in Python, such as remapping a Table's connection details.
tml = Table.load("tests/data/DUMMY.table.tml")
# get the Table document object
tml.table # => LogicalTableEDocProto(...)
# get the Table's underlying connected details
tml.table.db # => 'PMMDB'
tml.table.schema # => 'RETAILAPPAREL'
tml.table.db_table # => 'dim_retapp_products'
# get the Table's columns
tml.table.columns # => [LogicalTableEDocProtoLogicalColumnEDocProto(...), ...]
# repoint this ThoughtSpot Table to a new external table
tml.table.schema = "RETAILAPPAREL_V2"
tml.table.db_table = "DIM_RETAPP_PRODUCTS"
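The same attribute-access pattern can be tried without a ThoughtSpot export on hand. The sketch below uses small stand-in dataclasses (ColumnDoc, TableDoc, and FakeTable are hypothetical and far simpler than the generated classes in _scriptability.py) to show that nested documents are plain Python objects, so repointing is ordinary assignment.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ColumnDoc:  # hypothetical stand-in for a generated column class
    name: str


@dataclass
class TableDoc:  # hypothetical stand-in for the generated Table document class
    name: str
    db: str
    schema: str
    db_table: str
    columns: List[ColumnDoc] = field(default_factory=list)


@dataclass
class FakeTable:  # hypothetical stand-in for thoughtspot_tml.Table
    guid: Optional[str]
    table: TableDoc

    @property
    def name(self) -> str:
        # mirrors how the document's name is surfaced at the top level
        return self.table.name


tml = FakeTable(
    guid=None,
    table=TableDoc("Products", "PMMDB", "RETAILAPPAREL", "dim_retapp_products", [ColumnDoc("product_id")]),
)

# repoint the stand-in table to a new external table
tml.table.schema = "RETAILAPPAREL_V2"
tml.table.db_table = tml.table.db_table.upper()

# dataclasses.asdict converts the whole tree into native Python types
as_plain = asdict(tml)
print(as_plain["table"]["schema"], as_plain["table"]["db_table"])
```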
Connections (also known as "Embrace" Connections) were implemented prior to the TML spec being officially released. The remapping file (connection.yaml), obtained from your platform at Data > Connections > (...) in the top right > Remapping > Download, defines how ThoughtSpot table objects relate to their external counterparts.
from thoughtspot_tml import Connection
# aliases
from thoughtspot_tml import EmbraceConnection # Connection
The Connection GUID, while optional in thoughtspot_tml, is required when modifying or removing an existing connection via the REST API. A Connection's GUID can be obtained by calling the connection/list endpoint.
When loading from a connection.yaml file, if thoughtspot_tml identifies that the filename is a GUID, the guid property will be set on the resulting object.
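To picture the filename check, here is a minimal sketch built on the standard library's uuid module. The helper guid_from_filename is hypothetical and is not the library's own logic, which may differ; it only illustrates how a GUID-named file could be recognized.

```python
import pathlib
import uuid
from typing import Optional


def guid_from_filename(path: str) -> Optional[str]:
    """Return the leading filename segment if it is a canonical GUID, else None."""
    candidate = pathlib.Path(path).name.split(".")[0]
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return None
    # uuid.UUID also accepts braces and un-hyphenated hex, so require the canonical form
    return candidate if str(parsed) == candidate.lower() else None


print(guid_from_filename("2ea7add9-0ccb-4ac1-90bb-231794ebb377.connection.yaml"))
print(guid_from_filename("connection.yaml"))
```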
The connection/update REST API endpoint requires connections to be formatted in a different way. For this, we provide a method to generate the metadata parameter data, which is a mapping of configuration attributes, as well as database, schema, and table objects.
@dataclass
class Connection(TML):
    """
    Representation of a ThoughtSpot Connection YAML.
    """
    guid: Optional[GUID]
    connection: ConnectionDoc
    def to_rest_api_v1_metadata(self) -> ConnectionMetadata:
        ...
Each object contains multiple methods for serialization and deserialization.
Deserialization
Deserialization turns a TML document into a Python object.
ws = Worksheet.load(path="tests/data/DUMMY.worksheet.tml")
ws = Worksheet.loads(tml_document=...) # the string can be obtained from the ThoughtSpot REST API
ws.guid == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"
Use .load to read a worksheet from a .worksheet.tml file, or .loads to parse a string obtained directly from the metadata/tml/export API.
Serialization
Serialization turns a TML Python object back into data.
data = ws.to_dict()
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"
ws.dump(path="tests/data/DUMMY.worksheet.tml")
# DUMMY.worksheet.tml
#
# guid: 2ea7add9-0ccb-4ac1-90bb-231794ebb377
# worksheet:
# ...
import json
import yaml # PyYAML
data_s = ws.dumps(format_type="YAML")
data = yaml.safe_load(data_s)
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"
# -or-
data_s = ws.dumps(format_type="JSON")
data = json.loads(data_s)
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"
Use .to_dict to convert the entire object tree into Python native types, or write back to a file with .dump as a TML-formatted string. The formatting can be overridden to JSON if the JSON file type is used (.worksheet.json). .dumps gives access to the formatted string directly, which is typically used as input for the metadata/tml/import API.
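The extension-based choice between YAML and JSON can be pictured with a small helper. This is only an illustration: format_for is a hypothetical name, and the library's own check may be implemented differently.

```python
import pathlib


def format_for(path: str) -> str:
    """Pick a serialization format name from the file extension."""
    return "JSON" if pathlib.Path(path).suffix.lower() == ".json" else "YAML"


print(format_for("tests/data/DUMMY.worksheet.json"))  # JSON
print(format_for("tests/data/DUMMY.worksheet.tml"))   # YAML
```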
SpotApp
from thoughtspot_tml import SpotApp
SpotApps are bundles of TML which can be obtained directly from the ThoughtSpot user interface as a zip file archive, or from the /metadata/tml/export API endpoint using the export_associated = true query parameter.
export_response = ... # /metadata/tml/export
s = SpotApp.from_api(export_response)
print(s.tml) # => [Worksheet(...), Table(...), Table(...)]
print(s.manifest) # => Manifest(...)
# -or-
s = SpotApp.read("tests/data/DUMMY_spot_app.zip")
print(s.tml) # => [Worksheet(...), Table(...), Table(...)]
print(s.manifest) # => Manifest(...)
SpotApps can also be saved to a new zipfile archive through the .save method.
s = SpotApp.read("tests/data/DUMMY_spot_app.zip")
s.save("tests/data/NEW_DUMMY_spot_app.zip")
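Because a SpotApp is delivered as a zip archive, the standard library's zipfile module is enough to peek inside one before loading it. The sketch below builds a tiny in-memory archive; its member names and contents are made up for illustration and may not match a real export, and list_tml_members is a hypothetical helper.

```python
import io
import zipfile
from typing import List


def list_tml_members(archive: zipfile.ZipFile) -> List[str]:
    """Return the sorted names of archive members that look like TML documents."""
    return sorted(n for n in archive.namelist() if n.endswith(".tml"))


# build a tiny in-memory archive; the file names and contents are made up
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("Manifest.yaml", "object: []\n")
    zf.writestr("Sales.worksheet.tml", "guid: null\n")
    zf.writestr("Products.table.tml", "guid: null\n")

# reopen the same bytes for reading and list the TML members
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    members = list_tml_members(zf)

print(members)
```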
Utilities
determine_tml_type | EnvironmentGUIDMapper | disambiguate
thoughtspot_tml.utils contains additional helpers which can simplify or speed up working with TML documents.
determine_tml_type
TML is both a data structure and file format, and these formats vary slightly across each document. determine_tml_type will return the appropriate TML class so that you can call deserialization methods directly. Pass either the path keyword with a filepath, or the file info directly from one of the objects returned in the /metadata/tml/export response data.
signature
def determine_tml_type(*, info: TMLDocInfo = None, path: PathLike = None) -> Union[Connection, TMLObject]:
    """
    Get the appropriate TML class based on input data.
    Parameters
    ----------
    info : TMLDocInfo
        API edoc info response
    path : PathLike
        filepath to parse
    Raises
    ------
    TMLError, when a valid TML type could not be found based on input
    """
usage
from thoughtspot_tml.utils import determine_tml_type
tml_cls = determine_tml_type(path="/tests/data/DUMMY.worksheet.tml")
tml = tml_cls.load(path="/tests/data/DUMMY.worksheet.tml")
type(tml) is Worksheet # => True
# -or-
export_response = ... # /metadata/tml/export
tml_cls = determine_tml_type(info=export_response["object"][0]["info"])
tml = tml_cls.loads(tml_document=export_response["object"][0]["edoc"])
type(tml) is Worksheet # => True
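As a rough mental model of the path-based lookup, the object type is visible in the filename itself (for example DUMMY.worksheet.tml). The sketch below extracts that segment; tml_kind_from_path is a hypothetical helper and does not reproduce how determine_tml_type is implemented.

```python
import pathlib
from typing import Optional


def tml_kind_from_path(path: str) -> Optional[str]:
    """Return the type segment of a NAME.<kind>.tml (or .json) filename, else None."""
    parts = pathlib.Path(path).name.lower().split(".")
    if len(parts) >= 3 and parts[-1] in ("tml", "json"):
        return parts[-2]
    return None


print(tml_kind_from_path("tests/data/DUMMY.worksheet.tml"))  # worksheet
print(tml_kind_from_path("notes.txt"))                       # None
```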
EnvironmentGUIDMapper
The EnvironmentGUIDMapper is a dictionary-like data structure which can help you maintain references to objects across your ThoughtSpot environments. The underlying data structure is intended to clearly show the relationship of a given object between any number of environments. An "environment" can be any scope you consider separate from another, be it 2 ThoughtSpot servers, 2 Connections on the same server, or even a "Copy of" the same object within a single Connection.
signature
class EnvironmentGUIDMapper:
    """
    Attributes
    ----------
    environment_transformer : Callable(str) -> str
        a function which transforms the ENV name before adding it to the mapping
    """
    def __init__(self, environment_transformer: Callable[[str], str] = str.upper):
        ...
usage
from thoughtspot_tml.utils import EnvironmentGUIDMapper
# create a new mapper
mapper = EnvironmentGUIDMapper() # or EnvironmentGUIDMapper.read(path=...)
# map 3 guids to represent the same ThoughtSpot object across environments
mapper["guid1"] = ("PROD", "guid1") # 1. add a new guid into the mapper
mapper["guid1"] = ("TEST", "guid2") # 2. map guid1 to a guid in another environment
mapper["guid2"] = ("DEV", "guid3") # 3. map a new guid3 to any existing guid
# persist the mapping file to disk
mapper.save(path="marketing_thoughtspot_guid_mapping.json")
# what's the JSON data structure look like?
print(mapper)
{
  "guid1__guid2__guid3": {
    "PROD": "guid1",
    "TEST": "guid2",
    "DEV": "guid3"
  }
}
# create a new mapper from a file
new_mapper = EnvironmentGUIDMapper.read(path="marketing_thoughtspot_guid_mapping.json")
# add another object mapping
new_mapper.set("guid10", environment="PROD", guid="guid10") # equivalent to new_mapper["guid10"] = ("PROD", "guid10")
new_mapper.set("guid10", environment="TEST", guid="guid11")
new_mapper.set("guid10", environment="DEV", guid="guid12")
# get all the environments that would map to "guid10"
print(new_mapper["guid10"]) # or new_mapper.get("guid10")
{
  "PROD": "guid10",
  "TEST": "guid11",
  "DEV": "guid12"
}
# get a mapping of all DEV -> PROD related ThoughtSpot objects
print(new_mapper.generate_mapping(from_environment="DEV", to_environment="PROD"))
{
  "guid3": "guid1",
  "guid12": "guid10"
}
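To make the persisted structure concrete, the sketch below derives the same DEV -> PROD mapping from plain dictionaries shaped like the JSON shown earlier. It is not the library's implementation; the standalone generate_mapping function and the "guid10__guid11__guid12" key (inferred from the naming pattern of the first entry) are assumptions.

```python
from typing import Dict

GuidData = Dict[str, Dict[str, str]]


def generate_mapping(data: GuidData, from_environment: str, to_environment: str) -> Dict[str, str]:
    """Map each GUID in from_environment to its counterpart in to_environment."""
    mapping = {}
    for envts in data.values():
        # skip objects that are not present in both environments
        if from_environment in envts and to_environment in envts:
            mapping[envts[from_environment]] = envts[to_environment]
    return mapping


data = {
    "guid1__guid2__guid3": {"PROD": "guid1", "TEST": "guid2", "DEV": "guid3"},
    "guid10__guid11__guid12": {"PROD": "guid10", "TEST": "guid11", "DEV": "guid12"},
}

dev_to_prod = generate_mapping(data, "DEV", "PROD")
print(dev_to_prod)  # {'guid3': 'guid1', 'guid12': 'guid10'}
```

A mapping of this shape is what the guid_mapping parameter of disambiguate expects when moving objects between two environments.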
disambiguate
In ThoughtSpot, the uniqueness constraint exists on the underlying object's guid. This means that there can be multiple objects of the same type with the same name. An example of this is maintaining both a DEV and PROD Connection. All the development work happens on one set of objects (that are not shared with any of the End User community), while the production connection contains objects with identical names that are shared with the End User community.
To reduce ambiguity, you may need to add the fqn key to your TML document when you reference source tables or connections. If you do not add the fqn key, and the connection or table you reference does not have a unique name, the import will fail.
NOTE: Prior to ThoughtSpot V8.7.0, TML does not export with the fqn automatically.
signature
def disambiguate(
    tml: TMLObject,
    *,
    guid_mapping: Dict[str, GUID],
    remap_object_guid: bool = True,
    delete_unmapped_guids: bool = False,
) -> TMLObject:
    """
    Deep scan the TML looking for fields to add FQNs to.
    This will explore the top-level guid and all nested objects looking on
    Tables, Worksheets, etc to disambiguate.
    Parameters
    ----------
    tml : TMLObject
        the tml to scan
    guid_mapping : {str: GUID}
        a mapping of names or guids, to the FQN to add to the object
    remap_object_guid : bool = True
        whether or not to remap the tml.guid
    delete_unmapped_guids : bool = False
        if a match could not be found, set the FQN and object guid to None
    """
usage
from thoughtspot_tml.utils import disambiguate
from thoughtspot_tml import Worksheet
# Load a Worksheet and check its data
ws = Worksheet.load("tests/data/DUMMY.worksheet.tml")
ws.guid == "2ea7add9-0ccb-4ac1-90bb-231794ebb377" # => True
ws.worksheet.tables[0].name == "dim_retapp_products" # => True
ws.worksheet.tables[0].fqn is None # => True
# Assign a Table an FQN. This information can be retrieved from ThoughtSpot REST API metadata/list.
ws = disambiguate(ws, guid_mapping={"dim_retapp_products": "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0"})
ws.worksheet.tables[0].fqn is None # => False
ws.worksheet.tables[0].fqn == "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0" # => True
# Re-assign the GUID to a new environment.
ws = disambiguate(ws, guid_mapping={"7fd39fdb-9dfe-4954-b5dd-9a5d846085b0": "99999999-9999-4999-9999-999999999999"})
ws.worksheet.tables[0].fqn == "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0" # => False
ws.worksheet.tables[0].fqn == "99999999-9999-4999-9999-999999999999" # => True
# Remove GUIDs which aren't found in the mapping, including the top-level GUID.
ws = disambiguate(ws, guid_mapping={}, delete_unmapped_guids=True)
ws.worksheet.tables[0].name == "dim_retapp_products" # => True
ws.worksheet.tables[0].fqn is None # => True
ws.guid is None # => True
The disambiguate function will walk through the thoughtspot_tml TML object specifying the .fqn based on keys in the guid_mapping dictionary.
The guid_mapping will typically be a mapping of GUIDs between 2 environments, but the "before" environment can be any string. This can be helpful to quickly add fqn to any object which has yet to define it.
The remap_object_guid (default: True) will consider the top-level TML.guid as a candidate for re-mapping.
The delete_unmapped_guids (default: False) will remove any .fqns which are not found in the guid_mapping.
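The walk described above can be sketched on plain nested dictionaries. This is a simplified illustration under stated assumptions, not the library's implementation: add_fqns is a hypothetical helper, it operates on dicts rather than TML dataclasses, and it ignores remap_object_guid and delete_unmapped_guids.

```python
from typing import Any, Dict


def add_fqns(node: Any, guid_mapping: Dict[str, str]) -> None:
    """Recursively set "fqn" on dicts whose name or current fqn is a mapping key."""
    if isinstance(node, dict):
        if "name" in node:
            if node["name"] in guid_mapping:
                node["fqn"] = guid_mapping[node["name"]]
            elif node.get("fqn") in guid_mapping:
                node["fqn"] = guid_mapping[node["fqn"]]
        for value in node.values():
            add_fqns(value, guid_mapping)
    elif isinstance(node, list):
        for item in node:
            add_fqns(item, guid_mapping)


# a made-up, heavily trimmed worksheet document
doc = {
    "guid": "2ea7add9-0ccb-4ac1-90bb-231794ebb377",
    "worksheet": {"name": "Sales", "tables": [{"name": "dim_retapp_products"}]},
}

add_fqns(doc, {"dim_retapp_products": "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0"})
print(doc["worksheet"]["tables"][0]["fqn"])
```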
Migration to v2.0.0
With V2.0.0, we now programmatically build the TML spec from the underlying microservice's data structure. The largest benefit of this move is that we can now
Round-tripping to File
The utility class YAMLTML has been replaced with utils.determine_tml_type and a private base class TML, which all public metadata objects inherit from. The TML type which is returned has the appropriate [de]serialization methods.
Both of the following patterns represent round-tripping.
import pathlib
worksheet_fp = "tests/data/DUMMY.worksheet.tml"
worksheet_tml_str = pathlib.Path(worksheet_fp).read_text()
# V1.3.0
from thoughtspot_tml import YAMLTML
tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml_document_str = YAMLTML.dump_tml_object(tml)
# V2.0.0
from thoughtspot_tml.utils import determine_tml_type
from thoughtspot_tml import Worksheet
tml_cls = determine_tml_type(path=worksheet_fp)
tml = tml_cls.loads(worksheet_tml_str)
# any one of these methods..
# tml = tml_cls.load(worksheet_fp)
# tml = Worksheet.loads(worksheet_tml_str)
# tml = Worksheet.load(worksheet_fp)
tml_document_str = tml.dumps(format_type="YAML")
Identifying the TML Object Type
To identify the type of TML object you are working with, V1.3.0 used .content_type. With V2.0.0, you can now use .tml_type_name.
GUID & FQN Handling
In V1.3.0, GUIDs were deleted from the underlying data structure with .remove_guid() in order to ensure the REST API created new objects. With V2.0.0, you simply set the .guid attribute (on the object itself) to None.
# V1.3.0
tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml.remove_guid()
# V2.0.0
tml = Worksheet.loads(worksheet_tml_str)
tml.guid = None
In V1.3.0, each TML object had its own methods for finding and replacing GUIDs. These took the form of .remap_<object_type>_to_new_fqn() and .change_<object_type>_by_fqn(), replacing <object_type> with the underlying data source which maps into the object you're operating on. These methods modify the underlying object.
In V2.0.0, we supply a single method to help add the fqn key to your TML document when referencing source tables or connections that share a name. See disambiguate for additional information.
The example below adds the Table FQN references in a Worksheet.
# V1.3.0
name_guid_map = {"Table 1": "0f814ce1-dba1-496a-b3de-38c4b9a288ed", "Table 2": "2e7a0676-2acf-4700-965c-efebf8c0b594"}
tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml.remap_table_to_new_fqn(name_to_fqn_map=name_guid_map)
# - or -
tml.change_table_by_fqn(original_table_name="Table 1", new_table_guid="0f814ce1-dba1-496a-b3de-38c4b9a288ed")
# V2.0.0
from thoughtspot_tml.utils import disambiguate
tml = Worksheet.loads(worksheet_tml_str)
tml = disambiguate(tml, guid_mapping=name_guid_map)
Notes on ThoughtSpot Modeling Language
- TML is implemented in the YAML 1.1 spec.
- When importing a TML file, if the guid matches an existing object in the same Org being uploaded into, then that object will be updated. If the guid is missing or does not match an object in that Org, a new object is created with a new GUID. GUIDs are unique to the entire ThoughtSpot instance: if no GUID match is found in an Org, but the GUID has not been used on the instance, ThoughtSpot will use the GUID provided in the TML file when creating the new object.
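The import rules in the note above can be read as a small decision table. The sketch below encodes one reading of that note, for reasoning about scripts only; import_outcome and its parameters are hypothetical, and ThoughtSpot's actual behavior is decided by the server, not by this code.

```python
from typing import Optional, Set


def import_outcome(guid: Optional[str], org_guids: Set[str], instance_guids: Set[str]) -> str:
    """Describe what an import would do, following one reading of the note above.

    org_guids: GUIDs of objects in the target Org (assumed to be a subset of instance_guids)
    instance_guids: GUIDs in use anywhere on the ThoughtSpot instance
    """
    if guid is not None and guid in org_guids:
        return "update existing object"
    if guid is not None and guid not in instance_guids:
        return "create object with provided GUID"
    # guid is missing, or it is already used elsewhere on the instance
    return "create object with new GUID"


print(import_outcome("guid1", org_guids={"guid1"}, instance_guids={"guid1", "guid2"}))
print(import_outcome(None, org_guids={"guid1"}, instance_guids={"guid1", "guid2"}))
```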
Contributing
We welcome all help! :heart: For guidance on setting up a development environment, see our Contributing Guide.