Skip to main content

A python module for TM1 Bedrock.

Project description

Project Banner

TM1 Data Integration and Automation Toolkit

This project is a Python-based toolkit by Knowledgeseed, designed to streamline data integration and automation tasks with IBM Planning Analytics (TM1). It leverages the TM1py library to provide a high-level, configurable, and extensible framework for moving data between TM1 cubes, SQL databases, and CSV files. The toolkit is built with a focus on performance, offering features like asynchronous operations and detailed logging for debugging and optimization.

For further information about the project, please refer to https://knowledgeseed.ch/blog/tmbedrockpy_python_library.

The detailed documentation of the project is available at https://tm1-bedrock-py.readthedocs.io/en/latest/.

Features

  • Versatile Data Flows:
    • Copy data between TM1 cubes (inter-cube and intra-cube).
    • Load data from SQL databases into TM1 cubes.
    • Export data from TM1 cubes to SQL tables.
    • Load data from CSV files into TM1 cubes.
    • Export data from TM1 cubes to CSV files.
  • Powerful Data Transformation:
    • Apply a series of mapping and transformation steps to your data in-flight.
    • Support for shared and step-specific mapping rules.
    • Methods for replacing, mapping and replacing, and mapping and joining data based on dimension values.
    • Ability to redimension data by adding, removing, or renaming columns.
    • Transform values using custom functions.
  • Performance-Oriented:
    • Asynchronous data writing for improved performance on large datasets.
    • Asynchronous execution of multiple data copy operations in parallel.
    • Detailed performance metrics logging to identify bottlenecks.
  • Robust and Debuggable:
    • Comprehensive logging framework with configurable verbosity.
    • Option to log intermediate DataFrames to CSV files for detailed debugging of transformation steps.
    • Validation utilities to ensure data integrity.
  • Extensible:
    • Use custom functions for data extraction, transformation, and loading to fit your specific needs.

How to Use

This toolkit is designed to be used as a library within your Python scripts for TM1 automation. The core functionality is exposed through a set of high-level functions in the bedrock.py module.

Core Functions

  • data_copy_intercube: Copies data from a source TM1 cube to a target TM1 cube.
  • data_copy: Copies and transforms data within the same TM1 cube.
  • load_sql_data_to_tm1_cube: Loads data from a SQL query or table into a TM1 cube.
  • load_tm1_cube_to_sql_table: Exports data from a TM1 cube to a SQL table.
  • load_csv_data_to_tm1_cube: Loads data from a CSV file into a TM1 cube.
  • load_tm1_cube_to_csv_file: Exports data from a TM1 cube to a CSV file.
  • async_executor_tm1: Asynchronously executes multiple data_copy, data_copy_intercube, load_tm1_cube_to_sql_table or load_csv_data_to_tm1_cube operations.
  • async_executor_tm1_to_sql: Asynchronously executes multiple load_tm1_cube_to_sql_table operations.
  • async_executor_csv_to_tm1: Asynchronously executes multiple load_csv_data_to_tm1_cube operations.

Create a connection in TM1 with at least the following parameters set:

  • address
  • user
  • password
  • port
  • ssl

You can check your connection via running example/check_connectivity.py. You can configure your connection in examples/config_example.ini or if left empty, via user input from the terminal.

Example: Copying Data Between Cubes with Transformations

Here is an example of how to use data_copy_intercube to copy data from a source cube to a target cube while applying some transformations. For further examples, please refer to tests/test_integration.py and tests/test_benchmark.py.

from TM1py import TM1Service
from TM1_bedrock_py.bedrock import data_copy_intercube

# --- 1. Connect to TM1 ---
# Establish your TM1Service connection
tm1 = TM1Service(address='localhost', port=8001, user='admin', password='apple', ssl=False)

# --- 2. Define the Data Copy Parameters ---
# MDX to select the source data
source_mdx = """
SELECT
{ [Version].[Actual] } ON COLUMNS,
{ [Year]. } ON ROWS
FROM [Sales]
WHERE ([Measures].[Amount])
"""

# Define mapping and transformation steps
mapping_steps = [
    {
        "method": "replace",
        "mapping": {
            "Region": {"Old Region": "New Region"}
        }
    },
    {
        "method": "map_and_replace",
        "mapping_mdx": "SELECT {[Product].[All Products].Children} ON COLUMNS FROM [Product]",
        "mapping_dimensions": {
            "Product": "Product"
        }
    }
]

# --- 3. Execute the Data Copy ---
data_copy_intercube(
    tm1_service=tm1,
    data_mdx=source_mdx,
    target_cube_name="Sales Target",
    mapping_steps=mapping_steps,
    clear_target=True,
    logging_level="INFO"
)

Configuration of Mapping Steps

The mapping_steps parameter is a powerful feature that allows you to define a pipeline of transformations. Each step is a dictionary that specifies the transformation method and its parameters.

  • replace: Performs a simple find-and-replace on dimension elements.
  • map_and_replace: Uses a mapping DataFrame (from an MDX query, SQL, or CSV) to replace values in specified dimensions.
  • map_and_join: Joins columns from a mapping DataFrame to the main data based on shared dimensions.

Optimizing Performance

This toolkit provides several ways to optimize the performance of your data operations:

  • Asynchronous Operations: For large data writes to TM1, set the async_write=True parameter in the core functions. This will write data in parallel, significantly reducing the overall execution time.
  • Asynchronous Executors: When you need to run multiple independent data copy processes, use the async_executor_ functions. These functions will execute the operations concurrently, making efficient use of available resources.
import asyncio
from TM1py import TM1Service
from TM1_bedrock_py.bedrock import async_executor_tm1, data_copy_intercube

tm1 = TM1Service(address='localhost', port=8001, user='admin', password='apple', ssl=False)

asyncio.run(async_executor_tm1(
            data_copy_function=data_copy_intercube,
            tm1_service=tm1,
            data_mdx_template=data_mdx_template,
            skip_zeros=True,
            skip_consolidated_cells=True,
            target_cube_name=target_cube_name,
            shared_mapping=shared_mapping,
            mapping_steps=mapping_steps,
            clear_target=True,
            async_write=True,
            logging_level="DEBUG",
            use_blob=True,
            param_set_mdx_list=param_set_mdx_list,
            clear_param_templates=clear_param_templates,
            ignore_missing_elements=True,
            max_workers=8
        ))

IMPORTANT NOTES ON PERFORMANCE

  • skip_zeros: When extracting data from TM1, setting skip_zeros=True can reduce the amount of data transferred and processed, especially for sparse cubes.
  • use_blob: Default value is use_blob=False as True needs administrator privileges. Setting the value to True improves performance significantly.
  • slice_size_of_dataframe: Default value is slice_size_of_dataframe=5000.
  • Efficient MDX: The performance of the entire process is heavily dependent on the efficiency of your MDX queries. Ensure your MDX is optimized for the source cube structure.

Debugging Options

The toolkit includes a robust logging framework to help you debug your data integration processes.

  • Logging Level: You can control the verbosity of the logs by setting the logging_level parameter in the core functions. The available levels are DEBUG, INFO, WARNING, ERROR, and CRITICAL.
  • DataFrame Logger: This is a powerful debugging feature that allows you to inspect the state of your DataFrame at various stages of the transformation pipeline. To enable it, set df_verbose_logging=True in the core functions. This will save a CSV file of the DataFrame at each significant step (e.g., after each mapping step) into a dataframe_logs directory. This is invaluable for understanding how your data is being transformed and for troubleshooting mapping issues.
  • Performance Metrics: The execution time of key functions is automatically logged. By setting the logging level to DEBUG, you can see the time taken for each major operation, which helps in identifying performance bottlenecks. The logging.json file can be configured to output these metrics to a separate file for analysis.

Requirements

  • TM1py >=2.1, <3.0
  • pandas >=2.3.3, <3.0.0
  • json_logging >=1.3.0, <2.0.0
  • sqlalchemy >=2.0.0, <3.0.0
  • pyodbc >=5.2.0, <6.0.0
  • pyyaml >=6.0, <7.0
  • openpyxl >=3.1.0

Installation

Install without airflow

pip install tm1-bedrock-py

Install with airflow

As of now the airflow_executor sub-package of tm1-bedrock-py only supports apache-airflow for versions >=2.4.0, <=2.11.0, with Python versions 3.9 to 3.12.

pip install tm1-bedrock-py[airflow]

Additional options for airflow database connectors for Microsoft-MSSQL and Postgres

pip install tm1-bedrock-py[airflow, microsoft-mssql]
pip install tm1-bedrock-py[airflow, postgres]

Development

Windows

git clone https://github.com/KnowledgeSeed/tm1_bedrock_py.git
cd tm1_bedrock_py

python -m venv .env
.\.env\Scripts\activate
pip install -r requirements-dev.txt
python -m build

Linux/macOS

git clone https://github.com/KnowledgeSeed/tm1_bedrock_py.git
cd tm1_bedrock_py

virtualenv .env
source .env/bin/activate
pip install -r requirements-dev.txt
python -m build

Manual integration testing

Use tests_integration/docker-compose.yaml as a baseline, which spins up a TM1 provider and a base TM1 database and a PostgreSQL database to test against. Please note that tm1-docker image is properietary IBM product wrapped in Docker by Knowledgeseed and therefore it is only available internally for Knowledgeseed developers.

docker compose up -d tm1 test_db_postgres

To obtain a licensed IBM TM1 database for testing or production purpose, please see https://www.ibm.com/topics/tm1 for further details.

License

See LICENSE

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tm1_bedrock_py-1.2.2.tar.gz (93.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tm1_bedrock_py-1.2.2-py3-none-any.whl (85.8 kB view details)

Uploaded Python 3

File details

Details for the file tm1_bedrock_py-1.2.2.tar.gz.

File metadata

  • Download URL: tm1_bedrock_py-1.2.2.tar.gz
  • Upload date:
  • Size: 93.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for tm1_bedrock_py-1.2.2.tar.gz
Algorithm Hash digest
SHA256 0a68ffceafa748637bda6202398fdf9d82c733bc087122163d385860d2919fe2
MD5 6c3f2a86729bc48e0b3473897f025b9d
BLAKE2b-256 8c2e166db7c666bf12b45f73458181df3df7988e790a49feb41b6586cf8a448d

See more details on using hashes here.

File details

Details for the file tm1_bedrock_py-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: tm1_bedrock_py-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 85.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for tm1_bedrock_py-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b5c390baea164341858e13ec890ec9e1f507b145b426cf2e7922151ee92bbff5
MD5 ded2b0ffdaa6af5e9cb54fe3b579faf6
BLAKE2b-256 38fa3ee20bb7ff03812ab20bd408fe506d722972d40f337bc75251e8468c965e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page