
sqltest: easy testing of ETL SQLs


sqltest


The sqltest framework makes it easy to write test cases for complicated ETL processing logic. All you need to do is prepare your source and target datasets in CSV or Excel format, along with your ETL SQL scripts.

Installing

Install and update using pip:

$ pip install sqltest

A Simple Example

  1. Prepare your ETL SQL file, for example: spark_etl_demo.sql.
  2. Prepare your source and target datasets; refer to Dataset Preparation for more detail.
  3. Write your test cases following the examples below.
    def test_excel_data_source_demo(self):
        environments = {
            "env": "dev",
            "target_data_path": f"{PROJECT_PATH}/tests/data/tables",
        }

        reader = ExcelDatasetReader(
            data_path=f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.xlsx"
        )
        sql_file_path = f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.sql"

        engine = SparkEngine(SPARK, environments)
        engine.run(reader, sql_file_path)
        engine.verify_target_dataset()

    @excel_reader(
        data_path=f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.xlsx"
    )
    @spark_engine(
        spark=SPARK,
        sql_path=f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.sql",
        env={"env": "dev", "target_data_path": f"{PROJECT_PATH}/tests/data/tables"},
    )
    def test_excel_with_decorate(self, reader: DatasetReader, engine: SqlEngine):
        engine.verify_target_dataset()

    @spark_engine(
        spark=SPARK,
        sql_path=f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.sql",
        reader=ExcelDatasetReader(
            f"{PROJECT_PATH}/tests/data/cases/spark_etl_sql_test_excel_demo/spark_etl_demo.xlsx"
        ),
        env={"env": "dev", "target_data_path": f"{PROJECT_PATH}/tests/data/tables"},
    )
    def test_excel_with_engine_decorate(self, engine: SqlEngine):
        engine.verify_target_dataset()
  4. Run your test cases.
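The environments mapping passed to the engine suggests that variables are substituted into the SQL text before it runs. A minimal sketch of that idea using Python's string.Template (hypothetical, not sqltest's actual mechanism — the function name render_sql is made up for illustration):

```python
from string import Template

def render_sql(sql_text: str, environments: dict) -> str:
    """Substitute ${var} placeholders in an ETL SQL script with environment values."""
    # safe_substitute leaves unknown placeholders untouched instead of raising
    return Template(sql_text).safe_substitute(environments)

sql = "SELECT * FROM orders_${env} WHERE path = '${target_data_path}'"
print(render_sql(sql, {"env": "dev", "target_data_path": "/tmp/tables"}))
# SELECT * FROM orders_dev WHERE path = '/tmp/tables'
```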

Dataset Preparation

Currently, we support two kinds of dataset readers; each requires a specific layout for the source and target data.

CSV Dataset Reader

  1. There are a source and a target folder under each dataset folder; see spark_etl_sql_test_csv_demo for a full example.
  2. Under source or target, create one CSV file for each source/target dataset defined in your ETL SQL file. Each CSV file stands for a table, and the file name is used as the table name, so double-check that each file name matches the table name in the SQL file.
  3. To read the dataset, there are two usage scenarios:
  • Creating a reader object: CsvDatasetReader(data_path="{dataset_folder}")
  • Using the @csv_reader decorator on a test function:
@csv_reader(data_path="{dataset_folder}")
def test_case(reader: DatasetReader):
    pass
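The folder convention above (one CSV per table, file name = table name) can be sketched with a simplified loader; this is a stand-in for what CsvDatasetReader does, not the library's actual implementation:

```python
import csv
import tempfile
from pathlib import Path

def load_datasets(dataset_folder) -> dict:
    """Return {"source"/"target": {table_name: [row dicts]}} from the folder layout."""
    datasets = {}
    for kind in ("source", "target"):
        tables = {}
        for csv_file in (Path(dataset_folder) / kind).glob("*.csv"):
            with csv_file.open(newline="") as f:
                # the file stem (name without .csv) becomes the table name
                tables[csv_file.stem] = list(csv.DictReader(f))
        datasets[kind] = tables
    return datasets

# Build a tiny example layout: {dataset_folder}/source/users.csv etc.
root = Path(tempfile.mkdtemp())
(root / "source").mkdir()
(root / "target").mkdir()
(root / "source" / "users.csv").write_text("id,name\n1,alice\n")
(root / "target" / "user_counts.csv").write_text("cnt\n1\n")

data = load_datasets(root)
print(data["source"]["users"])  # [{'id': '1', 'name': 'alice'}]
```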

Excel Dataset Reader

  1. Unlike the CSV Dataset Reader, there is only one Excel file, which includes both the source and target datasets.
  2. Within the Excel file, each sheet stands for a table: a sheet whose name starts with source-- is a source dataset/table, and one starting with target-- is a target dataset/table. Unlike CSV files, the sheet name is not used as the table name, because Excel limits the length of sheet names; instead, the table name is stored in the first row, first column of the sheet. See spark_etl_demo.xlsx for more detail.
  3. There are also two usage scenarios to read data:
  • Creating a reader object: ExcelDatasetReader(data_path="{excel_file_path}")
  • Using the @excel_reader decorator on a test function:
@excel_reader(data_path="{excel_file_path}")
def test_case(reader: DatasetReader):
    pass
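The source--/target-- sheet-name convention can be checked with a small helper. This is only a sketch of the naming rule (the real table name still comes from the first cell of the sheet, per the convention above), not the library's actual parser:

```python
def classify_sheet(sheet_name: str):
    """Return ("source" | "target", suffix) for a convention-following sheet name."""
    for prefix in ("source--", "target--"):
        if sheet_name.startswith(prefix):
            # strip the trailing dashes from the prefix to get the dataset kind
            return prefix.rstrip("-"), sheet_name[len(prefix):]
    raise ValueError(f"sheet {sheet_name!r} matches neither source-- nor target--")

print(classify_sheet("source--orders_sheet"))  # ('source', 'orders_sheet')
print(classify_sheet("target--daily_totals"))  # ('target', 'daily_totals')
```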

SQL Engine

Currently, only the Spark engine is supported; we plan to support other SQL engines, e.g. Flink.
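Regardless of engine, a check like verify_target_dataset() boils down to comparing the expected target tables from the dataset against what the engine actually produced. A simplified, engine-agnostic sketch (all names hypothetical, not sqltest's implementation):

```python
def verify_target_dataset(expected: dict, actual: dict) -> None:
    """Compare expected target tables (from the dataset) with actual query results.

    Rows are compared order-insensitively, since ETL output order is rarely guaranteed.
    """
    for table, expected_rows in expected.items():
        actual_rows = actual.get(table)
        if actual_rows is None:
            raise AssertionError(f"target table {table!r} was not produced")
        # canonicalize each row so dict key order does not affect the comparison
        key = lambda row: sorted(row.items())
        if sorted(expected_rows, key=key) != sorted(actual_rows, key=key):
            raise AssertionError(f"rows differ for target table {table!r}")

expected = {"user_counts": [{"env": "dev", "cnt": "1"}]}
actual = {"user_counts": [{"cnt": "1", "env": "dev"}]}
verify_target_dataset(expected, actual)  # passes: same rows, different key order
```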

Bugs/Requests

Please use the GitHub issue tracker (https://github.com/stayrascal/sqltest/issues) to submit bugs or request features.
