teradatamlspk

Python package for running Spark workloads on Teradata Vantage

Teradata Python package for running Spark workloads on Vantage.

teradatamlspk is a Python module to run PySpark workloads on Vantage with minimal changes to the Python script.

For community support, please visit the Teradata Community.

For Teradata customer support, please visit Teradata Support.

Release Notes
Installation and Requirements
Using the Teradata Python Package
Documentation
License

Release Notes:

teradatamlspk 20.0.0.0

teradatamlspk 20.0.0.0 is the initial release. Refer to the teradatamlspk User Guide for the available APIs and their functionality.

Installation and Requirements

Package Requirements:

Python 3.5 or later

Note: 32-bit Python is not supported.
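
The two package requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `interpreter_is_supported` and the constant `MIN_PYTHON` are illustrative and not part of teradatamlspk.

```python
import struct
import sys

MIN_PYTHON = (3, 5)  # minimum version stated in the package requirements


def interpreter_is_supported(version_info=None, pointer_bits=None):
    """Return True for a 64-bit interpreter at or above MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 on a 32-bit one.
        pointer_bits = struct.calcsize("P") * 8
    return tuple(version_info[:2]) >= MIN_PYTHON and pointer_bits == 64


if __name__ == "__main__":
    print("supported" if interpreter_is_supported() else "not supported")
```

Both arguments are optional so the check can be exercised against values other than those of the running interpreter.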

Minimum System Requirements:

Windows 7 (64-bit) or later
macOS 10.9 (64-bit) or later
Red Hat 7 or later
Ubuntu 16.04 or later
CentOS 7 or later
SLES 12 or later
Teradata Vantage Advanced SQL Engine:
- Advanced SQL Engine 16.20 Feature Update 1 or later

Installation

Use pip to install teradatamlspk for running PySpark workloads.

Platform	Command
macOS/Linux	`pip install teradatamlspk`
Windows	`py -3 -m pip install teradatamlspk`

When upgrading to a new version, you may need to use pip install's --no-cache-dir option to force the download of the new version.

Platform	Command
macOS/Linux	`pip install --no-cache-dir -U teradatamlspk`
Windows	`py -3 -m pip install --no-cache-dir -U teradatamlspk`
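
After installing or upgrading, it can be useful to confirm which version pip actually installed. The sketch below relies on `importlib.metadata` from the standard library, which assumes Python 3.8 or later (newer than the package's stated minimum); the helper name `installed_version` is illustrative.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package="teradatamlspk"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

If the printed version is older than expected after an upgrade, rerunning the upgrade command with `--no-cache-dir` as shown above is the likely next step.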

Using the `teradatamlspk` Package

teradatamlspk includes a utility, pyspark2teradataml, that takes your PySpark script as input, analyzes it, and generates two files:

HTML file - Created in the same directory as your PySpark script and named <your pyspark script name>_tdmlspk.html. This file contains the script conversion report. Use the report to decide what action to take on the generated script.
Python script - Created in the same directory as your PySpark script and named <your pyspark script name>_tdmlspk.py. This script can be run on Vantage.
- Refer to the HTML report to understand the changes already made to the script and the changes you still need to make.

Example to demonstrate the usage of the `pyspark2teradataml` utility

>>> from teradatamlspk import pyspark2teradataml
>>> pyspark2teradataml('/tmp/pyspark_script.py')
Python script '/tmp/pyspark_script.py' converted to '/tmp/pyspark_script_tdmlspk.py' successfully.
Script conversion report '/tmp/pyspark_script_tdmlspk.html' published successfully.
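
The naming rule for the two generated files can be expressed as a small helper, for example to locate the report after a conversion. This is a sketch based only on the naming convention described above; `expected_outputs` is a hypothetical helper and not part of teradatamlspk.

```python
from pathlib import Path


def expected_outputs(script_path):
    """Return (converted_script, html_report) paths for a PySpark script."""
    script = Path(script_path)
    stem = script.stem + "_tdmlspk"
    # Both files are created next to the input script.
    return script.with_name(stem + ".py"), script.with_name(stem + ".html")


converted, report = expected_outputs("/tmp/pyspark_script.py")
print(converted)
print(report)
```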

Example to demonstrate `teradatamlspk` DataFrame creation.

>>> from teradatamlspk.sql import TeradataSession
>>> spark = TeradataSession.builder.getOrCreate(host=host, user=user, password=password)
>>> df = spark.createDataFrame("test_classification")
>>> df.show()
+----------------------+---------------------+---------------------+----------------------+-------+
|         col1         |         col2        |         col3        |         col4         | label |
+----------------------+---------------------+---------------------+----------------------+-------+
| -1.1305820619922704  | -0.0202959251414216 | -0.7102336334648424 | -1.4409910829920618  |   0   |
| -0.28692000017174224 | -0.7169529842687833 | -0.9865850877151031 |  -0.848214734984639  |   0   |
| -2.5604297516143286  |  0.4022323367243113 | -1.1007419820939435 | -2.9595882598466674  |   0   |
|  0.4223414406917685  | -2.0391144030275625 |  -2.053215806414584 | -0.8491230457662061  |   0   |
|  0.7216694959200303  | -1.1215566442946217 | -0.8318398647044646 | 0.15074209659533433  |   0   |
| -0.9861325665504175  |  1.7105310292848412 |  1.3382818041204743 | -0.08534109029742933 |   1   |
| -0.5097927128625588  |  0.4926589443964751 |  0.2482067293662461 | -0.3095907315896897  |   1   |
| 0.18332468205821462  |  -0.774610353732039 |  -0.766054694735782 | -0.29366863291253276 |   0   |
| -0.4032571038523639  |  2.0061840569850093 |  2.0275124771199318 |  0.8508919440196763  |   1   |
| -0.07156025619387396 |  0.2295539000122874 | 0.21654344712218576 | 0.06527397921673575  |   1   |
+----------------------+---------------------+---------------------+----------------------+-------+
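
The example above assumes that `host`, `user`, and `password` are already defined. One way to supply them without hard-coding credentials is to read them from environment variables, sketched below. The variable names `TD_HOST`, `TD_USER`, and `TD_PASSWORD` and both helper functions are assumptions made for illustration; only the `getOrCreate(host=..., user=..., password=...)` call comes from the example above.

```python
import os


def connection_params(env=None):
    """Collect host, user and password from environment variables.

    TD_HOST, TD_USER and TD_PASSWORD are hypothetical names chosen for
    this sketch; they are not defined by teradatamlspk.
    """
    env = os.environ if env is None else env
    names = {"host": "TD_HOST", "user": "TD_USER", "password": "TD_PASSWORD"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def open_session(env=None):
    """Create a session using the getOrCreate call from the example above."""
    # Imported here so connection_params works without the package installed.
    from teradatamlspk.sql import TeradataSession

    return TeradataSession.builder.getOrCreate(**connection_params(env))
```

With the three variables set, `spark = open_session()` would take the place of the explicit `getOrCreate` line.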

Documentation

General product information, including installation instructions, is available on the Teradata Documentation website.

License

Use of the Teradata Spark Package is governed by the License Agreement for teradatamlspk and pyspark2teradataml. After installation, the LICENSE and LICENSE-3RD-PARTY files are located in the teradatamlspk directory of the Python installation directory.

Release history

Version	Release date
20.0.0.4	Dec 16, 2025
20.0.0.3	Jul 2, 2025
20.0.0.2	Nov 5, 2024
20.0.0.1	May 22, 2024
20.0.0.0 (this version)	Mar 28, 2024

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

teradatamlspk-20.0.0.0-py3-none-any.whl (191.9 kB)

Uploaded Mar 28, 2024 Python 3

File details

Details for the file teradatamlspk-20.0.0.0-py3-none-any.whl.

File metadata

Download URL: teradatamlspk-20.0.0.0-py3-none-any.whl
Upload date: Mar 28, 2024
Size: 191.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for teradatamlspk-20.0.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9075cb3a02e00685add98db35a66060b439c7f28b32de090f6dd8fa6b3091bb5`
MD5	`34800e117c1e84ebd6a5274378626feb`
BLAKE2b-256	`c7a184a0dab909f0d2637468924512565d27ca44cfe2a0f6399b2df663e8cf45`
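
A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard `hashlib` module; the function names and the chunk size are illustrative choices, and the wheel path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest published for teradatamlspk-20.0.0.0-py3-none-any.whl
EXPECTED_SHA256 = "9075cb3a02e00685add98db35a66060b439c7f28b32de090f6dd8fa6b3091bb5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the published one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("teradatamlspk-20.0.0.0-py3-none-any.whl")` should return True for an intact download in the current directory.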
