
Teradata Vantage Python package for Advanced Analytics

Project description

Teradata Python package for Advanced Analytics.

teradataml makes available to Python users a collection of analytic functions that reside on Teradata Vantage. This allows users to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and can be used in conjunction with other open-source Python libraries.

For community support, please visit the Connectivity Forum.

For Teradata customer support, please visit Teradata Access.

Copyright 2019, Teradata. All Rights Reserved.


Release Notes:

teradataml 17.10.00.00

  • New Features/Functionality
    • Geospatial

      The Geospatial feature in teradataml enables data manipulation, exploration and analysis on tables, views, and queries on Teradata Vantage that contain Geospatial data.

      • Geometry Types
        • Point
        • LineString
        • Polygon
        • MultiPoint
        • MultiLineString
        • MultiPolygon
        • GeometryCollection
        • GeoSequence
      • teradataml GeoDataFrame
        • Properties
          • columns
          • dtypes
          • geometry
          • iloc
          • index
          • loc
          • shape
          • size
          • tdtypes
          • Geospatial Specific Properties
            • Properties for all Types of Geometries
              • boundary
              • centroid
              • convex_hull
              • coord_dim
              • dimension
              • geom_type
              • is_3D
              • is_empty
              • is_simple
              • is_valid
              • max_x
              • max_y
              • max_z
              • min_x
              • min_y
              • min_z
              • srid
            • Properties for Point Geometry
              • x
              • y
              • z
            • Properties for LineString Geometry
              • is_closed_3D
              • is_closed
              • is_ring
            • Properties for Polygon Geometry
              • area
              • exterior
              • perimeter
        • Methods
          • __getattr__()
          • __getitem__()
          • __init__()
          • __repr__()
          • assign()
          • concat()
          • count()
          • drop()
          • dropna()
          • filter()
          • from_query()
          • from_table()
          • get()
          • get_values()
          • groupby()
          • head()
          • info()
          • join()
          • keys()
          • merge()
          • sample()
          • select()
          • set_index()
          • show_query()
          • sort()
          • sort_index()
          • squeeze()
          • tail()
          • to_csv()
          • to_pandas()
          • to_sql()
          • Geospatial Specific Methods
            • Methods for All Types of Geometry
              • buffer()
              • contains()
              • crosses()
              • difference()
              • disjoint()
              • distance()
              • distance_3D()
              • envelope()
              • geom_equals()
              • intersection()
              • intersects()
              • make_2D()
              • mbb()
              • mbr()
              • overlaps()
              • relates()
              • set_exterior()
              • set_srid()
              • simplify()
              • sym_difference()
              • to_binary()
              • to_text()
              • touches()
              • transform()
              • union()
              • within()
              • wkb_geom_to_sql()
              • wkt_geom_to_sql()
            • Methods for Point Geometry
              • spherical_buffer()
              • spherical_distance()
              • spheroidal_buffer()
              • spheroidal_distance()
              • set_x()
              • set_y()
              • set_z()
            • Methods for LineString Geometry
              • end_point()
              • length()
              • length_3D()
              • line_interpolate_point()
              • num_points()
              • point()
              • start_point()
            • Methods for Polygon Geometry
              • interiors()
              • num_interior_ring()
              • point_on_surface()
            • Methods for GeometryCollection Geometry
              • geom_component()
              • num_geometry()
            • Methods for GeoSequence Geometry
              • clip()
              • get_final_timestamp()
              • get_init_timestamp()
              • get_link()
              • get_user_field()
              • get_user_field_count()
              • point_heading()
              • set_link()
              • speed()
            • Filtering Functions and Methods
              • intersects_mbb()
              • mbb_filter()
              • mbr_filter()
              • within_mbb()
      • teradataml GeoDataFrameColumn
        • Geospatial Specific Properties
          • Properties for all Types of Geometries
            • boundary
            • centroid
            • convex_hull
            • coord_dim
            • dimension
            • geom_type
            • is_3D
            • is_empty
            • is_simple
            • is_valid
            • max_x
            • max_y
            • max_z
            • min_x
            • min_y
            • min_z
            • srid
          • Properties for Point Geometry
            • x
            • y
            • z
          • Properties for LineString Geometry
            • is_closed_3D
            • is_closed
            • is_ring
          • Properties for Polygon Geometry
            • area
            • exterior
            • perimeter
        • Geospatial Specific Methods
          • Methods for All Types of Geometry
            • buffer()
            • contains()
            • crosses()
            • difference()
            • disjoint()
            • distance()
            • distance_3D()
            • envelope()
            • geom_equals()
            • intersection()
            • intersects()
            • make_2D()
            • mbb()
            • mbr()
            • overlaps()
            • relates()
            • set_exterior()
            • set_srid()
            • simplify()
            • sym_difference()
            • to_binary()
            • to_text()
            • touches()
            • transform()
            • union()
            • within()
            • wkb_geom_to_sql()
            • wkt_geom_to_sql()
          • Methods for Point Geometry
            • spherical_buffer()
            • spherical_distance()
            • spheroidal_buffer()
            • spheroidal_distance()
            • set_x()
            • set_y()
            • set_z()
          • Methods for LineString Geometry
            • end_point()
            • length()
            • length_3D()
            • line_interpolate_point()
            • num_points()
            • point()
            • start_point()
          • Methods for Polygon Geometry
            • interiors()
            • num_interior_ring()
            • point_on_surface()
          • Methods for GeometryCollection Geometry
            • geom_component()
            • num_geometry()
          • Methods for GeoSequence Geometry
            • clip()
            • get_final_timestamp()
            • get_init_timestamp()
            • get_link()
            • get_user_field()
            • get_user_field_count()
            • point_heading()
            • set_link()
            • speed()
          • Filtering Functions and Methods
            • intersects_mbb()
            • mbb_filter()
            • mbr_filter()
            • within_mbb()
    • teradataml DataFrame
      • New Functions
        • to_csv()
    • teradataml: SQLE Engine Analytic Functions
      • New Functions
        • Functions Supported on Database Versions: 16.20.x.x, 17.10.x.x, 17.05.x.x
          • Antiselect()
          • Attribution()
          • DecisionForestPredict()
          • DecisionTreePredict()
          • GLMPredict()
          • MovingAverage()
          • NaiveBayesPredict()
          • NaiveBayesTextClassifierPredict()
          • NGramSplitter()
          • NPath()
          • Pack()
          • Sessionize()
          • StringSimilarity()
          • SVMParsePredict()
          • Unpack()
        • Functions Supported on Database Versions: 17.10.x.x
          • Antiselect()
          • Attribution()
          • BincodeFit()
          • BincodeTransform()
          • CategoricalSummary()
          • ChiSq()
          • ColumnSummary()
          • ConvertTo()
          • DecisionForestPredict()
          • DecisionTreePredict()
          • GLMPredict()
          • FillRowId()
          • FTest()
          • Fit()
          • Transform()
          • GetRowsWithMissingValues()
          • GetRowsWithoutMissingValues()
          • MovingAverage()
          • Histogram()
          • NaiveBayesPredict()
          • NaiveBayesTextClassifierPredict()
          • NGramSplitter()
          • NPath()
          • NumApply()
          • OneHotEncodingFit()
          • OneHotEncodingTransform()
          • OutlierFilterFit()
          • OutlierFilterTransform()
          • Pack()
          • PolynomialFeaturesFit()
          • PolynomialFeaturesTransform()
          • QQNorm()
          • RoundColumns()
          • RowNormalizeFit()
          • RowNormalizeTransform()
          • ScaleFit()
          • ScaleTransform()
          • Sessionize()
          • SimpleImputeFit()
          • SimpleImputeTransform()
          • StrApply()
          • StringSimilarity()
          • SVMParsePredict()
          • UnivariateStatistics()
          • Unpack()
          • WhichMax()
          • WhichMin()
          • ZTest()
    • teradataml: General Functions
      • New Functions
        • Data Transfer Utility
          • read_csv()
    • Operators
      • New Functions
        • Table Operators
          • read_nos()
          • write_nos()
    • teradataml: Bring Your Own Model
      • New Functions
        • Model Cataloging
          • get_license()
          • set_byom_catalog()
          • set_license()
  • Updates
    • teradataml: General Functions
      • Data Transfer Utility
        • copy_to_sql() - New argument "chunksize" added to load data in chunks.
        • The following Data Transfer Utility functions are updated to accept an "open_session" argument that specifies the number of Teradata sessions to open for data transfer:
          • fastexport()
          • fastload()
          • to_pandas()
    • Operators
      • The following Set Operator functions are updated to work with Geospatial data:
        • concat()
        • td_intersect()
        • td_except()
        • td_minus()
    • teradataml: Bring Your Own Model
      • The model cataloging APIs listed below are updated to use the session-level parameters set by set_byom_catalog() (table name and schema name) and set_license() (license details).
        • delete_byom()
        • list_byom()
        • retrieve_byom()
        • save_byom()
      • view_log() - Allows the user to view BYOM logs.
  • Bug Fixes
    • CS0733758 - db_python_package_details() is fixed to support the pip and Python aliases used by the latest STO release.
    • Fixed DataFrame print() failing with the error "Response Row size is greater than the 1MB allowed maximum." when printing data with many columns.
    • New parameter "chunksize" is added to DataFrame.to_sql() and copy_to_sql() to fix failures with the error "Request requires too many SPOOL files." Setting "chunksize" to a value smaller than the default lets the operation succeed.
    • remove_context() is fixed to remove the active connection from the database.
    • Support added to specify the number of Teradata data transfer sessions to open for data transfer using fastexport() and fastload() functions.
    • DataFrame.to_sql() is fixed to support temporary tables when the default database differs from the username.
    • DataFrame.to_pandas() now transfers data using the regular method by default. This allows data transfer when utility throttles are configured, i.e., when the TASM configuration does not allow data export using FastExport.
    • save_byom() now notifies the user when data passed to the API is longer than the VARCHAR column and is therefore truncated.
    • Standard error can now be captured for DataFrame.map_row() and DataFrame.map_partition() when executed in LOCAL mode.
    • Vantage Analytic Library - The underlying SQL can be retrieved using the newly added arguments "gen_sql"/"gen_sql_only" for the functions. The query can be viewed with the help of show_query().
    • Documentation example has been fixed for fastexport() to show the correct import statement.
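The "chunksize" argument added to copy_to_sql() and DataFrame.to_sql() in this release works by loading rows in several smaller batches instead of one large request. The plain-Python sketch below only illustrates that batching idea; `iter_chunks` is a hypothetical helper, not part of teradataml, and how teradataml actually submits each batch to Vantage is not shown.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def iter_chunks(rows: Sequence[Row], chunksize: int) -> Iterator[List[Row]]:
    """Yield consecutive batches of at most `chunksize` rows.

    Hypothetical helper: it only mimics how a "chunksize" argument splits
    one large load into several smaller batches. It is not teradataml code.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(rows), chunksize):
        # Each slice stands in for one batch sent to the database.
        yield list(rows[start:start + chunksize])


rows = [(i, f"name_{i}") for i in range(10)]
batches = list(iter_chunks(rows, chunksize=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

A smaller chunksize means more, smaller batches: 10 rows with chunksize=4 become three batches of 4, 4 and 2 rows.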

teradataml 17.00.00.05

Fixed [CS0733758]: db_python_package_details() failed on the recent STO release due to changes in the pip and Python aliases.

teradataml 17.00.00.04

  • New Features/Functionality
    • Analytic Functions
      • Bring Your Own Model Analytic Functions - The BYOM feature in Vantage provides the flexibility to score data in Vantage using externally trained models, through the following BYOM functions:
        • H2OPredict() - Score using a model trained externally in H2O and stored in Vantage.
        • PMMLPredict() - Score using a model trained externally in PMML and stored in Vantage.
        • BYOM Model Catalog APIs
          • save_byom() - Save externally trained models in Teradata Vantage.
          • delete_byom() - Delete a model from the user specified table in Teradata Vantage.
          • list_byom() - List models.
          • retrieve_byom() - Function to retrieve a saved model.
      • Vantage Analytic Library Functions
        • New Functions
          • XmlToHtmlReport() - Transforms XML output of VAL functions to HTML.
    • teradataml DataFrame
      • DataFrame.window() - Generates Window object on a teradataml DataFrame to run window aggregate functions.
      • DataFrame.csum() - Returns the column-wise cumulative sum for rows in the partition of the DataFrame.
      • DataFrame.mavg() - Returns moving average for the current row and the preceding rows.
      • DataFrame.mdiff() - Returns moving difference for the current row and the preceding rows.
      • DataFrame.mlinreg() - Returns moving linear regression for the current row and the preceding rows.
      • DataFrame.msum() - Returns moving sum for the current row and the preceding rows.
      • Regular Aggregate Functions
        • DataFrame.corr() - Returns the Sample Pearson product moment correlation coefficient.
        • DataFrame.covar_pop() - Returns the population covariance.
        • DataFrame.covar_samp() - Returns the sample covariance.
        • DataFrame.regr_avgx() - Returns the mean of the independent variable.
        • DataFrame.regr_avgy() - Returns the mean of the dependent variable.
        • DataFrame.regr_count() - Returns the count of the dependent and independent variable arguments.
        • DataFrame.regr_intercept() - Returns the intercept of the univariate linear regression line.
        • DataFrame.regr_r2() - Returns the coefficient of determination.
        • DataFrame.regr_slope() - Returns the slope of the univariate linear regression line.
        • DataFrame.regr_sxx() - Returns the sum of the squares of the independent variable expression.
        • DataFrame.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
        • DataFrame.regr_syy() - Returns the sum of the squares of the dependent variable expression.
    • teradataml DataFrameColumn a.k.a. ColumnExpression
      • ColumnExpression.window() - Generates Window object on a teradataml DataFrameColumn to run window aggregate functions.
      • ColumnExpression.desc() - Sorts ColumnExpression in descending order.
      • ColumnExpression.asc() - Sorts ColumnExpression in ascending order.
      • ColumnExpression.distinct() - Removes duplicate values from ColumnExpression.
      • Regular Aggregate Functions
        • ColumnExpression.corr() - Returns the Sample Pearson product moment correlation coefficient.
        • ColumnExpression.count() - Returns the column-wise count.
        • ColumnExpression.covar_pop() - Returns the population covariance.
        • ColumnExpression.covar_samp() - Returns the sample covariance.
        • ColumnExpression.kurtosis() - Returns kurtosis value for a column.
        • ColumnExpression.median() - Returns column-wise median value.
        • ColumnExpression.max() - Returns the column-wise max value.
        • ColumnExpression.mean() - Returns the column-wise average value.
        • ColumnExpression.min() - Returns the column-wise min value.
        • ColumnExpression.regr_avgx() - Returns the mean of the independent variable.
        • ColumnExpression.regr_avgy() - Returns the mean of the dependent variable.
        • ColumnExpression.regr_count() - Returns the count of the dependent and independent variable arguments.
        • ColumnExpression.regr_intercept() - Returns the intercept of the univariate linear regression line.
        • ColumnExpression.regr_r2() - Returns the coefficient of determination.
        • ColumnExpression.regr_slope() - Returns the slope of the univariate linear regression line.
        • ColumnExpression.regr_sxx() - Returns the sum of the squares of the independent variable expression.
        • ColumnExpression.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
        • ColumnExpression.regr_syy() - Returns the sum of the squares of the dependent variable expression.
        • ColumnExpression.skew() - Returns skew value for a column.
        • ColumnExpression.std() - Returns the column-wise population/sample standard deviation.
        • ColumnExpression.sum() - Returns the column-wise sum.
        • ColumnExpression.var() - Returns the column-wise population/sample variance.
        • ColumnExpression.percentile() - Returns the column-wise percentile.
    • teradataml Window - Window Aggregate Functions
      The following Window Aggregate Functions return results over a specified window, which can be of any of these types:
      • Cumulative/Expanding window
      • Moving/Rolling window
      • Contracting/Remaining window
      • Grouping window
      Window Aggregate Functions:
      • Window.corr() - Returns the Sample Pearson product moment correlation coefficient.
      • Window.count() - Returns the count.
      • Window.covar_pop() - Returns the population covariance.
      • Window.covar_samp() - Returns the sample covariance.
      • Window.cume_dist() - Returns the cumulative distribution of values.
      • Window.dense_rank() - Returns the ordered ranking of all the rows.
      • Window.first_value() - Returns the first value of an ordered set of values.
      • Window.lag() - Returns data from the row preceding the current row at a specified offset value.
      • Window.last_value() - Returns the last value of an ordered set of values.
      • Window.lead() - Returns data from the row following the current row at a specified offset value.
      • Window.max() - Returns the column-wise max value.
      • Window.mean() - Returns the column-wise average value.
      • Window.min() - Returns the column-wise min value.
      • Window.percent_rank() - Returns the relative rank of all the rows.
      • Window.rank() - Returns the rank (1 … n) of all the rows.
      • Window.regr_avgx() - Returns the mean of the independent variable.
      • Window.regr_avgy() - Returns the mean of the dependent variable.
      • Window.regr_count() - Returns the count of the dependent and independent variable arguments.
      • Window.regr_intercept() - Returns the intercept of the univariate linear regression line.
      • Window.regr_r2() - Returns the coefficient of determination.
      • Window.regr_slope() - Returns the slope of the univariate linear regression line.
      • Window.regr_sxx() - Returns the sum of the squares of the independent variable expression.
      • Window.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
      • Window.regr_syy() - Returns the sum of the squares of the dependent variable expression.
      • Window.row_number() - Returns the sequential row number.
      • Window.std() - Returns the column-wise population/sample standard deviation.
      • Window.sum() - Returns the column-wise sum.
      • Window.var() - Returns the column-wise population/sample variance.
    • General functions
      • New functions
        • fastexport() - Exports teradataml DataFrame to Pandas DataFrame using FastExport data transfer protocol.
    • teradataml Options
      • Display Options
        • display.blob_length - Specifies the default display length of BLOB columns in teradataml DataFrame.
      • Configuration Options
        • configure.temp_table_database - Specifies the database name for storing the tables created internally.
        • configure.temp_view_database - Specifies the database name for storing the views created internally.
        • configure.byom_install_location - Specifies the install location for the BYOM functions.
        • configure.val_install_location - Specifies the install location for the Vantage Analytic Library functions.
  • Updates
    • teradataml DataFrame
      • to_pandas() -
        • Support added to transfer data to Pandas DataFrame using the FastExport protocol, improving performance.
        • Support added for other arguments similar to Pandas read_sql():
          • coerce_float
          • parse_dates
    • Analytic functions
      • Vantage Analytic Library Functions
        • Support added to accept datetime.date object for literals/values in following transformation functions:
          • FillNa()
          • Binning()
          • OneHotEncoder()
          • LabelEncoder()
        • All transformation functions now support accepting teradatasqlalchemy datatypes as input to the "datatype" argument for casting the result.
  • Bug Fixes
    • CS0249633 - Support added for teradataml to work with user/database/tablename containing period (.).
    • CS0086594 - Use of dbc.tablesvx versus dbc.tablesvx in teradatasqlalchemy.
    • IPython integration to print the teradataml DataFrames in pretty format.
    • teradataml DataFrame APIs now support column names same as that of Teradata reserved keywords.
    • Fixed an issue where duplicate rows were loaded by the teradataml fastload() API.
    • VAL - Empty string now can be passed as input for recoding values using LabelEncoder.
    • teradataml extension with SQLAlchemy functions:
      • mod() function is fixed to return correct datatype.
      • sum() function is fixed to return correct datatype.
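The regression aggregates added in this release (regr_count(), regr_avgx(), regr_sxx(), regr_slope(), regr_intercept(), regr_r2() and related functions such as corr(), covar_pop() and covar_samp()) all derive from a few sums. As an illustration, the plain-Python sketch below computes them with the SQL standard's definitions, taking y as the dependent and x as the independent variable. `regr_stats` is a hypothetical helper, not a teradataml API, and Vantage's handling of edge cases may differ.

```python
import math
from typing import Dict, Optional, Sequence


def regr_stats(y: Sequence[Optional[float]],
               x: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Regression aggregates for dependent y and independent x (SQL-standard formulas).

    Pairs where either value is None are skipped, the way SQL aggregates
    ignore NULLs. Hypothetical helper, not a teradataml API.
    """
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None and xi is not None]
    n = len(pairs)
    stats: Dict[str, Optional[float]] = {"regr_count": n}
    if n == 0:
        return stats  # every other aggregate would be NULL

    avgy = sum(yi for yi, _ in pairs) / n
    avgx = sum(xi for _, xi in pairs) / n
    sxx = sum((xi - avgx) ** 2 for _, xi in pairs)
    syy = sum((yi - avgy) ** 2 for yi, _ in pairs)
    sxy = sum((xi - avgx) * (yi - avgy) for yi, xi in pairs)

    slope = sxy / sxx if sxx != 0 else None
    stats.update(
        regr_avgx=avgx,
        regr_avgy=avgy,
        regr_sxx=sxx,
        regr_syy=syy,
        regr_sxy=sxy,
        regr_slope=slope,
        regr_intercept=avgy - slope * avgx if slope is not None else None,
        covar_pop=sxy / n,
        covar_samp=sxy / (n - 1) if n > 1 else None,
        corr=sxy / math.sqrt(sxx * syy) if sxx != 0 and syy != 0 else None,
    )
    # R^2 is NULL when x is constant, 1 when only y is constant, else corr^2.
    if sxx == 0:
        stats["regr_r2"] = None
    elif syy == 0:
        stats["regr_r2"] = 1.0
    else:
        stats["regr_r2"] = sxy ** 2 / (sxx * syy)
    return stats


s = regr_stats(y=[2, 4, 6, 8, None], x=[1, 2, 3, 4, 5])
print(s["regr_count"], s["regr_slope"], s["regr_intercept"], s["regr_r2"])  # 4 2.0 0.0 1.0
```

The pair containing None is dropped, so the count is 4, and the remaining points lie exactly on y = 2x, giving slope 2, intercept 0 and R² of 1.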

teradataml 17.00.00.03

  • The new SQLAlchemy 1.4.x release introduced a backward compatibility issue. A fix has been made so that teradataml supports the latest SQLAlchemy changes.
  • Other minor bug fixes.

teradataml 17.00.00.02

Fixed the internal library load issue related to the GCC version discrepancies on CentOS platform.

teradataml 17.00.00.01

  • New Features/Functionality
    • Analytic Functions
      • Vantage Analytic Library - teradataml now supports executing analytic functions offered by the Vantage Analytic Library. These functions are available via the new 'valib' sub-package of teradataml. The following functions are added as part of this:
        • Association Rules:
          • Association()
        • Descriptive Statistics:
          • AdaptiveHistogram()
          • Explore()
          • Frequency()
          • Histogram()
          • Overlaps()
          • Statistics()
          • TextAnalyzer()
          • Values()
        • Decision Tree:
          • DecisionTree()
          • DecisionTreePredict()
          • DecisionTreeEvaluator()
        • Fast K-Means Clustering:
          • KMeans()
          • KMeansPredict()
        • Linear Regression:
          • LinReg()
          • LinRegPredict()
        • Logistic Regression:
          • LogReg()
          • LogRegPredict()
          • LogRegEvaluator()
        • Factor Analysis:
          • PCA()
          • PCAPredict()
          • PCAEvaluator()
        • Matrix Building:
          • Matrix()
        • Statistical Tests:
          • BinomialTest()
          • ChiSquareTest()
          • KSTest()
          • ParametricTest()
          • RankTest()
        • Variable Transformation:
          • Transform()
          • Transformation Techniques supported for variable transformation:
            • Binning() - Perform bin coding to replace a continuous numeric column with a categorical one, producing ordinal values.
            • Derive() - Perform free-form transformations using arithmetic formulas.
            • FillNa() - Perform missing value/null replacement transformations.
            • LabelEncoder() - Re-express categorical column values into a new coding scheme.
            • MinMaxScalar() - Rescale data limiting the upper and lower boundaries.
            • OneHotEncoder() - Re-express a categorical data element as one or more numeric data elements, creating a binary numeric field for each categorical data value.
            • Retain() - Copy one or more columns into the final analytic data set.
            • Sigmoid() - Rescale data using sigmoid or s-shaped functions.
            • ZScore() - Rescale data using Z-Score values.
      • ML Engine Functions (mle)
        • Correlation2
        • NaiveBayesTextClassifier2
    • DataFrame
      • New Functions
        • DataFrame.map_row() - Function to apply a user defined function to each row in the teradataml DataFrame.
        • DataFrame.map_partition() - Function to apply a user defined function to a group or partition of rows in the teradataml DataFrame.
      • New Property
        • DataFrame.tdtypes - Get the teradataml DataFrame metadata containing column names and corresponding teradatasqlalchemy types.
    • General functions
      • New functions
        • Database Utility Functions
          • db_python_package_details() - Lists the details of Python packages installed on Vantage.
        • General Utility Functions
          • print_options()
          • view_log()
          • setup_sandbox_env()
          • copy_files_from_container()
          • cleanup_sandbox_env()
  • Updates
    • create_context()
      • Supports all connection parameters supported by teradatasql.connect().
    • Script
      • test_script() can now be executed in 'local' mode, i.e., outside of the sandbox.
      • Script.setup_sto_env() is deprecated. Use setup_sandbox_env() function instead.
      • Added support for the "quotechar" argument.
    • Analytic functions
      • Updates
        • See the teradataml User Guide for details of the updates made to ML Engine analytic functions. The following types of updates are made to several functions:
          • New arguments are added, which are supported only on Vantage Version 1.3.
          • Default values have been updated for a few function arguments.
          • A few arguments that were required are now optional.
  • Minor Bug Fixes.
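Three of the VAL rescaling transformations listed above (MinMaxScalar(), ZScore() and Sigmoid()) follow well-known formulas. The plain-Python sketch below shows those formulas on a list of numbers; it is not the valib API, and its assumptions (a default min-max range of [0, 1], the population standard deviation for the z-score, and the standard logistic curve for the sigmoid) may not match VAL's defaults or options. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence


def min_max_scale(values: Sequence[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Linearly map values onto [low, high]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # A constant column has no range; map every value to the lower bound.
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / span for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Center on the mean and divide by the population standard deviation."""
    mean, std = fmean(values), pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def _logistic(v: float) -> float:
    # Branch on the sign so math.exp() never receives a large positive argument.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid(values: Sequence[float]) -> List[float]:
    """Apply the standard logistic curve 1 / (1 + e**-v) to each value."""
    return [_logistic(v) for v in values]


print(min_max_scale([0, 5, 10]))  # [0.0, 0.5, 1.0]
print(sigmoid([0.0]))             # [0.5]
```

Min-max scaling keeps the shape of the data but bounds it; the z-score centers it on 0 with unit spread; the sigmoid squashes any real value into (0, 1).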

teradataml 17.00.00.00

  • New Features/Functionality
    • Model Cataloging - Functionality to catalog model metadata and related information in the Model Catalog.
      • save_model() - Save a teradataml Analytic Function model.
      • retrieve_model() - Retrieve a saved model.
      • list_model() - List accessible models.
      • describe_model() - List the details of a model.
      • delete_model() - Remove a model from Model Catalog.
      • publish_model() - Share a model.
    • Script - An interface to the SCRIPT table operator object in the Advanced SQL Engine.
      The interface offers execution in two modes:
      • Test/Debug - to test user scripts locally in a containerized environment. Supporting methods:
        • setup_sto_env() - Set up test environment.
        • test_script() - Test user script in containerized environment.
        • set_data() - Set test data parameters.
      • In-Database Script Execution - to execute user scripts in database. Supporting methods:
        • execute_script() - Execute user script in Vantage.
        • install_file() - Install or replace file in Database.
        • remove_file() - Remove installed file from Database.
        • set_data() - Set test data parameters.
    • DataFrame
      • DataFrame.show_query() - Show underlying query for DataFrame.
      • Regular Aggregates
        • New functions
          • kurtosis() - Calculate the kurtosis value.
          • skew() - Calculate the skewness of the distribution.
        • Updates
          New argument "distinct" is added to the following aggregates to exclude duplicate values.
          • count()
          • max()
          • mean()
          • min()
          • sum()
          • std()
            • New argument "population" is added to calculate the population standard deviation.
          • var()
            • New argument "population" is added to calculate the population variance.
      • Time Series Aggregates
        • New functions
          • kurtosis() - Calculate the kurtosis value.
          • count() - Get the total number of values.
          • max() - Calculate the maximum value.
          • mean() - Calculate the average value.
          • min() - Calculate the minimum value.
          • percentile() - Calculate the desired percentile.
          • skew() - Calculate the skewness of the distribution.
          • sum() - Calculate the column-wise sum value.
          • std() - Calculate the sample and population standard deviation.
          • var() - Calculate the sample and population variance.
    • General functions
      • New functions
        • Database Utility Functions
          • db_drop_table()
          • db_drop_view()
          • db_list_tables()
        • Vantage File Management Functions
          • install_file() - Install a file in Database.
          • remove_file() - Remove an installed file from Database.
      • Updates
        • create_context()
          • Support added for Stored Password Protection feature.
          • Kerberos authentication bug fix.
          • New argument "database" added to the create_context() API, which allows the user to specify the database to connect to.
    • Analytic functions
      • New functions
        • Betweenness
        • Closeness
        • FMeasure
        • FrequentPaths
        • IdentityMatch
        • Interpolator
        • ROC
      • Updates
        • New methods are added to all analytic functions
          • show_query()
          • get_build_time()
          • get_prediction_type()
          • get_target_column()
        • New properties are added to analytic function's Formula argument
          • response_column
          • numeric_columns
          • categorical_columns
          • all_columns
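The "distinct" and "population" arguments added to the regular aggregates combine simply: "distinct" drops duplicate values before aggregating, and "population" switches the variance divisor from n - 1 to n. The sketch below illustrates that with Python's standard statistics module; `aggregate_var` is a hypothetical helper that only borrows the argument names from these notes and is not the teradataml API.

```python
import statistics
from typing import Sequence


def aggregate_var(values: Sequence[float], population: bool = False,
                  distinct: bool = False) -> float:
    """Variance with "population" and "distinct" switches.

    Hypothetical helper that only borrows the argument names from these
    release notes; it is not teradataml's DataFrame.var().
    """
    # "distinct" drops duplicate values before aggregating.
    data = [float(v) for v in (set(values) if distinct else values)]
    # Population variance divides by n; sample variance divides by n - 1.
    return statistics.pvariance(data) if population else statistics.variance(data)


scores = [2, 4, 4, 4, 5, 5, 7, 9]
for population in (False, True):
    for distinct in (False, True):
        print(population, distinct, aggregate_var(scores, population, distinct))
```

With the duplicates kept, the squared deviations from the mean of 5 sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7; with distinct values {2, 4, 5, 7, 9} only, both results change.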

teradataml 16.20.00.06

Fixed the DataFrame data display corruption issue observed with certain analytic functions.

teradataml 16.20.00.05

Compatible with Vantage 1.1.1.
The following ML Engine (teradataml.analytics.mle) functions have new and/or updated arguments to support the Vantage version:

  • AdaBoostPredict
  • DecisionForestPredict
  • DecisionTreePredict
  • GLMPredict
  • LDA
  • NaiveBayesPredict
  • NaiveBayesTextClassifierPredict
  • SVMDensePredict
  • SVMSparse
  • SVMSparsePredict
  • XGBoostPredict

teradataml 16.20.00.04

  • Improvements
    • DataFrame creation is now quicker, impacting many APIs and Analytic functions.
    • Improved performance by reducing the number of intermediate queries issued to Teradata Vantage when not required.
      • The number of queries is reduced by combining multiple operations into a single step whenever possible, unless the user expects or requests the intermediate results.
      • The performance improvement is roughly proportional to the number of chained, unexecuted operations on a teradataml DataFrame.
    • Reduced the number of intermediate internal objects created on Vantage.
  • New Features/Functionality
    • General functions
      • New functions
        • show_versions() - lists the versions of teradataml and its installed dependencies.
        • fastload() - high-performance loading of large amounts of data into a table on Vantage. Requires teradatasql version 16.20.0.48 or later.
        • Set operators:
          • concat
          • td_intersect
          • td_except
          • td_minus
        • case() - to help construct SQL CASE based expressions.
      • Updates
        • copy_to_sql
          • Added support for saving a multi-level index with copy_to_sql.
          • Corrected the type mapping used when saving the index.
        • create_context() updated to support 'JWT' logon mechanism.
    • Analytic functions
      • New functions
        • NERTrainer
        • NERExtractor
        • NEREvaluator
        • GLML1L2
        • GLML1L2Predict
      • Updates
        • Added as_categorical() in the teradataml.common.formula module, which lets numeric columns be treated as categorical when using a formula.
    • DataFrame
      • Added support for creating a DataFrame from Volatile and Primary Time Index tables.
      • DataFrame.sample() - to sample data.
      • DataFrame.index - Property to access index_label of DataFrame.
      • Functionality to process Time Series Data
        • Grouping/Resampling time series data:
          • groupby_time()
          • resample()
        • Time Series Aggregates:
          • bottom()
          • count()
          • describe()
          • delta_t()
          • mad()
          • median()
          • mode()
          • first()
          • last()
          • top()
      • DataFrame API and method argument validation added.
      • DataFrame.info() - Default value for null_counts argument updated from None to False.
      • DataFrame.merge() updated to accept column expressions, as well as column names, for the on, left_on, and right_on arguments.
    • DataFrame Column/ColumnExpression methods
      • cast() - to help cast the column to a specified type.
      • isin() and ~isin() - to check the presence of values in a column.
  • Removed deprecated Analytic functions
    • All the deprecated Analytic functions under the teradataml.analytics module have been removed. Newer versions of the functions are available under the teradataml.analytics.mle and the teradataml.analytics.sqle modules. The modules removed are:
      • teradataml.analytics.Antiselect
      • teradataml.analytics.Arima
      • teradataml.analytics.ArimaPredictor
      • teradataml.analytics.Attribution
      • teradataml.analytics.ConfusionMatrix
      • teradataml.analytics.CoxHazardRatio
      • teradataml.analytics.CoxPH
      • teradataml.analytics.CoxSurvival
      • teradataml.analytics.DecisionForest
      • teradataml.analytics.DecisionForestEvaluator
      • teradataml.analytics.DecisionForestPredict
      • teradataml.analytics.DecisionTree
      • teradataml.analytics.DecisionTreePredict
      • teradataml.analytics.GLM
      • teradataml.analytics.GLMPredict
      • teradataml.analytics.KMeans
      • teradataml.analytics.NGrams
      • teradataml.analytics.NPath
      • teradataml.analytics.NaiveBayes
      • teradataml.analytics.NaiveBayesPredict
      • teradataml.analytics.NaiveBayesTextClassifier
      • teradataml.analytics.NaiveBayesTextClassifierPredict
      • teradataml.analytics.Pack
      • teradataml.analytics.SVMSparse
      • teradataml.analytics.SVMSparsePredict
      • teradataml.analytics.SentenceExtractor
      • teradataml.analytics.Sessionize
      • teradataml.analytics.TF
      • teradataml.analytics.TFIDF
      • teradataml.analytics.TextTagger
      • teradataml.analytics.TextTokenizer
      • teradataml.analytics.Unpack
      • teradataml.analytics.VarMax
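The 16.20.00.04 notes say that chained, unexecuted DataFrame operations are combined into a single query whenever possible. The general idea can be sketched in plain Python: each operation only records itself, and SQL is generated once, when results are actually needed. The LazyQuery class, its select/where/to_sql methods, and the shape of the generated SQL are all hypothetical. They are not teradataml's implementation or API.

```python
class LazyQuery:
    """Hypothetical sketch: record chained operations, emit one SQL statement."""

    def __init__(self, table, columns=None, filters=None):
        self.table = table
        self.columns = list(columns) if columns else ["*"]
        self.filters = list(filters) if filters else []

    def select(self, columns):
        # Returns a new object; nothing is sent to the database yet.
        return LazyQuery(self.table, columns, self.filters)

    def where(self, condition):
        # Conditions accumulate instead of producing intermediate tables or views.
        return LazyQuery(self.table, self.columns, self.filters + [condition])

    def to_sql(self):
        # All recorded operations are rendered into a single statement.
        sql = "SELECT {} FROM {}".format(", ".join(self.columns), self.table)
        if self.filters:
            sql += " WHERE " + " AND ".join("({})".format(f) for f in self.filters)
        return sql


q = (LazyQuery("iris")
     .select(["Name", "SepalLength", "PetalLength"])
     .where("Name = 'Iris-setosa'")
     .where("PetalLength > 1.5"))
print(q.to_sql())
```

In this sketch, deferring SQL generation until the result is requested is what lets three chained steps collapse into one statement instead of one query per step.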

teradataml 16.20.00.03

  • Fixed the garbage collection issue observed with remove_context() when context is created using a SQLAlchemy engine.
  • Added 4 new Advanced SQL Engine (formerly NewSQL Engine) analytic functions, supported only on Vantage 1.1:
    • Antiselect, Pack, StringSimilarity, and Unpack.
  • Updated the Machine Learning Engine NGrams function to work with Vantage 1.1.

teradataml 16.20.00.02

  • Python version 3.4.x will no longer be supported. The Python versions supported are 3.5.x, 3.6.x, and 3.7.x.
  • A major issue with using the formula argument in analytic functions under Python 3.7 has been fixed, allowing this package to be used with Python 3.7 or later.
  • Configurable alias name support for analytic functions has been added.
  • create_context (connect to Teradata Vantage) now supports different logon mechanisms. The supported logon mechanisms are 'TD2', 'TDNEGO', 'LDAP', and 'KRB5'.
  • The copy_to_sql function and the DataFrame 'to_sql' method now provide the following additional functionality:
    • Create Primary Time Index tables.
    • Create set/multiset tables.
  • New DataFrame methods are added: 'median', 'var', 'squeeze', 'sort_index', 'concat'.
  • The DataFrame 'join' method now uses ColumnExpressions (df.column_name) rather than strings for the 'on' clause.
  • Series is now supported as a first-class object, obtained by calling squeeze on a DataFrame.
    • Methods supported by teradataml Series are: 'head', 'unique', 'name', '__repr__'.
    • Binary operations with teradataml Series are not yet supported. Try using columns of teradataml DataFrames instead.
  • Sample datasets, and the commands to load them, are provided in the function examples.
  • A new configuration property, 'column_casesenitive_handler', has been added. It is useful when working with case-sensitive column names.

teradataml 16.20.00.01

  • New support has been added for Linux distributions: Red Hat 7+, Ubuntu 16.04+, CentOS 7+, SLES12+.
  • 16.20.00.01 now has over 100 analytic functions. These functions have been organized into their own packages for better control over which engine to execute the analytic function on. Due to these namespace changes, the old analytic functions have been deprecated and will be removed in a future release. See the Deprecations section in the Teradata Python Package User Guide for more information.
  • New DataFrame methods shape, iloc, describe, get_values, merge, and tail.
  • New Series methods for NA checking (isnull, notnull) and string processing (lower, strip, contains).

teradataml 16.20.00.00

  • teradataml 16.20.00.00 is the first release version. Please refer to the Teradata Python Package User Guide for a list of Limitations and Usage Considerations.

Installation and Requirements

Package Requirements:

  • Python 3.5 or later

Note: 32-bit Python is not supported.
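Because 32-bit Python is not supported, it can be worth confirming the interpreter before installing. The sketch below is a common standard-library check, not something provided by teradataml: struct.calcsize("P") reports the size of a C pointer, which is 8 bytes on a 64-bit build. The helper names is_64bit_python and meets_python_requirements are hypothetical, and they check only the Python version and bitness, not the operating system or Vantage requirements.

```python
import struct
import sys


def is_64bit_python():
    """Return True if the running interpreter is a 64-bit build."""
    # struct.calcsize("P") is the size of a C pointer in bytes: 8 on a 64-bit build.
    return struct.calcsize("P") * 8 == 64


def meets_python_requirements():
    """Check the stated package requirements: Python 3.5 or later, 64-bit interpreter."""
    return sys.version_info >= (3, 5) and is_64bit_python()


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("64-bit interpreter:", is_64bit_python())
    print("Meets requirements:", meets_python_requirements())
```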

Minimum System Requirements:

  • Windows 7 (64-bit) or later
  • macOS 10.9 (64-bit) or later
  • Red Hat 7 or later versions
  • Ubuntu 16.04 or later versions
  • CentOS 7 or later versions
  • SLES 12 or later versions
  • Teradata Vantage Advanced SQL Engine:
    • Advanced SQL Engine 16.20 Feature Update 1 or later
  • For a Teradata Vantage system with the ML Engine:
    • Teradata Machine Learning Engine 08.00.03.01 or later

Installation

Use pip to install the Teradata Python Package for Advanced Analytics.

Platform      Command
macOS/Linux   pip install teradataml
Windows       py -3 -m pip install teradataml

When upgrading to a new version of the Teradata Python Package, you may need to use pip install's --no-cache-dir option to force the download of the new version.

Platform      Command
macOS/Linux   pip install --no-cache-dir -U teradataml
Windows       py -3 -m pip install --no-cache-dir -U teradataml
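After an upgrade, you can confirm which version pip actually installed. teradataml's own show_versions() (listed in the 16.20.00.04 notes) is one option. The sketch below instead uses the standard library's importlib.metadata, which requires Python 3.8 or later, and returns None when the package is not installed. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="teradataml"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print("teradataml version:", found if found else "not installed")
```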

Using the Teradata Python Package

Your Python script must import the teradataml package in order to use the Teradata Python Package:

>>> import teradataml as tdml
>>> from teradataml import create_context, remove_context
>>> create_context(host='hostname', username='user', password='password')
>>> df = tdml.DataFrame('iris')
>>> df

   SepalLength  SepalWidth  PetalLength  PetalWidth             Name
0          5.1         3.8          1.5         0.3      Iris-setosa
1          6.9         3.1          5.1         2.3   Iris-virginica
2          5.1         3.5          1.4         0.3      Iris-setosa
3          5.9         3.0          4.2         1.5  Iris-versicolor
4          6.0         2.9          4.5         1.5  Iris-versicolor
5          5.0         3.5          1.3         0.3      Iris-setosa
6          5.5         2.4          3.8         1.1  Iris-versicolor
7          6.9         3.2          5.7         2.3   Iris-virginica
8          4.4         3.0          1.3         0.2      Iris-setosa
9          5.8         2.7          5.1         1.9   Iris-virginica

>>> df = df.select(['Name', 'SepalLength', 'PetalLength'])
>>> df

              Name  SepalLength  PetalLength
0  Iris-versicolor          6.0          4.5
1  Iris-versicolor          5.5          3.8
2   Iris-virginica          6.9          5.7
3      Iris-setosa          5.1          1.4
4      Iris-setosa          5.1          1.5
5   Iris-virginica          5.8          5.1
6   Iris-virginica          6.9          5.1
7      Iris-setosa          5.1          1.4
8   Iris-virginica          7.7          6.7
9      Iris-setosa          5.0          1.3

>>> df = df[(df.Name == 'Iris-setosa') & (df.PetalLength > 1.5)]
>>> df

          Name  SepalLength  PetalLength
0  Iris-setosa          4.8          1.9
1  Iris-setosa          5.4          1.7
2  Iris-setosa          5.7          1.7
3  Iris-setosa          5.0          1.6
4  Iris-setosa          5.1          1.9
5  Iris-setosa          4.8          1.6
6  Iris-setosa          4.7          1.6
7  Iris-setosa          5.1          1.6
8  Iris-setosa          5.1          1.7
9  Iris-setosa          4.8          1.6

Documentation

General product information, including installation instructions, is available on the Teradata Documentation website.

License

Use of the Teradata Python Package is governed by the License Agreement for the Teradata Python Package for Advanced Analytics. After installation, the LICENSE and LICENSE-3RD-PARTY files are located in the teradataml directory of the Python installation directory.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

teradataml-17.10.0.0-py3-none-any.whl (3.6 MB, Python 3)