Skip to main content

Client for the e6data distributed SQL Engine.

Project description

e6data Python Connector

version

Introduction

The e6data Connector for Python provides an interface for writing Python applications that can connect to e6data and perform operations.

Dependencies

Make sure to install below dependencies and wheel before install e6data-python-connector.

# Amazon Linux / CentOS dependencies
yum install python3-devel gcc-c++ -y

# Ubuntu/Debian dependencies
apt install python3-dev g++ -y


# Pip dependencies
pip install wheel

To install the Python package, use the command below:

pip install --no-cache-dir e6data-python-connector

Prerequisites

  • Open Inbound Port 80 in the Engine Cluster.
  • Limit access to Port 80 according to your organizational security policy. Public access is not encouraged.
  • Access Token generated in the e6data console.

Create a Connection

Use your e6data Email ID as the username and your access token as the password.

from e6data_python_connector import Connection

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # Database to perform the query on.
port = 80  # Port of the e6data engine.
catalog_name = '<catalog_name>'

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password
)

Perform a Queries & Get Results

query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the query.

cursor = conn.cursor(catalog_name=catalog_name)
query_id = cursor.execute(query)  # The execute function returns a unique query ID, which can be use to abort the query.
all_records = cursor.fetchall()
for row in all_records:
   print(row)

To fetch all the records:

records = cursor.fetchall()

To fetch one record:

record = cursor.fetchone()

To fetch limited records:

limit = 500
records = cursor.fetchmany(limit)

To fetch all the records in buffer to reduce memory consumption:

records_iterator = cursor.fetchall_buffer()  # Returns generator
for item in records_iterator:
    print(item)

To get the execution plan after query execution:

import json
explain_response = cursor.explain_analyse()
query_planner = json.loads(explain_response.get('planner'))

To abort a running query:

query_id = '<query_id>'  # query id from execute function response.
cursor.cancel(query_id)

Switch database in an existing connection:

database = '<new_database_name>'  # Replace with the new database.
cursor = conn.cursor(database, catalog_name)

Get Query Time Metrics

import json
query = 'SELECT * FROM <TABLE_NAME>'

cursor = conn.cursor(catalog_name)
query_id = cursor.execute(query)  # execute function returns query id, can be use for aborting the query.
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
query_planner = json.loads(explain_response.get('planner'))

execution_time = query_planner.get("total_query_time")  # In milliseconds
queue_time = query_planner.get("executionQueueingTime")  # In milliseconds
parsing_time = query_planner.get("parsingTime")  # In milliseconds
row_count = query_planner.rowcount

Get Schema - a list of Databases, Tables or Columns

The following code returns a dictionary of all databases, all tables and all columns connected to the cluster currently in use. This function can be used without passing database name to get list of all databases.

databases = conn.get_schema_names()  # To get list of databases.
print(databases)

database = '<database_name>'  # Replace with actual database name.
tables = conn.get_tables(database=database)  # To get list of tables from a database.
print(tables)

table_name = '<table_name>'  # Replace with actual table name.
columns = conn.get_tables(database=database, table=table_name)  # To get the list of columns from a table.
columns_with_type = list()
"""
Getting the column name and type.
"""
for column in columns:
   columns_with_type.append(dict(column_name=column.fieldName, column_type=column.fieldType))
print(columns_with_type)

Code Hygiene

It is recommended to clear the cursor, close the cursor and close the connection after running a function as a best practice. This enhances performance by clearing old data from memory.

cursor.clear() # Not needed when aborting a query
cursor.close()
conn.close()

Code Example

The following code is an example which combines a few functions described above.

from e6data_python_connector import Connection
import json

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # # Database to perform the query on.
port = 80  # Port of the e6data engine.

sql_query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the actual query.

catalog_name = '<catalog_name>'  # Replace with the actual catalog name.

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password
)

cursor = conn.cursor(db_name=database, catalog_name=catalog_name)
query_id = cursor.execute(sql_query)
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
planner_result = json.loads(explain_response.get('planner'))
execution_time = planner_result.get("total_query_time") / 1000  # Converting into seconds.
row_count = cursor.rowcount
columns = [col[0] for col in cursor.description]  # Get the column names and merge them with the results.
results = []
for row in all_records:
   row = dict(zip(columns, row))
   results.append(row)
   print(row)
print('Total row count {}, Execution Time (seconds): {}'.format(row_count, execution_time))
cursor.clear()
cursor.close()
conn.close()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

e6data_python_connector-2.2.1rc4.tar.gz (55.2 kB view details)

Uploaded Source

Built Distribution

e6data_python_connector-2.2.1rc4-py3-none-any.whl (58.6 kB view details)

Uploaded Python 3

File details

Details for the file e6data_python_connector-2.2.1rc4.tar.gz.

File metadata

File hashes

Hashes for e6data_python_connector-2.2.1rc4.tar.gz
Algorithm Hash digest
SHA256 8c45f35ccdfb46037b5d1e2d6620fdf0887a67d2b778b447773e63890287e3a0
MD5 9cfba8248a30d999a84a0ab0ba3b881b
BLAKE2b-256 4c54444495003eb45a2509a6b9a96db527ebef5b638a2da8b2187babc71f887f

See more details on using hashes here.

File details

Details for the file e6data_python_connector-2.2.1rc4-py3-none-any.whl.

File metadata

File hashes

Hashes for e6data_python_connector-2.2.1rc4-py3-none-any.whl
Algorithm Hash digest
SHA256 d0df735166971810ac422600c73007f2fba64d5dbda7bf4f105e719ae7152659
MD5 db73e8c31cae6928a5b9d03c4b243899
BLAKE2b-256 54e451aa577e0bfbcc21452d474f1ce3b41a1f4f8a0f41f202e6ace0252223bb

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page