Python client library for Extrica

PyExtrica Library for Extrica Product

PyExtrica lets you query and transform data on the Extrica (Data To AI) platform directly, without downloading the data locally. With it you can execute SQL queries and retrieve metadata such as catalog names, schema names, table names, and column information.

Getting Started

Installation

You can install PyExtrica via pip:

pip install pyextrica

Usage

To start using PyExtrica, first import PyExtricaFunctions from the pyextrica package:

from pyextrica import PyExtricaFunctions

Connecting to Extrica

To connect to the Extrica platform, use the extrica_engine method:

from pyextrica import PyExtricaFunctions

# Define connection parameters
user_email = "your_email@example.com"
password = "your_password"
host = "host"
port = 1234 
catalog = "your_catalog"
platform = "data_sources"  # Platform should be either "data_products" or "data_sources"

# Establish connection to Extrica
engine = PyExtricaFunctions.extrica_engine(f"pyextrica://{user_email}:{password}@{host}:{port}/{catalog}?platform={platform}")
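
Because the connection string packs several values into one URL, it can help to assemble it in a small helper that rejects an invalid platform value early. The following is a minimal sketch and not part of PyExtrica: the name build_extrica_url is hypothetical, and the helper only reproduces the URL layout shown above.

```python
VALID_PLATFORMS = {"data_products", "data_sources"}


def build_extrica_url(user_email, password, host, port, catalog, platform):
    """Hypothetical helper: format the connection string used in the example above."""
    if platform not in VALID_PLATFORMS:
        raise ValueError(
            f"platform must be one of {sorted(VALID_PLATFORMS)}, got {platform!r}"
        )
    return (
        f"pyextrica://{user_email}:{password}@{host}:{port}/{catalog}"
        f"?platform={platform}"
    )


url = build_extrica_url(
    "your_email@example.com", "your_password", "host", 1234,
    "your_catalog", "data_sources",
)
print(url)
```

The resulting string can then be passed to PyExtricaFunctions.extrica_engine in place of the inline f-string.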

Using SQL

You can execute SQL queries using the execute_sql_query method:

sql_query = """
    SELECT column1, column2 FROM table_name LIMIT 10
"""

result = PyExtricaFunctions.execute_sql_query(engine, sql_query)

Replace the contents of sql_query with your own SQL statement.

Querying Data

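The result of execute_sql_query can be loaded into a pandas DataFrame for further processing:

import pandas as pd

# Load the query result from the previous step into a DataFrame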
df = pd.DataFrame(result, columns=['column1', 'column2'])
print(df.head())

# Perform DataFrame operations
# Example: Filter DataFrame
filtered_df = df[df['column1'] > 100]
print(filtered_df.head())

DataFrame Aggregation

# Example: Aggregating DataFrame
aggregated_df = df.groupby('column1').agg({'column2': 'sum'}).reset_index()
print(aggregated_df.head())

DataFrame Join

sql_query2 = """
    SELECT column3, column4 FROM second_table LIMIT 10
"""
result2 = PyExtricaFunctions.execute_sql_query(engine, sql_query2)
df2 = pd.DataFrame(result2, columns=['column3', 'column4'])

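# Join DataFrames on their key columns (here column1 is assumed to match column3;
# neither DataFrame has a column named 'common_column')
joined_df = df.merge(df2, left_on='column1', right_on='column3')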
print(joined_df.head())
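
To see how the join behaves without a live connection, here is a small self-contained sketch that uses in-memory sample rows in place of query results. The data values are invented for illustration, and treating column1 and column3 as the matching key columns is an assumption; substitute whichever columns your two tables actually share.

```python
import pandas as pd

# Sample rows standing in for the results of the two queries (illustrative values)
df = pd.DataFrame([(1, "a"), (2, "b"), (3, "c")], columns=["column1", "column2"])
df2 = pd.DataFrame([(2, "x"), (3, "y"), (4, "z")], columns=["column3", "column4"])

# Inner join: keep only the rows where column1 equals column3
joined_df = df.merge(df2, left_on="column1", right_on="column3", how="inner")
print(joined_df)
```

Only the keys present on both sides (2 and 3 in this sample) survive an inner join; pass how="left" or how="outer" to keep unmatched rows.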

Available Methods

PyExtrica provides the following methods for interacting with the Extrica platform:

  • extrica_engine: Connects to the Extrica platform.
  • execute_sql_query: Executes SQL queries.
  • get_catalog_names: Retrieves catalog names.
  • get_schema_names: Retrieves schema names.
  • get_table_names: Retrieves table names.
  • get_table_columns: Retrieves column information for a specified table.

The four metadata methods require the connection to specify platform=data_products (for data products) or platform=data_sources (for data sources). For data products, the catalog name represents the domain name and the schema name represents the subdomain name. For data sources, catalogs and schemas are similar to Trino catalogs and schemas.

# Retrieve catalog names
catalogs = PyExtricaFunctions.get_catalog_names(engine)
print("Catalogs:", catalogs)

# Retrieve schema names
schemas = PyExtricaFunctions.get_schema_names(engine)
print("Schemas:", schemas)

# Retrieve table names
tables = PyExtricaFunctions.get_table_names(engine, schema='schema_name')
print("Tables:", tables)

# Retrieve columns information for a table
columns_info = PyExtricaFunctions.get_table_columns(engine, schema='schema_name', table_name='table_name')
print("Columns Information:", columns_info)
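
The metadata calls above can be combined to build an overview of a catalog. The sketch below is an illustration and not part of PyExtrica: collect_layout is a hypothetical helper, and it assumes that get_schema_names and get_table_names return iterables of name strings, which this description does not confirm. It takes the API object as a parameter so it can be tried with a stand-in before being pointed at a live connection.

```python
def collect_layout(api, engine):
    """Hypothetical helper: map each schema name to its tables' column information.

    `api` is expected to expose get_schema_names, get_table_names and
    get_table_columns with the call shapes used in the examples above.
    """
    layout = {}
    for schema in api.get_schema_names(engine):
        tables = api.get_table_names(engine, schema=schema)
        layout[schema] = {
            table: api.get_table_columns(engine, schema=schema, table_name=table)
            for table in tables
        }
    return layout
```

With a real connection this would be called as collect_layout(PyExtricaFunctions, engine). For catalogs with many tables, expect one get_table_columns call per table.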

Supported Operations

In PyExtrica, DML operations are supported only for Data Sources, not for Data Products. DDL operations are supported for both Data Sources and Data Products.

Example of DML Query

# Example of executing a DML query
dml_query = """
    INSERT INTO table_name (column1, column2) VALUES (value1, value2)
"""

result = PyExtricaFunctions.execute_sql_query(engine, dml_query)
print(result)  
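
Since DML is not supported for Data Products, a client-side check can catch such statements before they are sent. The following sketch is an illustration, not PyExtrica behavior: the helper names are hypothetical, and classifying a statement by its first keyword (INSERT, UPDATE, DELETE, MERGE) is a simplification that ignores leading comments and statements that begin with a WITH clause.

```python
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def is_dml(sql):
    """Return True if the first keyword of the statement is a DML verb."""
    words = sql.strip().split(None, 1)
    return bool(words) and words[0].upper() in DML_KEYWORDS


def check_supported(sql, platform):
    """Raise ValueError for DML statements aimed at data products."""
    if platform == "data_products" and is_dml(sql):
        raise ValueError("DML statements are only supported for data_sources")


# A DML statement against data sources passes the check silently
check_supported("INSERT INTO table_name (column1) VALUES (1)", "data_sources")
print(is_dml("  insert into t values (1)"))
print(is_dml("SELECT column1 FROM table_name"))
```

Calling check_supported(dml_query, platform) before execute_sql_query turns an unsupported statement into a local error instead of a round trip to the platform.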
