REFRACT-IO: Read and write dataframes using different connectors.
Project description
Installation:
Without optional dependencies:
pip install refractio
With all optional dependencies:
pip install refractio[all]
With Snowflake:
pip install refractio[snowflake]
With Amazon S3:
pip install refractio[s3]
With Azure Blob Storage:
pip install refractio[azureblob]
With local storage (NAS):
pip install refractio[local]
With SFTP:
pip install refractio[sftp]
With MySQL:
pip install refractio[mysql]
With Hive:
pip install refractio[hive]
With SQL Server:
pip install refractio[sqlserver]
With PostgreSQL:
pip install refractio[postgres]
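pip also accepts several extras in one requirement, separated by commas (standard PEP 508 syntax), for example `pip install 'refractio[snowflake,s3]'`. The quotes matter in shells such as zsh, where square brackets are glob characters. The helper below, `pip_install_command`, is a hypothetical sketch that builds such a command from the extras listed above and rejects names that are not in that list.

```python
import shlex

# Extras listed in this README; "all" pulls in every connector.
KNOWN_EXTRAS = {
    "all", "snowflake", "s3", "azureblob", "local",
    "sftp", "mysql", "hive", "sqlserver", "postgres",
}


def pip_install_command(*extras: str) -> str:
    """Build a pip command for refractio with the given extras (hypothetical helper)."""
    unknown = [e for e in extras if e not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extra(s): {', '.join(unknown)}")
    if not extras:
        return "pip install refractio"
    # Quote the requirement so shells like zsh do not expand the square brackets.
    requirement = f"refractio[{','.join(extras)}]"
    return f"pip install {shlex.quote(requirement)}"


print(pip_install_command("snowflake", "s3"))
```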
Source code is available at: https://git.lti-aiq.in/refract-sdk/refract-sdk.git
Usage:
To read a dataframe using only the dataset name:
from refractio import get_dataframe
get_dataframe("dataset_name")
# Read the top 3 records where col1='value1' and col2>value2, sorted by col1.
get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To read data from connections not listed here, install the mosaic-connector-python package with pip.
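The filter_condition string is raw SQL: a WHERE clause, optionally followed by ORDER BY. One plausible reading, an assumption not confirmed by the refractio source, is that it is appended to a SELECT on the dataset and that row_count becomes a row limit. The sketch below illustrates those semantics on an in-memory SQLite table; `build_query` and the table contents are hypothetical. SQL dialects differ (SQL Server, for instance, uses TOP rather than LIMIT), and if the string really is spliced in as raw SQL it should never be built from untrusted input.

```python
import sqlite3
from typing import Optional


def build_query(table: str, filter_condition: str = "", row_count: Optional[int] = None) -> str:
    """Hypothetical helper: SELECT * FROM table, then filter_condition, then a row limit."""
    query = f"SELECT * FROM {table}"
    if filter_condition:
        # filter_condition is raw SQL, e.g. "where ... order by ...".
        query += f" {filter_condition}"
    if row_count is not None:
        # LIMIT has to follow ORDER BY, so it is appended last.
        query += f" LIMIT {int(row_count)}"
    return query


# Illustrate the semantics on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_name (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO dataset_name VALUES (?, ?)",
    [("value1", 5), ("value1", 60), ("value1", 20), ("other", 50), ("value1", 40), ("value1", 30)],
)
sql = build_query(
    "dataset_name",
    filter_condition="where col1='value1' and col2>10 order by col2",
    row_count=3,
)
rows = conn.execute(sql).fetchall()
conn.close()
print(sql)
print(rows)  # [('value1', 20), ('value1', 30), ('value1', 40)]
```

Ordering by col2 here keeps the output deterministic; in the README's example every row that passes the filter has the same col1 value.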
To read a dataframe from a file in local storage:
from refractio import get_local_dataframe
get_local_dataframe("local_file_name_with_absolute_path", row_count=3)
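get_local_dataframe expects an absolute path. If a script only has a relative path, it can be resolved first with pathlib. `to_absolute` is a hypothetical helper, and the commented call assumes the get_local_dataframe signature shown above; the file name there is illustrative.

```python
from pathlib import Path
from typing import Optional


def to_absolute(path: str, base_dir: Optional[str] = None) -> str:
    """Hypothetical helper: return an absolute path suitable for get_local_dataframe."""
    p = Path(path).expanduser()
    if not p.is_absolute():
        # Resolve relative paths against base_dir, or the current working directory.
        p = Path(base_dir or Path.cwd()) / p
    return str(p.resolve())


# Usage, assuming refractio[local] is installed ("data/sales.csv" is a made-up file):
# from refractio import get_local_dataframe
# get_local_dataframe(to_absolute("data/sales.csv"), row_count=3)
```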
To use Snowflake operations:
from refractio import snowflake
# Get a Snowflake connection object using the user's default Snowflake connection, if one exists.
snowflake.get_connection()
# Get a Snowflake connection object for a specific connection name.
snowflake.get_connection(connection_name="snowflake_con_name")
# Read a dataset published from a Snowflake connection.
snowflake.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from a Snowflake connection.
snowflake.get_dataframe("dataset_name", row_count=3)
# Read the first few records of a dataset published from a Snowflake connection, with filter conditions.
snowflake.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# Execute a user-specified query in Snowflake using the named connection.
snowflake.execute_query(query="user_query", database="db_name", schema="schema", connection_name="connection_name")
# Execute a user-specified query in Snowflake using the current connection object, or the user's default connection.
snowflake.execute_query(query="user_query", database="db_name", schema="schema")
# Close the Snowflake connection. Always close the connection after use.
snowflake.close_connection()
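Because each connector asks you to call close_connection() after use, a small context manager can guarantee that the call happens even when a query raises. `connector_session` is a hypothetical helper; it only assumes the connector module exposes get_connection(**kwargs) and close_connection(), as the Snowflake, MySQL, SQL Server, Hive, PostgreSQL and SFTP examples show. For Hive, pass user_id through the keyword arguments.

```python
from contextlib import contextmanager


@contextmanager
def connector_session(connector, **connection_kwargs):
    """Open a refractio connector connection and always close it (hypothetical helper)."""
    conn = connector.get_connection(**connection_kwargs)
    try:
        yield conn
    finally:
        # Runs on normal exit and when the body raises.
        connector.close_connection()


# Usage, assuming refractio[snowflake] is installed and configured:
# from refractio import snowflake
# with connector_session(snowflake, connection_name="snowflake_con_name"):
#     snowflake.execute_query(query="user_query", database="db_name", schema="schema")
```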
To use MySQL operations:
from refractio import mysql
# Get a MySQL connection object using the user's default MySQL connection, if one exists.
mysql.get_connection()
# Get a MySQL connection object for a specific connection name.
mysql.get_connection(connection_name="mysql_con_name")
# Read a dataset published from a MySQL connection.
mysql.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from a MySQL connection.
mysql.get_dataframe("dataset_name", row_count=3)
# Read the first few records of a dataset published from a MySQL connection, with filter conditions.
mysql.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# Execute a user-specified query in MySQL using the named connection.
mysql.execute_query(query="user_query", connection_name="connection_name")
# Execute a user-specified query in MySQL using the current connection object, or the user's default connection.
mysql.execute_query(query="user_query")
# Close the MySQL connection. Always close the connection after use.
mysql.close_connection()
To use SQL Server operations:
Requires the Microsoft ODBC Driver 18 for SQL Server (msodbcsql18).
# Create a custom template and add the following commands to its "Pre Init Script" section:
# sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/9.0/prod.repo
# sudo ACCEPT_EULA=Y yum install -y msodbcsql18
from refractio import sqlserver
# Get a SQL Server connection object using the user's default SQL Server connection, if one exists.
sqlserver.get_connection()
# Get a SQL Server connection object for a specific connection name.
sqlserver.get_connection(connection_name="sqlserver_con_name")
# Read a dataset published from a SQL Server connection.
sqlserver.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from a SQL Server connection.
sqlserver.get_dataframe("dataset_name", row_count=3)
# Read the first few records of a dataset published from a SQL Server connection, with filter conditions.
sqlserver.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# Execute a user-specified query in SQL Server using the named connection.
sqlserver.execute_query(query="user_query", database="db_name", connection_name="connection_name")
# Execute a user-specified query in SQL Server using the current connection object, or the user's default connection.
sqlserver.execute_query(query="user_query", database="db_name")
# Close the SQL Server connection. Always close the connection after use.
sqlserver.close_connection()
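Before connecting, it can help to confirm that the ODBC driver installed by the Pre Init Script is visible; msodbcsql18 registers itself under the name "ODBC Driver 18 for SQL Server". The sketch below assumes pyodbc is available in the environment (this README does not say whether refractio's sqlserver extra uses it); `has_sqlserver_driver` is a hypothetical helper.

```python
from typing import Iterable

MSODBC18 = "ODBC Driver 18 for SQL Server"


def has_sqlserver_driver(installed: Iterable[str], wanted: str = MSODBC18) -> bool:
    """Return True if the wanted ODBC driver name appears among the installed drivers."""
    return wanted in set(installed)


# Usage, assuming pyodbc is installed:
# import pyodbc
# if not has_sqlserver_driver(pyodbc.drivers()):
#     raise RuntimeError("msodbcsql18 not found; check the template's Pre Init Script")
```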
To use Hive operations:
from refractio import hive
# Get a Hive connection object using the user's default Hive connection, if one exists. user_id is required (1001 is the default user ID).
hive.get_connection(user_id=1001)
# Get a Hive connection object for a specific connection name. user_id is required (1001 is the default user ID).
hive.get_connection(connection_name="hive_con_name", user_id=1001)
# Read a dataset published from a Hive connection. user_id is required (1001 is the default user ID).
hive.get_dataframe("dataset_name", user_id="1001")
# Read only the first few records of a dataset published from a Hive connection. user_id is required (1001 is the default user ID).
hive.get_dataframe("dataset_name", user_id="1001", row_count=3)
# Read the first few records of a dataset published from a Hive connection, with filter conditions. user_id is required (1001 is the default user ID).
hive.get_dataframe("dataset_name", user_id="1001", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# Execute a user-specified query in Hive using the named connection. user_id is required (1001 is the default user ID).
hive.execute_query(query="user_query", connection_name="connection_name", user_id="1001")
# Execute a user-specified query in Hive using the current connection object, or the user's default connection. user_id is required (1001 is the default user ID).
hive.execute_query(query="user_query", user_id="1001")
# Close the Hive connection. Always close the connection after use.
hive.close_connection()
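Every Hive call above repeats user_id, so functools.partial can bind it once. This is a generic Python pattern, not a refractio feature, and `bind_user_id` is a hypothetical name. Note that the examples pass the user ID both as an integer (get_connection) and as a string (get_dataframe, execute_query), so check which type your refractio version expects for each function.

```python
from functools import partial

HIVE_USER_ID = "1001"  # default user ID from the examples above; adjust for your user


def bind_user_id(func, user_id=HIVE_USER_ID):
    """Return func with user_id pre-filled (a thin functools.partial wrapper)."""
    return partial(func, user_id=user_id)


# Usage, assuming refractio[hive] is installed:
# from refractio import hive
# read_hive = bind_user_id(hive.get_dataframe)
# df = read_hive("dataset_name", row_count=3)
# run_hive = bind_user_id(hive.execute_query)
# run_hive(query="user_query", connection_name="connection_name")
```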
To use PostgreSQL operations:
from refractio import postgres
# Get a PostgreSQL connection object using the user's default PostgreSQL connection, if one exists.
postgres.get_connection()
# Get a PostgreSQL connection object for a specific connection name.
postgres.get_connection(connection_name="postgres_con_name")
# Read a dataset published from a PostgreSQL connection.
postgres.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from a PostgreSQL connection.
postgres.get_dataframe("dataset_name", row_count=3)
# Read the first few records of a dataset published from a PostgreSQL connection, with filter conditions.
postgres.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# Execute a user-specified query in PostgreSQL using the named connection.
postgres.execute_query(query="user_query", connection_name="connection_name")
# Execute a user-specified query in PostgreSQL using the current connection object, or the user's default connection.
postgres.execute_query(query="user_query")
# Close the PostgreSQL connection. Always close the connection after use.
postgres.close_connection()
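The execute_query examples in this README take different keyword arguments per connector: snowflake passes database and schema, sqlserver passes database, hive passes user_id, and mysql and postgres take only the query, each with an optional connection_name. The hypothetical `query_kwargs` helper below records those requirements as read from the examples and checks a call's arguments before dispatch. It is a sketch and does not reflect refractio's own parameter validation.

```python
# Keyword arguments each connector's execute_query is shown with in this README,
# besides query and the optional connection_name.
EXTRA_QUERY_ARGS = {
    "snowflake": {"database", "schema"},
    "sqlserver": {"database"},
    "hive": {"user_id"},
    "mysql": set(),
    "postgres": set(),
}


def query_kwargs(connector: str, query: str, connection_name=None, **extra) -> dict:
    """Build execute_query kwargs, checked against EXTRA_QUERY_ARGS (hypothetical helper)."""
    if connector not in EXTRA_QUERY_ARGS:
        raise ValueError(f"no execute_query example for connector {connector!r}")
    expected = EXTRA_QUERY_ARGS[connector]
    provided = set(extra)
    missing = expected - provided
    unexpected = provided - expected
    if missing or unexpected:
        raise ValueError(
            f"{connector}: missing {sorted(missing)}, unexpected {sorted(unexpected)}"
        )
    kwargs = {"query": query, **extra}
    if connection_name is not None:
        kwargs["connection_name"] = connection_name
    return kwargs


# Usage, assuming refractio[snowflake] is installed:
# from refractio import snowflake
# snowflake.execute_query(**query_kwargs("snowflake", "user_query", database="db_name", schema="schema"))
```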
To use SFTP operations:
from refractio import sftp
# Get an SFTP connection object using the user's default SFTP connection, if one exists.
sftp.get_connection()
# Get an SFTP connection object for a specific connection name.
sftp.get_connection(connection_name="sftp_con_name")
# Read a dataset published from an SFTP connection.
sftp.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from an SFTP connection.
sftp.get_dataframe("dataset_name", row_count=3)
# Use the SFTP connection object c for any SFTP operation, such as get, put or listdir.
c = sftp.get_connection()
# Close the SFTP connection. Always close the connection after use.
sftp.close_connection()
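The SFTP connection object supports operations such as get, put and listdir. Assuming it behaves like a paramiko SFTPClient (not confirmed by this README; pysftp connections expose methods with the same names), the sketch below downloads every file with a given suffix from a remote directory. `download_matching` and the paths in the usage comment are hypothetical.

```python
import os
import posixpath


def download_matching(client, remote_dir: str, local_dir: str, suffix: str = ".csv") -> list:
    """Copy remote files ending in suffix into local_dir and return the local paths."""
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    for name in sorted(client.listdir(remote_dir)):
        if not name.endswith(suffix):
            continue
        # SFTP paths are POSIX-style regardless of the local operating system.
        remote_path = posixpath.join(remote_dir, name)
        local_path = os.path.join(local_dir, name)
        client.get(remote_path, local_path)
        downloaded.append(local_path)
    return downloaded


# Usage, assuming refractio[sftp] is installed:
# from refractio import sftp
# c = sftp.get_connection()
# try:
#     download_matching(c, "/remote/exports", "/tmp/exports")
# finally:
#     sftp.close_connection()
```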
To use Amazon S3 operations:
from refractio import s3
# Get an S3 connection object using the user's default S3 connection, if one exists.
s3.get_connection()
# Get an S3 connection object for a specific connection name.
s3.get_connection(connection_name="s3_con_name")
# Read a dataset published from an S3 connection.
s3.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from an S3 connection.
s3.get_dataframe("dataset_name", row_count=3)
# Use the S3 connection object c for any S3 operation.
c = s3.get_connection()
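The S3 connection object can be used for any S3 operation. Assuming it is a boto3 S3 client (this README does not say so), listing every key under a prefix needs a paginator, because a single list_objects_v2 call returns at most 1,000 keys. `list_keys` and the bucket name in the usage comment are hypothetical.

```python
def list_keys(client, bucket: str, prefix: str = "") -> list:
    """Return all object keys under prefix, following list_objects_v2 pagination."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Usage, assuming refractio[s3] is installed and c is a boto3 S3 client:
# from refractio import s3
# c = s3.get_connection()
# for key in list_keys(c, "my-bucket", prefix="exports/"):
#     c.download_file("my-bucket", key, key.rsplit("/", 1)[-1])
```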
To use Azure Blob Storage operations:
from refractio import azure
# Get an Azure Blob connection object using the user's default Azure connection, if one exists.
azure.get_connection()
# Get an Azure Blob connection object for a specific connection name.
azure.get_connection(connection_name="azureblob_con_name")
# Read a dataset published from an Azure Blob connection.
azure.get_dataframe("dataset_name")
# Read only the first few records of a dataset published from an Azure Blob connection.
azure.get_dataframe("dataset_name", row_count=3)
# Use the Azure connection object c for any Azure Blob operation.
c = azure.get_connection()
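Assuming the Azure connection object is an azure-storage-blob BlobServiceClient (this README does not confirm it), blobs can be listed and downloaded through a container client. `read_blobs` and the container name in the usage comment are hypothetical. The sketch reads each blob fully into memory, so it suits small files only.

```python
def read_blobs(service_client, container: str, prefix: str = "") -> dict:
    """Return {blob_name: bytes} for every blob under prefix in the container."""
    container_client = service_client.get_container_client(container)
    contents = {}
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # download_blob returns a downloader; readall() loads the whole blob into memory.
        contents[blob.name] = container_client.download_blob(blob.name).readall()
    return contents


# Usage, assuming refractio[azureblob] is installed and c is a BlobServiceClient:
# from refractio import azure
# c = azure.get_connection()
# files = read_blobs(c, "my-container", prefix="exports/")
```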
Note: Natively supported connectors currently include Snowflake, MySQL, Hive, SQL Server, PostgreSQL, SFTP, S3, Azure Blob and local storage (NAS).
Download files
Source Distribution
refractio-2.1.5.tar.gz (13.9 kB)
Built Distribution
refractio-2.1.5-py3-none-any.whl (20.5 kB)
File details
Details for the file refractio-2.1.5.tar.gz.
File metadata
- Download URL: refractio-2.1.5.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 23b9a2316d8305884c2fd4606a0c35e8b1e130a5d9849f110ea126876708f476
MD5 | 6cfb81b4de7668013d7116f8e07dfc5d
BLAKE2b-256 | 6fa02da4258fa534660d7df7f25e365443934a082091af01f2e8585a2f036d39
File details
Details for the file refractio-2.1.5-py3-none-any.whl.
File metadata
- Download URL: refractio-2.1.5-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 469e4f4520add85e5342e409cedbd35407a78db8a0e0880ae58fd39251cb40e2
MD5 | b605d8f6d477e7897861ca288b3d2031
BLAKE2b-256 | c5193fb3c1619c4f6cb491f47d89190f98d9eb7f90bac36565eeb415506e5bcb