REFRACT-IO: To read and write dataframe from different connectors.
Project description
Installation:
Without any dependencies:
pip install refractio
With all dependencies:
pip install refractio[all]
With snowflake:
pip install refractio[snowflake]
With s3:
pip install refractio[s3]
With azureblob:
pip install refractio[azureblob]
With local:
pip install refractio[local]
With sftp:
pip install refractio[sftp]
With mysql:
pip install refractio[mysql]
With hive:
pip install refractio[hive]
With sqlserver:
pip install refractio[sqlserver]
With postgres:
pip install refractio[postgres]
Source code is available at: https://git.lti-aiq.in/refract-sdk/refract-sdk.git
Usage:
To read dataframe with dataset name only -
from refractio import get_dataframe
get_dataframe("dataset_name")
# To read top 3 records from dataframe with filter condition of col1=value1 and col2>value2 and sort by col1.
get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# For reading data from any other connections not listed here, please pip install mosaic-connector-python package.
To read dataframe with filename from local storage -
from refracio import get_local_dataframe
get_local_dataframe("local_file_name_with_absolute_path", row_count=3)
To use snowflake related operations -
from refractio import snowflake
# To get snowflake connection object with a default snowflake connection created by the user, if available.
snowflake.get_connection()
# To get snowflake connection object with a specific connection name
snowflake.get_connection(connection_name="snowflake_con_name")
# To read a specific dataset published from a snowflake connection
snowflake.get_dataframe("dataset_name")
# To read a specific dataset published from a snowflake connection with only top few records.
snowflake.get_dataframe("dataset_name", row_count=3)
# To read a specific dataset published from a snowflake connection with only top few records and filter conditions.
snowflake.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To execute a user specific query in snowflake, with the specified connection name.
snowflake.execute_query(query="user_query", database="db_name", schema="schema", connection_name="connection_name")
# To execute a user specific query in snowflake, with the current connection object or with the default connection for the user.
snowflake.execute_query(query="user_query", database="db_name", schema="schema")
# To close snowflake connection, please do close the connection after use!
snowflake.close_connection()
To use mysql related operations -
from refractio import mysql
# To get mysql connection object with a default mysql connection created by the user, if available.
mysql.get_connection()
# To get mysql connection object with a specific connection name
mysql.get_connection(connection_name="mysql_con_name")
# To read a specific dataset published from a mysql connection
mysql.get_dataframe("dataset_name")
# To read a specific dataset published from a mysql connection with only top few records.
mysql.get_dataframe("dataset_name", row_count=3)
# To read a specific dataset published from a mysql connection with only top few records and filter conditions.
mysql.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To execute a user specific query in mysql, with the specified connection name.
mysql.execute_query(query="user_query", connection_name="connection_name")
# To execute a user specific query in mysql, with the current connection object or with the default connection for the user.
mysql.execute_query(query="user_query")
# To close mysql connection, please do close the connection after use!
mysql.close_connection()
To use sqlserver related operations -
Requires sqlserver driver library
# Create a custom template with the following commands added in "Pre Init Script" section,
# sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/9.0/prod.repo
# sudo ACCEPT_EULA=Y yum install -y msodbcsql18
from refractio import sqlserver
# To get sqlserver connection object with a default sqlserver connection created by the user, if available.
sqlserver.get_connection()
# To get sqlserver connection object with a specific connection name
sqlserver.get_connection(connection_name="sqlserver_con_name")
# To read a specific dataset published from a sqlserver connection
sqlserver.get_dataframe("dataset_name")
# To read a specific dataset published from a sqlserver connection with only top few records.
sqlserver.get_dataframe("dataset_name", row_count=3)
# To read a specific dataset published from a sqlserver connection with only top few records and filter conditions.
sqlserver.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To execute a user specific query in sqlserver, with the specified connection name.
sqlserver.execute_query(query="user_query", database="db_name", connection_name="connection_name")
# To execute a user specific query in sqlserver, with the current connection object or with the default connection for the user.
sqlserver.execute_query(query="user_query", database="db_name")
# To close sqlserver connection, please do close the connection after use!
sqlserver.close_connection()
To use hive related operations -
from refractio import hive
# To get hive connection object with a default hive connection created by the user, if available. User id is required (1001 is default user_id used).
hive.get_connection(user_id=1001)
# To get hive connection object with a specific connection name, User id is required (1001 is default user_id used).
hive.get_connection(connection_name="hive_con_name", user_id=1001)
# To read a specific dataset published from a hive connection. User id is required (1001 is default user_id used).
hive.get_dataframe("dataset_name", user_id="1001")
# To read a specific dataset published from a hive connection with only top few records. User id is required (1001 is default user_id used)
hive.get_dataframe("dataset_name", user_id="1001", row_count=3)
# To read a specific dataset published from a hive connection with only top few records and filter conditions. User id is required (1001 is default user_id used)
hive.get_dataframe("dataset_name", user_id="1001", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To execute a user specific query in hive, with the specified connection name. User id is required (1001 is default user_id used).
hive.execute_query(query="user_query", connection_name="connection_name", user_id="1001")
# To execute a user specific query in hive, with the current connection object or with the default connection for the user. User id is required (1001 is default user_id used).
hive.execute_query(query="user_query", user_id="1001")
# To close hive connection, please do close the connection after use!
hive.close_connection()
To use postgres related operations -
from refractio import postgres
# To get postgres connection object with a default postgres connection created by the user, if available.
postgres.get_connection()
# To get postgres connection object with a specific connection name
postgres.get_connection(connection_name="mysql_con_name")
# To read a specific dataset published from a postgres connection
postgres.get_dataframe("dataset_name")
# To read a specific dataset published from a postgres connection with only top few records.
postgres.get_dataframe("dataset_name", row_count=3)
# To read a specific dataset published from a postgres connection with only top few records and filter conditions.
postgres.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")
# To execute a user specific query in postgres, with the specified connection name.
postgres.execute_query(query="user_query", connection_name="connection_name")
# To execute a user specific query in postgres, with the current connection object or with the default connection for the user.
postgres.execute_query(query="user_query")
# To close postgres connection, please do close the connection after use!
postgres.close_connection()
To use sftp related operations -
from refractio import sftp
# To get sftp connection object with a default sftp connection created by the user, if available.
sftp.get_connection()
# To get sftp connection object with a specific connection name
sftp.get_connection(connection_name="sftp_con_name")
# To read a specific dataset published from a sftp connection
sftp.get_dataframe("dataset_name")
# To read a specific dataset published from a sftp connection with only top few records.
sftp.get_dataframe("dataset_name", row_count=3)
# Use sftp connection object c to do any operation related to sftp like (get, put, listdir etc)
c = sftp.get_connection()
# To close sftp connection, please do close the connection after use!
sftp.close_connection()
To use amazon S3 related operations -
from refractio import s3
# To get s3 connection object with a default s3 connection created by the user, if available.
s3.get_connection()
# To get s3 connection object with a specific connection name
s3.get_connection(connection_name="s3_con_name")
# To read a specific dataset published from a s3 connection
s3.get_dataframe("dataset_name")
# To read a specific dataset published from a s3 connection with only top few records.
s3.get_dataframe("dataset_name", row_count=3)
# Use s3 connection object c to do any operation related to s3.
c = s3.get_connection()
To use azure blob related operations -
from refractio import azure
# To get azure blob connection object with a default azure connection created by the user, if available.
azure.get_connection()
# To get azure blob connection object with a specific connection name
azure.get_connection(connection_name="azureblob_con_name")
# To read a specific dataset published from a azureblob connection
azure.get_dataframe("dataset_name")
# To read a specific dataset published from a azure connection with only top few records.
azure.get_dataframe("dataset_name", row_count=3)
# Use azure connection object c to do any operation related to azure.
c = azure.get_connection()
Note: Currently supported native connectors - snowflake, mysql, hive, sqlserver, postgres, sftp, s3, azureblob, local(NAS)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
refractio-2.1.3.tar.gz
(14.0 kB
view details)
Built Distribution
refractio-2.1.3-py3-none-any.whl
(20.6 kB
view details)
File details
Details for the file refractio-2.1.3.tar.gz
.
File metadata
- Download URL: refractio-2.1.3.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e9d48e4d87dcd8c042c4be691e3bc056ec4bee3a3c79c8993bddf18b7b56d4c5 |
|
MD5 | e825984bccaa36468b1d99cc8f3b9fa5 |
|
BLAKE2b-256 | 3029c8607659ec75340e6dcca0b2ed46326965a766078cd2e679f273cb40b10c |
Provenance
File details
Details for the file refractio-2.1.3-py3-none-any.whl
.
File metadata
- Download URL: refractio-2.1.3-py3-none-any.whl
- Upload date:
- Size: 20.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a53d0c4ead0b6b580c3d995580fb868bb7c431aafb1171f18c8093b6f74317a8 |
|
MD5 | c86e3eb8278025a8f7ae1c2c1ca1b4ef |
|
BLAKE2b-256 | 6da5dd82d4af65e2b205b4108db42d2e27309f33ec512cf83d1a08cfd9e922cf |