REFRACT-IO: To read and write dataframe from different connectors.

Project description

Installation:

Without any dependencies:

pip install refractio

With all dependencies:

pip install refractio[all]

With snowflake:

pip install refractio[snowflake]

With s3:

pip install refractio[s3]

With azureblob:

pip install refractio[azureblob]

With local:

pip install refractio[local]

With sftp:

pip install refractio[sftp]

With mysql:

pip install refractio[mysql]

With hive:

pip install refractio[hive]

With sqlserver:

pip install refractio[sqlserver]

With postgres:

pip install refractio[postgres]

Source code is available at: https://git.lti-aiq.in/refract-sdk/refract-sdk.git

Usage:

To read dataframe with dataset name only -

from refractio import get_dataframe
get_dataframe("dataset_name")

# To read top 3 records from dataframe with filter condition of col1=value1 and col2>value2 and sort by col1.
get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# For reading data from any other connections not listed here, please pip install mosaic-connector-python package.

To read dataframe with filename from local storage -

from refracio import get_local_dataframe
get_local_dataframe("local_file_name_with_absolute_path", row_count=3)

To use snowflake related operations -

from refractio import snowflake

# To get snowflake connection object with a default snowflake connection created by the user, if available.
snowflake.get_connection()

# To get snowflake connection object with a specific connection name
snowflake.get_connection(connection_name="snowflake_con_name")

# To read a specific dataset published from a snowflake connection
snowflake.get_dataframe("dataset_name")

# To read a specific dataset published from a snowflake connection with only top few records.
snowflake.get_dataframe("dataset_name", row_count=3)

# To read a specific dataset published from a snowflake connection with only top few records and filter conditions.
snowflake.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# To execute a user specific query in snowflake, with the specified connection name.
snowflake.execute_query(query="user_query", database="db_name", schema="schema", connection_name="connection_name")

# To execute a user specific query in snowflake, with the current connection object or with the default connection for the user.
snowflake.execute_query(query="user_query", database="db_name", schema="schema")

# To close snowflake connection, please do close the connection after use!
snowflake.close_connection()

To use mysql related operations -

from refractio import mysql

# To get mysql connection object with a default mysql connection created by the user, if available.
mysql.get_connection()

# To get mysql connection object with a specific connection name
mysql.get_connection(connection_name="mysql_con_name")

# To read a specific dataset published from a mysql connection
mysql.get_dataframe("dataset_name")

# To read a specific dataset published from a mysql connection with only top few records.
mysql.get_dataframe("dataset_name", row_count=3)

# To read a specific dataset published from a mysql connection with only top few records and filter conditions.
mysql.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# To execute a user specific query in mysql, with the specified connection name.
mysql.execute_query(query="user_query", connection_name="connection_name")

# To execute a user specific query in mysql, with the current connection object or with the default connection for the user.
mysql.execute_query(query="user_query")

# To close mysql connection, please do close the connection after use!
mysql.close_connection()

To use sqlserver related operations -

Requires sqlserver driver library

# Create a custom template with the following commands added in "Pre Init Script" section,
# sudo curl -o /etc/yum.repos.d/mssql-release.repo https://packages.microsoft.com/config/rhel/9.0/prod.repo
# sudo ACCEPT_EULA=Y yum install -y msodbcsql18
from refractio import sqlserver

# To get sqlserver connection object with a default sqlserver connection created by the user, if available.
sqlserver.get_connection()

# To get sqlserver connection object with a specific connection name
sqlserver.get_connection(connection_name="sqlserver_con_name")

# To read a specific dataset published from a sqlserver connection
sqlserver.get_dataframe("dataset_name")

# To read a specific dataset published from a sqlserver connection with only top few records.
sqlserver.get_dataframe("dataset_name", row_count=3)

# To read a specific dataset published from a sqlserver connection with only top few records and filter conditions.
sqlserver.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# To execute a user specific query in sqlserver, with the specified connection name.
sqlserver.execute_query(query="user_query", database="db_name", connection_name="connection_name")

# To execute a user specific query in sqlserver, with the current connection object or with the default connection for the user.
sqlserver.execute_query(query="user_query", database="db_name")

# To close sqlserver connection, please do close the connection after use!
sqlserver.close_connection()

To use hive related operations -

from refractio import hive

# To get hive connection object with a default hive connection created by the user, if available. User id is required (1001 is default user_id used).
hive.get_connection(user_id=1001)

# To get hive connection object with a specific connection name, User id is required (1001 is default user_id used).
hive.get_connection(connection_name="hive_con_name", user_id=1001)

# To read a specific dataset published from a hive connection. User id is required (1001 is default user_id used).
hive.get_dataframe("dataset_name", user_id="1001")

# To read a specific dataset published from a hive connection with only top few records. User id is required (1001 is default user_id used)
hive.get_dataframe("dataset_name", user_id="1001", row_count=3)

# To read a specific dataset published from a hive connection with only top few records and filter conditions. User id is required (1001 is default user_id used)
hive.get_dataframe("dataset_name", user_id="1001", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# To execute a user specific query in hive, with the specified connection name. User id is required (1001 is default user_id used).
hive.execute_query(query="user_query", connection_name="connection_name", user_id="1001")

# To execute a user specific query in hive, with the current connection object or with the default connection for the user. User id is required (1001 is default user_id used).
hive.execute_query(query="user_query", user_id="1001")

# To close hive connection, please do close the connection after use!
hive.close_connection()

To use postgres related operations -

from refractio import postgres

# To get postgres connection object with a default postgres connection created by the user, if available.
postgres.get_connection()

# To get postgres connection object with a specific connection name
postgres.get_connection(connection_name="mysql_con_name")

# To read a specific dataset published from a postgres connection
postgres.get_dataframe("dataset_name")

# To read a specific dataset published from a postgres connection with only top few records.
postgres.get_dataframe("dataset_name", row_count=3)

# To read a specific dataset published from a postgres connection with only top few records and filter conditions.
postgres.get_dataframe("dataset_name", row_count=3, filter_condition="where col1='value1' and col2>value2 order by col1")

# To execute a user specific query in postgres, with the specified connection name.
postgres.execute_query(query="user_query", connection_name="connection_name")

# To execute a user specific query in postgres, with the current connection object or with the default connection for the user.
postgres.execute_query(query="user_query")

# To close postgres connection, please do close the connection after use!
postgres.close_connection()

To use sftp related operations -

from refractio import sftp

# To get sftp connection object with a default sftp connection created by the user, if available.
sftp.get_connection()

# To get sftp connection object with a specific connection name
sftp.get_connection(connection_name="sftp_con_name")

# To read a specific dataset published from a sftp connection
sftp.get_dataframe("dataset_name")

# To read a specific dataset published from a sftp connection with only top few records.
sftp.get_dataframe("dataset_name", row_count=3)

# Use sftp connection object c to do any operation related to sftp like (get, put, listdir etc)
c = sftp.get_connection()

# To close sftp connection, please do close the connection after use!
sftp.close_connection()

To use amazon S3 related operations -

from refractio import s3

# To get s3 connection object with a default s3 connection created by the user, if available.
s3.get_connection()

# To get s3 connection object with a specific connection name
s3.get_connection(connection_name="s3_con_name")

# To read a specific dataset published from a s3 connection
s3.get_dataframe("dataset_name")

# To read a specific dataset published from a s3 connection with only top few records.
s3.get_dataframe("dataset_name", row_count=3)

# Use s3 connection object c to do any operation related to s3.
c = s3.get_connection()

To use azure blob related operations -

from refractio import azure

# To get azure blob connection object with a default azure connection created by the user, if available.
azure.get_connection()

# To get azure blob connection object with a specific connection name
azure.get_connection(connection_name="azureblob_con_name")

# To read a specific dataset published from a azureblob connection
azure.get_dataframe("dataset_name")

# To read a specific dataset published from a azure connection with only top few records.
azure.get_dataframe("dataset_name", row_count=3)

# Use azure connection object c to do any operation related to azure.
c = azure.get_connection()

Note: Currently supported native connectors - snowflake, mysql, hive, sqlserver, postgres, sftp, s3, azureblob, local(NAS)

Project details

Release history Release notifications | RSS feed

2.1.5.6

Jun 26, 2024

2.1.5.5

Apr 20, 2024

2.1.5.4

Mar 19, 2024

2.1.5.3

Mar 15, 2024

2.1.5.2

Mar 7, 2024

2.1.5.1

Mar 7, 2024

2.1.5

Jan 10, 2024

2.1.4

Jan 9, 2024

This version

2.1.3

Jan 2, 2024

2.1.2

Dec 19, 2023

2.1.0.1

Nov 22, 2023

2.1.0

Nov 10, 2023

2.0.5.10

Jul 7, 2023

2.0.5.9

Jun 29, 2023

2.0.5.8

Jun 29, 2023

2.0.5.7

Jun 27, 2023

2.0.5.6

Jun 21, 2023

2.0.5.5

Jun 15, 2023

2.0.5.4

Jun 14, 2023

2.0.5.3

Jun 14, 2023

2.0.5.2

Jun 14, 2023

2.0.5.1

Jun 13, 2023

2.0.5

May 31, 2023

2.0.4

May 31, 2023

2.0.3

May 30, 2023

2.0.2

May 26, 2023

2.0.1

May 23, 2023

2.0.0

May 19, 2023

1.0.3

May 9, 2023

1.0.2

Apr 28, 2023

1.0.1

Apr 28, 2023

1.0.0

Apr 27, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refractio-2.1.3.tar.gz (14.0 kB view details)

Uploaded Jan 2, 2024 Source

Built Distribution

refractio-2.1.3-py3-none-any.whl (20.6 kB view details)

Uploaded Jan 2, 2024 Python 3

File details

Details for the file refractio-2.1.3.tar.gz.

File metadata

Download URL: refractio-2.1.3.tar.gz
Upload date: Jan 2, 2024
Size: 14.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for refractio-2.1.3.tar.gz
Algorithm	Hash digest
SHA256	`e9d48e4d87dcd8c042c4be691e3bc056ec4bee3a3c79c8993bddf18b7b56d4c5`
MD5	`e825984bccaa36468b1d99cc8f3b9fa5`
BLAKE2b-256	`3029c8607659ec75340e6dcca0b2ed46326965a766078cd2e679f273cb40b10c`

See more details on using hashes here.

Provenance

File details

Details for the file refractio-2.1.3-py3-none-any.whl.

File metadata

Download URL: refractio-2.1.3-py3-none-any.whl
Upload date: Jan 2, 2024
Size: 20.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for refractio-2.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a53d0c4ead0b6b580c3d995580fb868bb7c431aafb1171f18c8093b6f74317a8`
MD5	`c86e3eb8278025a8f7ae1c2c1ca1b4ef`
BLAKE2b-256	`6da5dd82d4af65e2b205b4108db42d2e27309f33ec512cf83d1a08cfd9e922cf`

See more details on using hashes here.

refractio 2.1.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Installation:

Without any dependencies:

With all dependencies:

With snowflake:

With s3:

With azureblob:

With local:

With sftp:

With mysql:

With hive:

With sqlserver:

With postgres:

Source code is available at: https://git.lti-aiq.in/refract-sdk/refract-sdk.git

Usage:

To read dataframe with dataset name only -

To read dataframe with filename from local storage -

To use snowflake related operations -

To use mysql related operations -

To use sqlserver related operations -

Requires sqlserver driver library

To use hive related operations -

To use postgres related operations -

To use sftp related operations -

To use amazon S3 related operations -

To use azure blob related operations -

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance