No project description provided

Project description

Andrew's all-in-one data reader - AZDSDR

Andrew's all-in-one data reader - AZDSDR

This package includes data reader for DS to access data in a easy way.

Covered data platforms:

Kusto
Azure Blob Storage
Dremio
Microsoft Cosmos (Not Azure Cosmos DB, the Microsoft Cosmos using Scope, now AKA Azure Data Lake)

May cover in the future:

Databricks/Spark
Microsoft Synapse
Delta Lake
Postgresql
Microsoft SQL Server
SQLite

Besides, the package also include functions from Pipelines class to help move data around:

Dremio to Kusto
Kusto to CSV file

Installation

The module is test and usable for Python 3.10 and Python 3.9. Other versions(Python 3.6+) should also works.

Use pip to install the package and all of the dependences

pip install -U azdsdr

The -U will help update your old version to the newest

Or, you can clone the repository and copy over the readers.py file to your project folder.

The installation will also install all the dependance packages automatrically.

pandas
pyodbc
azure-cli
azure-kusto-data
azure-kusto-ingest
azure-storage-blob
matplotlib
ipython
ipykernel

If you are working on a new build OS, the all-in-one installation will also save you time from installing individual packages one by one.

Use Kusto Reader

Azure CLI Authentication

Before running the kusto query, please use

az login

To login into Azure using AAD authentication. An authentication refresh token is generated by Azure and stored in your local machine. This token will be revoked after 90 days of inactivity.

For More details, read Sign in with Azure CLI.

After successufuly authenticated with AAD, you should be able to run the following code without any pop up auth request. The Kusto Reader is test in Windows 10, also works in Linux and Mac.

Run any Kusto query

from azdsdr.readers import KustoReader

cluster = "https://help.kusto.windows.net"
db      = "Samples"
kr      = KustoReader(cluster=cluster,db=db)

kql     = "StormEvents | take 10"
r       = kr.run_kql(kql)

The function run_kql will return a Pandas Dataframe object hold by r. The kr object will be reused in the following samples.

Use run_kql_all to output multiple result set.

kql = '''
StormEvents 
| take 10
;
StormEvents 
| summarize count()
'''
rs = kr.run_kql_all(kql=kql)
for r in rs:
    display(r)

Show Kusto tables

List all tables:

kr.list_tables()

List tables with folder keyword:

kr.list_tables(folder_name='Covid19')

Create an empty Kusto table from a CSV file

This function can be used before uploading CSV data to Kusto table. Instead of manually creating a Kusto table from CSV schema, use this function to create a empty Kusto table based on CSV file automatically.

Besides, you can also specify the table's folder name.

kusto_table_name  = 'target_kusto_table'
folder_name       = 'target_kusto_folder'
csv_file_name     = 'local_csv_path'
kr.create_table_from_csv (
    kusto_table_name    = kusto_table_name
    ,csv_file_path      = csv_file_name
    ,kusto_folder       = folder_name
)

Upload data to Kusto

Before uploading your data to Kusto, please make sure you have the right table created to hold the data. Ideally, you can use the above create_table_from_csv to create an empty table for you.

To enable the data ingestion(upload), you should also initialize the KustoReader object with an additional ingest_cluster_str parameter. Here is a sample, you should ask your admin or doc to find out the ingestion cluster url.

cluster         = "https://help.kusto.windows.net"
ingest_cluster  = "https://help-ingest.kusto.windows.net"
db              = "Samples"
kr              = KustoReader(cluster=cluster,db=db,ingest_cluster_str=ingest_cluster)

Upload Pandas Dataframe to Kusto:

target_kusto_table  = 'kusto_table_name'
df_data             = you_df_data
kr.upload_df_to_kusto(
    target_table_name = target_kusto_table
    ,df_data          = df_data
)

Upload CSV file to Kusto:

target_kusto_table  = 'kusto_table_name'
csv_path            = 'csv_file.csv'
kr.upload_csv_to_kusto(
    target_table_name = target_kusto_table
    ,csv_path         = csv_path
)

Upload Azure Blob CSV file to Kusto, this is the best and fast way to upload massive csv data to Kusto table.

target_kusto_table  = 'kusto_table_name'
blob_sas_url = 'the sas url you generate from Azure portal or Azure Storage Explorer, or azdsdr'
kr.upload_csv_from_blob (
    target_table_name   = kusto_table_name
    ,blob_sas_url       = blob_sas_url
)

I will cover how to generate blob_sas_url in the Azure Blob Reader section. [TODO]

Use Dremimo Reader

Step 1. Install Dremio Connector

You will need to install the Dremio ODBC driver first to use DremioReader from this package.

For Windows user

Please download the dremio-connector file from the drivers folder.

Step 2. Generate a Personal Access Token(PAT)

Recommend storing this personal access token in a safe location, such as a user environment variable on your local machine.
Start Menu -> “Edit Environment variables For Your Account”.
Click “New” under environment variables.
Enter a new variable with name “DREMIO_TOKEN” and set the value to the PAT you generated earlier.

Note: you will have to log out your Windows account and log in again to take the new env variable take effort.

Step 3. Configure driver

Go to Start Menu -> “ODBC Data Sources (64-bit)”.
Under User DSN, click “Add”.
Add Dremio Connector.
Configure as follows:
- set Data Source Name as Dremio Connector.
- with your own user@example.com as the username.
- Do remember to replace the dremio host with your own host string.

Click Ok/Save

For Linux and Mac User

You can download the driver from Dremio's ODBC Driver page. It should be working in theory, haven't been test yet.

Dremio Sample Query

from azdsdr.readers import DremioReader
import os

username    = "name@host.com"
#token       = "token string"
token       = os.environ.get("DREMIO_TOKEN") 
dr          = DremioReader(username=username,token=token)

sql = '''
select 
    * 
from 
    [workspace].[folder].[tablename]
limit 10
'''
r = dr.run_sql(sql)

Move data with functions from `Pipelines` class

Export Kusto data to local csv file

[TODO]

When the export data is very large like exceed 1 billion rows, kusto will export data to several csv files. this function will automatically combine the data to one single CSV file in destination folder.

Move Dremio data to Kusto

[TODO]

Data Tools

`display_all` Display all dataframe rows

The IPython's display can display only limited rows of data. This tool can display all or specified rows of data.

from azdsdr.tools import pd_tools
display_all = pd_tools().display_all

#...prepare pd data

# display all 
display_all(pd_data)

# display top 20 rows
display_all(pd_data,top=20)

Thanks

The Dremio ODBC Reader solution is origin from KC Munnings. Glory and credits belong to KC.

Update Logs

Jan 10, 2023

Add guid to the temp cosmos script file and temp middle stream file to avoid temp files collision.

Dec 16, 2022

Add function get_table_schema for KustoReader
Add function get_table_folder for KustoReader

Dec 10, 2022

Update Dremio Reader configuration document and screenshot.

Dec 6, 2022

Add function download_file_list of class AzureBlobReader to download a list of CSV file with the same schema and merge to one target CSV file.
Add function delete_blob_files of class AzureBlobReader to delete a list of blob files.
Add usage sample code.

Project details

Release history Release notifications | RSS feed

1.230612.2

Jun 12, 2023

1.230505.2

May 5, 2023

1.230505.1

May 5, 2023

1.230424.2

Apr 24, 2023

1.230424.1

Apr 24, 2023

1.230304.1

Mar 4, 2023

1.230124.1

Jan 24, 2023

1.230123.1

Jan 23, 2023

1.230118.3

Jan 18, 2023

1.230118.2

Jan 18, 2023

1.230118.1

Jan 18, 2023

1.230117.1

Jan 17, 2023

This version

1.230110.1

Jan 11, 2023

1.221206.2

Dec 6, 2022

1.221206.1

Dec 6, 2022

1.221128.1

Nov 28, 2022

1.221118.1

Nov 19, 2022

1.221111.2

Nov 11, 2022

1.221111.1

Nov 11, 2022

1.221110.1

Nov 10, 2022

1.221109.1

Nov 10, 2022

1.221108.1

Nov 8, 2022

1.221018.3

Oct 18, 2022

1.221018.2

Oct 18, 2022

1.221018.1

Oct 18, 2022

1.221012.2

Oct 13, 2022

1.221012.1

Oct 12, 2022

1.221011.1

Oct 12, 2022

1.221010.1

Oct 10, 2022

1.221006.1

Oct 6, 2022

1.221005.1

Oct 6, 2022

1.220930.1

Sep 30, 2022

1.220929.12

Sep 30, 2022

1.220929.11

Sep 29, 2022

1.220929.10

Sep 29, 2022

1.220929.9

Sep 29, 2022

1.220929.8

Sep 29, 2022

1.220929.7

Sep 29, 2022

1.220929.6

Sep 29, 2022

1.220929.5

Sep 29, 2022

1.220929.4

Sep 29, 2022

1.220929.3

Sep 29, 2022

1.220929.1

Sep 29, 2022

1.220929

Sep 29, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azdsdr-1.230110.1.tar.gz (18.6 kB view hashes)

Uploaded Jan 11, 2023 Source

Hashes for azdsdr-1.230110.1.tar.gz

Hashes for azdsdr-1.230110.1.tar.gz
Algorithm	Hash digest
SHA256	`308e4a84aa6b2161364fa330a23c9d25f2b1a2f4e0e5005e1c89f9ce3487768c`
MD5	`85fb986a6a17f9aa4bd49f37cb59450a`
BLAKE2b-256	`1fb533116821222ddaed9ab0d45d4338081ac413e61206a7572882324894f63c`

azdsdr 1.230110.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Andrew's all-in-one data reader - AZDSDR

Installation

Use Kusto Reader

Azure CLI Authentication

Run any Kusto query

Show Kusto tables

Create an empty Kusto table from a CSV file

Upload data to Kusto

Use Dremimo Reader

Step 1. Install Dremio Connector

Step 2. Generate a Personal Access Token(PAT)

Step 3. Configure driver

Dremio Sample Query

Move data with functions from Pipelines class

Export Kusto data to local csv file

Move Dremio data to Kusto

Data Tools

display_all Display all dataframe rows

Thanks

Update Logs

Jan 10, 2023

Dec 16, 2022

Dec 10, 2022

Dec 6, 2022

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Move data with functions from `Pipelines` class

`display_all` Display all dataframe rows