Standardize and Automate processes

These details have not been verified by PyPI

Project description

Nike CA

This package was developed informally for the Commercial Analytics Team at Nike

Before trying to use this package ensure that you have the proper access (This can be found under the "Usage" Section below)

This is a start to see about developing package to facilitate, standardize, and automate repetitive tasks

Installation and Setup

See Information about Installation, Setup, and Running

Dependencies that will automatically be installed if not already satisfied:

"wheel",
"asn1crypto==1.5.1",
"certifi==2022.12.7",
"cffi==1.15.1",
"charset-normalizer==2.1.1",
"cryptography==39.0.1",
"databricks==0.2",
"databricks-sql==1.0.0",
"databricks-sql-connector==2.2.1",
"filelock==3.9.0",
"gitdb==4.0.10",
"GitPython==3.1.31",
"greenlet==2.0.2",
"idna==3.4",
"jupyter-contrib-core==0.4.2",
"jupyter-contrib-nbextensions==0.7.0",
"jupyter-events==0.6.3",
"jupyter-highlight-selected-word==0.2.0",
"jupyter-nbextensions-configurator==0.6.1",
"jupyter-ydoc==0.2.2",
"jupyter_client==8.0.3",
"jupyter_core==5.2.0",
"jupyter_server==2.3.0",
"jupyter_server_fileid==0.8.0",
"jupyter_server_terminals==0.4.4",
"jupyter_server_ydoc==0.6.1",
"jupyterlab==3.6.1",
"jupyterlab-pygments==0.2.2",
"jupyterlab_server==2.19.0",
"lz4==4.3.2",
"numpy==1.23.4",
"oauthlib==3.2.2",
"oscrypto==1.3.0",
"pandas==1.5.3",
"pyarrow==10.0.1",
"pycparser==2.21",
"pycryptodomex==3.17",
"PyJWT==2.6.0",
"pyOpenSSL==23.0.0",
"pystache==0.6.0",
"python-dateutil==2.8.2",
"pytz==2022.7.1",
"requests==2.28.2",
"six==1.16.0",
"smmap==5.0.0",
"snowflake-connector-python==3.0.0",
"snowflake-sqlalchemy==1.4.6",
"SQLAlchemy==1.4.46",
"thrift==0.16.0",
"typing_extensions==4.5.0",
"urllib3==1.26.14",
"xcrun==0.4",
"configparser~=5.3.0"

Installing and Setting up a New Environment (if you are new to python start here):

Installation and Setup with a New Environment

For Mac

Note: This assumes that you already have Python 3.11.2 installed

How do I tell which version of Python I have?

Launch the Terminal by typing "Terminal" in the Launchpad search field or Spotlight
Enter the following command in the Terminal

python3 --version

and you should see this:

Python 3.11.2

To Install Python 3.11.2

Go to https://www.python.org/downloads/
Click on "Download Python 3.11.2"
Open the file and click through the installation steps accepting the defaults

When running for the first time, open the Terminal and run the following commands where you want the files to be kept:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install NikeCA

After running the command above, restart the terminal and proceed to the "To open Jupyter Notebook after installation (Mac)"

Installing without Setting up a New Environment:

pip Install Without Setting up a New Environment

Run the following to install:

$ python pip install NikeCA

To open Jupyter Notebook after installation (Mac)

Navigate to the installation location in the terminal and run the following:

source venv/bin/activate
jupyter notebook

Modules

NikeCA

A Module for interacting with the Databases and doing Analytics

Import

Run the following to import:

import NikeCA

Classes:

Snowflake Class

Snowflake:

Snowflake(username: str, warehouse: str, role: str, database: str = None, schema: str = None, table: str = None, column_name: str = None, col_and_or: str = 'AND', get_ex_val: bool = None, like_flag: bool = False, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv = None)

Import:

from NikeCA import Snowflake

Parameters:

username (str): The Snowflake account username
warehouse (str): The Snowflake warehouse to use
role (str): The Snowflake role to use
database (str, optional, default=None): The Snowflake database to use
schema (str, optional, default=None): The Snowflake schema to use
table (str, optional, default=None): The Snowflake table to use
column_name (str, optional, default=None): The name of the column to search
col_and_or (str, optional, default=None): The AND/OR operator to use between search criteria
get_ex_val (bool, optional, default=None): Whether to return exact matches only
like_flag (bool, optional, default=None): Whether to use the LIKE operator for search criteria

Methods:

snowflake_pull() - pulls snowflake data

snowflake_pull(

) -> DataFrame:

Dependencies:

pandas
snowflake.connector

Parameters:

query (str): SQL query to run on Snowflake
- e.g. SELECT * FROM {}
username (str or None, default=None): Nike Snowflake Username
database (str or None, default=None): Name of the Database
warehouse (str or None, default=None): Name of the Warehouse
role (str or None, default=None): Name of the role under which you are running Snowflake
sample_table (bool, optional, Default=False): pull only 500 records from table
sample_val (bool, optional, default=False)
table_sample (dictionary, optional, default=None)
dtypes_conv (any, default=None)

return: pandas.DataFrame

Run the following in python to generate a sample query:

from NikeCA import Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf =  Snowflake(username=username, warehouse=warehouse, role=role, database=database)

query = 'SELECT TOP 2 * FROM  {}'

print(sf.snowflake_pull(query))

build_search_query() - Builds and returns a search query based on the specified parameters and instance variables

build_search_query(

self, inp_db: str | None = None, schema: str | None = None, table: str | None = None, column_name=None, like_flag: bool = False, col_and_or: str = 'AND'

) -> str

Dependencies - None

Parameters:

inp_db (str or None, optional, default=None): The database name to search in. If not specified, search all databases
schema (str or None, optional, default=None): The schema name to search in. If not specified, search all schemas
table (str or None, optional, default=None): The table name to search for. If not specified, search all tables
column_name(any, optional, default=None): The column name(s) to search for. If not specified, search all columns
- If a list is provided, search for any columns that match any of the names in the list

like_flag (bool, optional, default=False)

If True, uses a SQL LIKE statement to search for columns that contain the specified column name(s)

f"AND column_name like '{column_name}' " if like_flag else where_stmt + f"AND column_name = '{column_name}' "

If False, searches for exact matches to the specified column name(s)

f"AND column_name like '{column_name}' " if like_flag else where_stmt + f"AND column_name = '{column_name}' "

col_and_or (str: optional, default='AND'): If specified and column_name is a list, determines whether to search for columns that match all or any of the names in the list. Must be one of the following values: 'AND', 'and', 'OR', 'or'.

return: string of the SQL Statement

Run the following in python to generate a sample query

from NikeCA import Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

print(sf.build_search_query(column_name='%***%', like_flag=True))

search_schema() - Search snowflake structure for specific schemas/tables/columns

search_schema(

self, username=None, warehouse=None, database=None, role=None, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv=None, schema=None, table=None, column_name=None, col_and_or='and', get_ex_val=False, like_flag=False

)

Notes: Will allow to search for tables/cols/etc. even without knowing the db if database=None

Dependencies

pandas
snowflake.connector

Parameters

username (str or None, default=None): Nike Snowflake Username
database (str or None, default=None): Name of the Database
warehouse (str or None, default=None): Name of the Warehouse
role (str or None, default=None): Name of the role under which you are running Snowflake
sample_table (bool, optional, Default=False): pull only 500 records from table
sample_val (bool, optional, default=False)

table_sample (dictionary, optional, default=None)

Notes: The below code is built within the Module

if table_sample is not None: 
     table_sample = {'db': None, 'schema': None, 'table': None, 'col': None}

dtypes_conv (any, default=None)
schema (str, default=None): Snowflake schema name from any database
table (str, default=None): Snowflake table name
column_name (str, default=None): column name to filter
col_and_or (str, default='and'): either 'and' or 'or'
- will use in the where statement
get_ex_val (bool, default=False)
like_flag (bool, default=False): This signifies whether the "column_name like " or "column_name = "

return: pandas.Dataframe

Run the following in python to generate a sample table:

from NikeCA import Snowflake

sf = Snowflake(username=<your username>, warehouse=<your warehouse>, 
     role=<your role>, database=<database you would like to search or none>)

sf.column_name = '*****'
sf.schema = '*****'
sf.like_flag = True

print(sf.search_schema())

snowflake_dependencies() - Searches the snowflake database and finds instances where the table is referenced and where the reference is not in the actual creation of the table itself

snowflake_dependencies(

self, tables: str | list, username: str, warehouse: str, role: str, database: str | None = None, schema: str | list | None = None

) -> pandas.DataFrame:

Note: If the table's get_ddl() is empty, it will throw an error - I will fix this soon

Dependencies

pandas
snowflake.connector

Parameters

tables (list | str, required): This is a list or string to check for in the database could be a table name or anything contained within the get_ddl() string
username (str, default=self): Username for Snowflake
warehouse (str, default=self): Name of the Snowflake warehouse
role (str, default=self): Role for Snowflake
database (str, required, default=self): database to search in
schema (str | list | None, optional, default=self): Snowflake schema to search in
- notes: filling this in can really speed up the query

return: pandas.Dataframe

Run the following in python to generate a sample table:

import pandas as pd

username = 
warehouse =
role = 
database = 

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

tables = ['***', '***']
 
schema = '***'

df = sf.snowflake_dependencies(tables='***', schema=schema)

print(df)

QA Class

QA:

Import

Run the following to import:

from NikeCA import QA

Parameters

df (DataFrame)
df2 (DataFrame, optional, default=None)
ds1_nm (str, optional, default='Source #1')
ds2_nm (str, optional, default='Source #2')
case_sens (bool, optional, default=True)
print_analysis (bool, optional, default=True)
check_match_by (any, optional, default=None)
breakdown_grain (any, optional, default=None)

Methods

column_gap_analysis() - Compares 2 DataFrames and gives shape, size, matching columns, non-matching columns, coverages, and percentages

column_gap_analysis(

self, df2: pd.DataFrame = None, ds1_nm: str = 'Source #1', ds2_nm: str = 'Source #2', case_sens: bool = True, print_analysis: bool = True, check_match_by=None, breakdown_grain=None, df=None

)

Dependecnies

"pandas==1.5.3",

Parameters

df (DataFrame)
df2 (DataFrame, optional, default=None)
ds1_nm (str, optional, default='Source #1')
ds2_nm (str, optional, default='Source #2')
case_sens (bool, optional, default=True)
print_analysis (bool, optional, default=True)
check_match_by (any, optional, default=None)
breakdown_grain (any, optional, default=None)

return: pandas.DataFrame

Run the following in python to generate a sample query

from NikeCA import QA, Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

df = sf.snowflake_pull(sf.build_search_query(column_name='%***%', like_flag=True))[['TABLE_CATALOG', 'TABLE_SCHEMA', 'COLUMN_NAME']]

df2 = sf.snowflake_pull(sf.build_search_query(column_name='%***%', schema='***', like_flag=True))

qa = QA(df=df, df2=df2)
print(qa.column_gap_analysis())

data_prfl_analysis() - Takes a pandas.DataFrame as an input and returns a pandas.DataFrame with certain inormation about the dataframes, such as a list of columns and data types, nulls, coverage percentage, unique values, etc.

data_prfl_analysis(

self, df: pd.DataFrame = None, ds_name: str = 'Data Source', sample_vals: int = 5, print_analysis: bool = True, show_pct_fmt: bool = True

)

Still Under Development

Dependencies

"pandas==1.5.3",

Parameters

df (DataFrame): pandas.DataFrame to be analyzed
ds_name (str, optional, default='Data Source'): name of the data source to be included in the output
sample_vals (int, optional, default=5)
print_analysis (bool, optional, default=True)
show_pct_fmt (bool, optional, default=True): show_percentage_format

return:

pandas.Dataframe with the following columns:

'DATA_SOURCE'
'COLUMN'
'COL_DATA_TYPE'
'TOTAL_ROWS'
'ROW_DTYPE_CT'
'PRIMARY_DTYPE_PCT'
'COVERAGE_PCT', 'NULL_PCT'
'DTYPE_ERROR_FLAG'
'NON_NULL_ROWS'
'NULL_VALUES'
'UNIQUE_VALUES'
'COL_VALUE_SAMPLE'
'NULL_VALUE_SAMPLE'

    from NikeCA import Snowflake, QA
    
    sf = Snowflake(username=<username>, warehouse=<warehouse>, role=<role>, database=<database>)
    
    df = sf.snowflake_pull("""SELECT TOP 200 * FROM ***""")
    
    print(QA(df).data_prfl_analysis())

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.1 yanked

Apr 10, 2023

0.1.0 yanked

Apr 5, 2023

0.0.33

Sep 13, 2023

0.0.32

Sep 13, 2023

0.0.31

Aug 31, 2023

0.0.30

Aug 31, 2023

0.0.29

Aug 31, 2023

0.0.28

Jul 12, 2023

0.0.27

Jul 12, 2023

0.0.26

Jul 11, 2023

0.0.25

Jul 11, 2023

0.0.24

Jul 11, 2023

0.0.23

Jun 27, 2023

0.0.22

Jun 26, 2023

0.0.21

Jun 13, 2023

0.0.20

Jun 13, 2023

0.0.19

Jun 8, 2023

0.0.18

Jun 7, 2023

0.0.17

Jun 1, 2023

0.0.16

Jun 1, 2023

0.0.15

May 22, 2023

0.0.14

May 22, 2023

This version

0.0.13

May 22, 2023

0.0.13a3 pre-release

May 22, 2023

0.0.13a2 pre-release

May 22, 2023

0.0.13a1 pre-release

May 22, 2023

0.0.13a0 pre-release

May 22, 2023

0.0.12

May 22, 2023

0.0.11

May 19, 2023

0.0.10

May 19, 2023

0.0.9

May 19, 2023

0.0.8

May 19, 2023

0.0.7

May 19, 2023

0.0.6 yanked

May 19, 2023

0.0.5

May 19, 2023

0.0.4

May 19, 2023

0.0.3

May 19, 2023

0.0.2

May 19, 2023

0.0.2b0 pre-release

May 19, 2023

0.0.2a0 pre-release

May 19, 2023

0.0.1

May 19, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

CADPR-0.0.13.tar.gz (34.0 kB view details)

Uploaded May 22, 2023 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

CADPR-0.0.13-py3-none-any.whl (137.1 kB view details)

Uploaded May 22, 2023 Python 3

File details

Details for the file CADPR-0.0.13.tar.gz.

File metadata

Download URL: CADPR-0.0.13.tar.gz
Upload date: May 22, 2023
Size: 34.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for CADPR-0.0.13.tar.gz
Algorithm	Hash digest
SHA256	`21d06029ccb9a607b49e18ad09ac67b0c0cab773b93c6db6635860b8c9daaa5c`
MD5	`aeccbd66de5666794b4fc79b18bfdc26`
BLAKE2b-256	`5734553fdcbbac38f10dedb4854ea7aabef818a8492421b650073dc76f827a4f`

See more details on using hashes here.

File details

Details for the file CADPR-0.0.13-py3-none-any.whl.

File metadata

Download URL: CADPR-0.0.13-py3-none-any.whl
Upload date: May 22, 2023
Size: 137.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for CADPR-0.0.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b90525311732e16a529e01b32f2e3816a1d3fb0b22e092aad608f487fa86f5ab`
MD5	`cea5509d61de0ea8101faaf1bbe84921`
BLAKE2b-256	`af1550e2acd1157012c2b6b58d91390504dafbd6e3a3093e0e756380035410fe`

See more details on using hashes here.

CADPR 0.0.13

Navigation

Verified details

Maintainers

Unverified details

Classifiers

Project description

Nike CA

Installation and Setup

Installing and Setting up a New Environment (if you are new to python start here):

Note: This assumes that you already have Python 3.11.2 installed

Installing without Setting up a New Environment:

To open Jupyter Notebook after installation (Mac)

Modules

NikeCA

Classes:

Snowflake:

Methods:

snowflake_pull(

return: pandas.DataFrame

build_search_query(

Dependencies - None

return: string of the SQL Statement

Run the following in python to generate a sample query

search_schema(

return: pandas.Dataframe

snowflake_dependencies(

return: pandas.Dataframe

QA:

Import

Methods

column_gap_analysis(

return: pandas.DataFrame

Run the following in python to generate a sample query

data_prfl_analysis(

Still Under Development

return:

Project details

Verified details

Maintainers

Unverified details

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes