Python package to extract table names from a GitHub URL and check whether the tables are stale
Project description
stale_data_detection
Extract table names from GitHub Repositories and check whether the tables are stale.
Requirements
- boa package, used to read tables into a pandas DataFrame. Install from: https://github.aetna.com/analytics-org/boa
- sql-metadata package, used to extract table names from SQL queries:
pip install sql-metadata
Folder structure:
stale_data_detection
├── test
│   ├── __init__.py
│   ├── demo.ipynb
│   ├── test_flag_stale_table.py
│   └── test_extract_tables.py
├── stale_data_detection
│   ├── __init__.py
│   ├── extract_table_names.py
│   ├── flag_stale_table.py
│   └── get_table_update_status.py
└── README.md
extract_table_names.py
Gets the raw download URLs of all files with a given extension (default: .hql) in a GitHub repository or subdirectory, reads each file's content, and extracts the names of all tables referenced in those files.
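One way to collect the raw download URLs is to walk the GitHub REST API's repository contents endpoint (`GET /repos/{owner}/{repo}/contents/{path}?ref={branch}`), which returns each entry's `type`, `name`, `path`, and `download_url`. The sketch below assumes that approach; the function names, the `api_base`/`owner`/`repo` split, and the injectable `fetch` hook are hypothetical and not the package's actual API.

```python
import json
import urllib.request
from typing import Callable, List, Optional

# Hypothetical type: takes an API URL and returns the decoded JSON body.
Fetcher = Callable[[str], object]


def make_fetcher(access_token: Optional[str] = None) -> Fetcher:
    """Return a fetcher that GETs a GitHub REST API URL and decodes the JSON."""
    def fetch(url: str) -> object:
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"})
        if access_token:  # the token is optional for public repositories
            request.add_header("Authorization", f"token {access_token}")
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def list_raw_urls(api_base: str, owner: str, repo: str, branch: str,
                  path: str = "", extension: str = ".hql",
                  fetch: Optional[Fetcher] = None) -> List[str]:
    """Recursively collect download_url values of files ending in `extension`."""
    fetch = fetch or make_fetcher()
    url = f"{api_base}/repos/{owner}/{repo}/contents/{path}?ref={branch}"
    entries = fetch(url)
    if isinstance(entries, dict):  # a file path returns one object, not a list
        entries = [entries]
    urls: List[str] = []
    for entry in entries:
        if entry["type"] == "dir":
            # Descend into subdirectories with the same branch and extension.
            urls.extend(list_raw_urls(api_base, owner, repo, branch,
                                      entry["path"], extension, fetch))
        elif entry["type"] == "file" and entry["name"].endswith(extension):
            urls.append(entry["download_url"])
    return urls
```

On github.com, `api_base` is `https://api.github.com`; on GitHub Enterprise Server it is typically `https://<host>/api/v3`. Passing a custom `fetch` lets the walker be exercised without network access.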
flag_stale_table.py
Given a table name, reads the table's content into a pandas DataFrame and checks whether the table is stale.
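The staleness rule itself is not spelled out here. A minimal sketch, assuming a table counts as stale when its last update date is older than a fixed number of days; the `is_stale` name, the 30-day default, and treating a missing date as stale are all hypothetical choices:

```python
from datetime import date, datetime, timedelta
from typing import Optional, Union

DateLike = Union[date, datetime]


def is_stale(last_update: Optional[DateLike], max_age_days: int = 30,
             today: Optional[date] = None) -> bool:
    """Return True if the last update is more than max_age_days before today.

    Assumption: a table with no detectable update date is reported as stale.
    """
    if last_update is None:
        return True
    # datetime is a subclass of date, so check it first and drop the time part.
    if isinstance(last_update, datetime):
        last_update = last_update.date()
    today = today or date.today()
    return (today - last_update) > timedelta(days=max_age_days)
```

The `today` parameter exists only so the check can be evaluated against a fixed reference date.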
get_table_update_status.py
Given a table as a pandas DataFrame, identifies all columns that likely contain update dates and returns the table's last update date.
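The column-detection heuristic is not documented. The sketch below assumes a name-based rule (a column qualifies if its name contains a keyword such as `update` or `load` and at least one of its values parses as a date) and returns the maximum date across those columns. The keyword list and function names are hypothetical. It accepts either a dict of columns or a pandas DataFrame, because iterating a DataFrame yields its column names and `df[name]` yields that column's values.

```python
from datetime import date, datetime
from typing import Iterable, List, Mapping, Optional

# Hypothetical keywords; the package's real heuristic may differ.
UPDATE_KEYWORDS = ("update", "updt", "modified", "load", "refresh")


def to_date(value) -> Optional[date]:
    """Coerce one cell to a date, or return None if it does not look like one."""
    try:
        if value != value:  # NaN and pandas NaT compare unequal to themselves
            return None
    except TypeError:  # e.g. pandas.NA, whose truth value is ambiguous
        return None
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value.strip()).date()
        except ValueError:
            return None
    return None


def find_update_columns(table: Mapping[str, Iterable]) -> List[str]:
    """Columns whose name contains a keyword and that hold at least one date."""
    columns = []
    for name in table:
        lowered = str(name).lower()
        if any(keyword in lowered for keyword in UPDATE_KEYWORDS):
            if any(to_date(v) is not None for v in table[name]):
                columns.append(name)
    return columns


def last_update_date(table: Mapping[str, Iterable]) -> Optional[date]:
    """Latest date found in any likely update column, or None if there is none."""
    dates = [d for name in find_update_columns(table)
             for d in map(to_date, table[name]) if d is not None]
    return max(dates, default=None)
```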
test_extract_tables.py
Test script for the extract_table_names.py module.
Usage:
python test_extract_tables.py -u github/repo/url -b branch_name -a access_token -e file_extension
GitHub repo URL and branch name are required. Other arguments are optional.
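The documented flags map naturally onto `argparse`. A sketch of how the script might parse them; the long option names (`--url`, `--branch`, `--access_token`, `--extension`) and the `.hql` default (taken from the extract_table_names.py description) are assumptions:

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command-line flags; -u and -b are required, -a and -e optional."""
    parser = argparse.ArgumentParser(
        description="Extract table names from files in a GitHub repository.")
    parser.add_argument("-u", "--url", required=True,
                        help="GitHub repository (or subdirectory) URL")
    parser.add_argument("-b", "--branch", required=True, help="Branch name")
    parser.add_argument("-a", "--access_token", default=None,
                        help="GitHub access token (optional)")
    parser.add_argument("-e", "--extension", default=".hql",
                        help="File extension to scan (default: .hql)")
    return parser.parse_args(argv)
```

Passing `argv=None` makes `argparse` read `sys.argv`, so the same function serves both the command line and tests.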
test_flag_stale_table.py
Test script for the flag_stale_table.py module.
Usage:
python test_flag_stale_table.py -u github/repo/url -b branch_name -a access_token -e file_extension -o path/to/output/file
GitHub repo URL and branch name are required. Other arguments are optional.
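The format of the file written via `-o` is not documented. Assuming a CSV with one row per table (a hypothetical `table_name, last_update, is_stale` layout and a hypothetical `write_report` helper), the output step could look like:

```python
import csv
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical row layout: (table name, last update date or None, stale flag).
Row = Tuple[str, Optional[date], bool]


def write_report(rows: Iterable[Row], output_path: str) -> int:
    """Write the rows to a CSV file with a header line; return the row count."""
    count = 0
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["table_name", "last_update", "is_stale"])
        for name, last_update, stale in rows:
            # A missing date is written as an empty cell.
            writer.writerow([name, last_update.isoformat() if last_update else "", stale])
            count += 1
    return count
```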
demo.ipynb
Demonstrates the package's functions in a Jupyter notebook.
Download files
File details
Details for the file stale_data_detection-0.4.tar.gz.
File metadata
- Download URL: stale_data_detection-0.4.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.8.2 requests/2.28.1 setuptools/65.6.3 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13e95592ed4d2ad1d10e67fe214abe297874093629f0dfdb3a558f715a437b38 |
| MD5 | 5c862ae722b88f928f1ffbfbaaf2512f |
| BLAKE2b-256 | f87f421573e96f4ba9a8480b015a24b1febefe753e052ab9f6ba3e942c81372d |
File details
Details for the file stale_data_detection-0.4-py3-none-any.whl.
File metadata
- Download URL: stale_data_detection-0.4-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.8.2 requests/2.28.1 setuptools/65.6.3 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4456bafc550429aaa60cad1e1fdeda2f86d6c597cc7cff2b061fc398e17ba878 |
| MD5 | 9587ed694a49f3486947189fc6e9162a |
| BLAKE2b-256 | 85eae1fbbcc87b3f2e7b2e11cd461a48ae306e664fbbd1af8653b9bf045f4221 |
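To check a downloaded file against the SHA256 digests listed for the two distribution files, stream it through `hashlib.sha256`. A minimal sketch; the helper names are illustrative, and pip's `pip hash <file>` command reports the same SHA256 digest.

```python
import hashlib

# SHA256 digests as listed in the File hashes tables for version 0.4.
EXPECTED_SHA256 = {
    "stale_data_detection-0.4.tar.gz":
        "13e95592ed4d2ad1d10e67fe214abe297874093629f0dfdb3a558f715a437b38",
    "stale_data_detection-0.4-py3-none-any.whl":
        "4456bafc550429aaa60cad1e1fdeda2f86d6c597cc7cff2b061fc398e17ba878",
}


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Read the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, `verify("stale_data_detection-0.4.tar.gz", EXPECTED_SHA256["stale_data_detection-0.4.tar.gz"])` should return True for an intact download in the current directory.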