Kraken is a convenience package that orchestrates data extraction by integrating SQL with Python, while managing safe storage and recall of sensitive connection credentials.
Reason this release was yanked: array_size introduction affects ordering of SQL execution
Project description
Kraken
Developed in the NHS at University Hospital Southampton to facilitate clinical research in a complex multi-database environment, Kraken provides streamlined management of Reproducible Analytical Pipelines (RAPs) from multiple data sources by automating extraction, parsing, connection and execution of SQL files. Offering multiple levels of control, Kraken can run an entire SQL data pipeline from extraction to export with as little as one line of Python code, or provide more fine-tuned management of entire ETL flows through use of underlying connector objects. Kraken also provides standardised statistical summaries and graphing for speedy interrogation of results.
Wrapping around packages like keyring, sqlalchemy, pyodbc, pandas, matplotlib and seaborn, Kraken:
- Safely stores sensitive database connection credentials under an 'alias' in your operating system's credential store, and recalls them on demand and as dictated by SQL files;
- Extracts all SQL from a targeted file or folder of files (sequentially or simultaneously) - parsing and splitting them into queries before executing them in order;
- Returns DataFrame results that include provenance metadata for auditable tracking;
- Allows database upload of data with fine-tuned control;
- Integrates extraction of local data files;
- Provides fast data interrogation with single-line graphing and statistical summary functions.
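The provenance idea above can be pictured with a small wrapper that carries metadata alongside each result. This is a hypothetical sketch, not Kraken's internal code: the `QueryResult` name and its fields are illustrative stand-ins for the metadata Kraken attaches to returned DataFrames.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Illustrative sketch only: a result object that records where its data
# came from, enabling the kind of auditable tracking described above.
@dataclass
class QueryResult:
    rows: list                      # stand-in for a pandas DataFrame
    source_file: str                # SQL file the query came from
    query_index: int                # position of the query within that file
    executed_at: str = field(default_factory=lambda: datetime.now().isoformat())

result = QueryResult(rows=[{"id": 1}], source_file="patients.sql", query_index=0)
print(result.source_file, result.query_index)
```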
Guides
Quickstart Example
```shell
pip install datakraken
```

```python
import kraken

# Set default database alias (once)
kraken.set_default_alias("TEST")

# Save test connection (once)
kraken.save_connection_MSSQL(alias="TEST", server="xxx", database="xxx", username=None, password=None, default=True)

# Execute SQL files in folder
results = kraken.run(filepaths='sql_folder')

# Examine summary statistics
for result in results:
    result.examine()
```
Function Quick List
Connection
- save_connection() - save database service credentials to the OS's credential manager under an alias for recall
- save_connection_XXX() - convenience wrappers around save_connection() for various supported database platforms
- execute() - execute a SQL query
- create_connector() - create a Connector object to reuse for SQL execution
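The connector-object pattern behind create_connector() can be sketched as a single reusable connection that executes queries on demand. Kraken's real Connector wraps sqlalchemy/pyodbc with credentials recalled from the credential store; this stand-in uses sqlite3 purely for illustration, and the `MiniConnector` name and methods are assumptions, not Kraken's API.

```python
import sqlite3

# Hypothetical sketch of a reusable connector object (not Kraken's code):
# hold one open connection and execute queries against it on demand.
class MiniConnector:
    def __init__(self, dsn: str):
        self.conn = sqlite3.connect(dsn)

    def execute(self, sql: str, params: tuple = ()):
        cur = self.conn.execute(sql, params)
        return cur.fetchall()

    def close(self):
        self.conn.close()

connector = MiniConnector(":memory:")
connector.execute("CREATE TABLE t (x INTEGER)")
connector.execute("INSERT INTO t VALUES (1), (2)")
print(connector.execute("SELECT SUM(x) FROM t"))  # [(3,)]
connector.close()
```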
SQL Execution
- extract_sql() - extract & parse SQL files from a provided filepath, preparing a collection of queries
- execute_sql() - execute the collection of queries from extract_sql()
- export() - export results to various filetypes
- run() - wrapper around the above functions, allowing extraction, execution, and optional export of results from SQL files
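The extract-and-split step can be pictured as reading every .sql file in a folder and splitting each into individual queries. Kraken's real parser also handles comments, string literals and batch separators; this naive semicolon splitter (the `extract_queries` name is hypothetical) only illustrates the flow.

```python
from pathlib import Path
import tempfile

# Naive illustrative sketch: gather .sql files in order and split each
# on semicolons into (filename, query) pairs. Kraken's real parsing is
# more robust; this only demonstrates the extract-then-split pipeline.
def extract_queries(folder: str) -> list[tuple[str, str]]:
    queries = []
    for path in sorted(Path(folder).glob("*.sql")):
        for query in path.read_text().split(";"):
            if query.strip():
                queries.append((path.name, query.strip()))
    return queries

# Usage: two queries in one file are split and kept in execution order.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "demo.sql").write_text("SELECT 1;\nSELECT 2;")
    print(extract_queries(folder))
    # [('demo.sql', 'SELECT 1'), ('demo.sql', 'SELECT 2')]
```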
Data Import
- extract_spreadsheets() - extract data from spreadsheets
Data Upload
- upload() - upload a single DataFrame to the database (note that this always commits)
- upload_results() - upload a collection of DataFrames to the database (note that this always commits)
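The upload step can be sketched as inserting a batch of rows and committing in one call. Kraken's upload() takes a pandas DataFrame and targets the connected platform; here plain tuples and sqlite3 stand in, and the `upload_rows` helper is illustrative rather than Kraken's API.

```python
import sqlite3

# Illustrative sketch of a batched upload that always commits, mirroring
# the note above. Tuples stand in for DataFrame rows; sqlite3 stands in
# for the target database platform.
def upload_rows(conn, table: str, columns: list[str], rows: list[tuple]):
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()  # the upload always commits

conn = sqlite3.connect(":memory:")
upload_rows(conn, "people", ["name", "age"], [("Ada", 36), ("Alan", 41)])
print(conn.execute("SELECT COUNT(*) FROM people").fetchone())  # (2,)
```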
Analysis
- examine() - analyse a DataFrame & provide a high-level statistical summary
- graph() - graph a DataFrame quickly with support for multiple graph types
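The kind of high-level summary examine() produces can be sketched with the standard library alone: per numeric column, report count, mean, min and max. Kraken's real examine() works on pandas DataFrames and is richer than this; the dict-of-lists input and the function name here are illustrative assumptions.

```python
import statistics

# Illustrative sketch of a one-call statistical summary (not Kraken's
# implementation): one row of summary stats per numeric column.
def examine(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    return {
        name: {
            "count": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }

summary = examine({"age": [30, 40, 50]})
print(summary["age"])  # {'count': 3, 'mean': 40.0, 'min': 30, 'max': 50}
```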
Helpers
- generate_where_clause() - loop over a DataFrame to convert rows or columns into batches of WHERE clauses
- datestamp() - get the current time or datestamp as a string with formatting options
- calculate_runtime() - get a timedelta and formatted string between two datetime values
- readout.activate() / readout.suppress() - turn readout on or off
- encode() / decode() - convenient wrappers around keyring to store/recall secrets
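The WHERE-clause batching idea can be sketched as chunking a long list of key values into several `WHERE ... IN (...)` clauses, which is useful when a platform caps the number of items per IN list. Kraken's generate_where_clause() operates on DataFrames; the `where_clauses` name and signature below are hypothetical.

```python
# Illustrative sketch (not Kraken's API): split values into batches and
# render one WHERE ... IN (...) clause per batch.
def where_clauses(column: str, values: list, batch_size: int = 1000) -> list[str]:
    clauses = []
    for i in range(0, len(values), batch_size):
        batch = ", ".join(repr(v) for v in values[i:i + batch_size])
        clauses.append(f"WHERE {column} IN ({batch})")
    return clauses

print(where_clauses("id", [1, 2, 3, 4, 5], batch_size=2))
# ['WHERE id IN (1, 2)', 'WHERE id IN (3, 4)', 'WHERE id IN (5)']
```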
Class Object Quick List
- Connector (use create_connector()) - object allowing fine-tuned SQL connection & execution control
- Parser - object allowing SQL parsing (as used in extract_sql())
- Progress - heavily customisable progress bar/ticker with threaded auto-refresh and context management
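The threaded, context-managed ticker described for Progress can be sketched in a few lines: a background thread redraws the line until the `with` block exits. This `MiniProgress` class is an illustrative stand-in, not Kraken's Progress implementation.

```python
import sys
import threading
import time

# Illustrative sketch of a context-managed progress ticker with a
# background refresh thread (not Kraken's Progress class).
class MiniProgress:
    def __init__(self, message: str = "Working", interval: float = 0.1):
        self.message, self.interval = message, interval
        self._stop = threading.Event()

    def _tick(self):
        ticks = 0
        while not self._stop.is_set():
            sys.stdout.write(f"\r{self.message}{'.' * (ticks % 4)}   ")
            sys.stdout.flush()
            ticks += 1
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread = threading.Thread(target=self._tick, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        print(f"\r{self.message}... done")

with MiniProgress("Extracting"):
    time.sleep(0.3)  # stand-in for a long-running query
```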
Supported Database Platforms
These database platforms are currently supported:
- Oracle
- Microsoft SQL Server
- PostgreSQL
- MySQL
- MariaDB
- IBM Informix
- InterSystems Caché
- InterSystems IRIS
Future Development
What we're working on next:
- Integration with BitWarden for credential storage
- Direct import/export to Microsoft SharePoint
- Pipeline execution logging and run reports
- Expanded database platform support
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors
Kraken was developed within the NHS by Cai Davis, Michael George, and Faizan Hemotra at University Hospital Southampton NHSFT, as part of the Southampton Emerging Therapies and Technology (SETT) Centre.
Project details
File details
Details for the file datakraken-1.2.3.tar.gz.
File metadata
- Download URL: datakraken-1.2.3.tar.gz
- Upload date:
- Size: 67.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2935f20cad6e21d62d23797ca2b22b65db03aa602796ea002f75aeb6fc218f46 |
| MD5 | 73e57e9597666723dbb5a508bbb2ec10 |
| BLAKE2b-256 | 26542370c3fa0a44742efbaa75d94ea0fa089fcf8dac8f9d2d786a7149428eb4 |
File details
Details for the file datakraken-1.2.3-py3-none-any.whl.
File metadata
- Download URL: datakraken-1.2.3-py3-none-any.whl
- Upload date:
- Size: 88.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc4ce4e9a66d548e4043a856d0505c02a316fb88cbca165c183a83cfe3dc7049 |
| MD5 | 34beb8e14821e0be79450d0cb65acba2 |
| BLAKE2b-256 | d00b80b50ed4374d8bf57642b076bba95c171a467b58c5dc16d884179fbedbca |