Manage your data science databases
Project description
dsdbmanager
Data Science DataBase Manager
Installation
pip install dsdbmanager
ReadMe
Do you love SqlAlchemy? Do you usually have to connect to the same databases all the time for some quick data processing and/or exploration?
- You might have database address and/or credentials hardcoded in a script
- You might be using environment variables to store credentials or host/ports etc
When dealing with simple data processing (especially with different databases with relatively small tables), it does not always make sense to have the something like
import sqlalchemy
import cx_Oracle
in the header of each script. There are many ways of dealing with these issues but this project tries to provide an example for the new data scientist. This project is not meant to be used in production; the goal is to speed up exploration by eliminating some frequent database connection activities.
- The examples below are based on a locally hosted MySql database
Consider the following database dstest with tables category and user. The database host, schema and/or port can be stored in a json file that the package uses to quickly create sqlalchemy engines.
Adding databases
After (or before) installation of the package, set an envionment variable DSDBMANAGER_CONFIG
pointing to a folder where the configuration files can be stored.
If the environment variable is not available, pathlib.Path.home() / ".dsdbmanager"
is used by default.
Use entry point command dsdbmanager add-database
directly in a command shell to add a database.
Or any python interactive interactive shell:
This will add the database directly to the .host.json
file automatically created at first import
.
Connecting
Manual Connection
Once a database is added, it is easy to connect to it using one of the following modules (each corresponding to a sql flavor/dialect) to establish a connection. This approach means that the user only needs to provide the username and password to create the engines.
- oracle_
- mssql_
- mysql_
- teradata_
Connecting With Shortcut
In order to save encrypted credentials for reuse, the project comes with a shortcut for each flavor/dialect
This approach creates an object that has the name of each database as a method. The image above shows the only mysql database dstest
as a property.
This property is actually a function which must be called to establish connection.
Because the connection is only made when the function is called, all the databases available are just properties until the user needs them. When a connection is attempted for the first time, the user is prompted for the username and password. The credentials are used to create the sqlalchemy engine which is used to test for successful connection. If the connection fails, the credentials are discarded and an error is raised. If the connection succeeds, the credentials are encrypted and stored in a file like this:
{
"mysql": {
"dstest": {
"username": "gAAAAABdYe_Le1rX3W5y23GlLf0dtrVVOWJhaPGVk2_CbIfpcqb_0dzu5_MFJpgTRuXF7EKk3UcLvCI5HyjP6b5daZQoMJRM2g==",
"password": "gAAAAABdYe_LtuwnY95B0nhsSKQbe8DEuvhbjO2Y9zo-PwC_UqsmQ1whRsGyTlZGc3RRyWc3yde6cGozxPJjcjZv77itSuyKVg=="
}
}
}
The credentials are retrieved every other time so that the user never has to pass the host or password info when needed.
Connect_Only = False
What happens when connect_only = False
? The object created has not only an engine but:
- Attribute '_metadata': provides the metadata for each table/view in the schema
- Every table or view in the schema becomes an attribute that can be used to pull data into a dataframe
- Attribute '_insert': provides a function that uses a pandas dataframe to insert records into a table
The picture above shows that
2
records were inserted in thecategory
table ofdstest
and we can see it here - Attribute '_update': provides a function that uses a pandas dataframe to update records in a table
Context Manager
It is possible to connect to the databases with a context manager approach and this is highly suggested. Sqlachemy engines are disposed and properties pointing to tables etc. are all cleared
The context manager approach is way cleaner anyways.
from dsdbmanager import mysql
with mysql().dstest(connect_only=True, schema=None) as dbobject:
# anything with the engine goes here
pass
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dsdbmanager-1.0.6.tar.gz
.
File metadata
- Download URL: dsdbmanager-1.0.6.tar.gz
- Upload date:
- Size: 24.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 06dd7e895174f3439e44940ea5c58e5d38724add390dd359e4714d7bcd8936ee |
|
MD5 | ea341d3e53be9876773b74f18ebc0077 |
|
BLAKE2b-256 | 7ba38e6d9911b98e36b807d725c9cf72293408cd82b3e9f1c33e18f5d33a2511 |
File details
Details for the file dsdbmanager-1.0.6-py3-none-any.whl
.
File metadata
- Download URL: dsdbmanager-1.0.6-py3-none-any.whl
- Upload date:
- Size: 28.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | efae06bac36122f75ad836bfcadf4436a34c4bd15f5c4e13288d966b6da2eb96 |
|
MD5 | c7c79c68a5b28ed9ed0424830dca77d4 |
|
BLAKE2b-256 | e35aa076ab6c6bd664db26bb4ea55ab775c25a7c5c1a2cc79c2e63e6d128b6e3 |