A Powerful suite of fast and intuitive Python ETL utilities
Project description
pyblux: A suite of fast, easy-to-use, and intuitive Python ETL utilities.
What is it?
pyblux is a Python package that provides a suite of ETL utilities built to make the interactions with databases in the cloud as well as on-premise fast, easy and intuitive.
Features
- Support multiple databases, including Postgres, MySql, MS SQL, SQLIte, Teradata and Oracle.
- The
get_engine
method makes it easy to connect to databases in a simple and intuitive manner. Blux.sql
method from theBlux
class helps run fast queries. It Povides output results as namedtuple or dictionary and it supports parameterised queries and in-flight transformation of data.Logger
class helps setup logging via log file or console.send_teams_notification
method provides an easy way to send alerts to a MS Teams channel via an incoming webhook.send_email
method helps to send email with HTML content- Helpful error messages display the failed query SQL
- DBAPI2 specification was used in order to simplify coding for queries on relational database systems using Python.
Where to get it
The source code is currently hosted on GitHub at: https://github.com/bertin.nono/pyblux
Binary installers for the latest released version are available at the Python Package Index (PyPI)
# or PyPI
pip install pyblux
pyblux provides support for the databases below:
Dependencies
Depending on the use case, the database package should be installed.
- Teradata : Install the Teradata SQL Driver Dialect for SQLAlchemy
pip install teradatasqlalchemy
- PostgreSQL: Install Psycopg which is the most popular PostgreSQL database adapter for the Python programming language.
pip install psycopg2-binary
- MySQL and MariaDB: Install the Pure Python MySQL Driver
pip install PyMySQL
- [SQLite]: No install required
- Oracle: Intall cx_Oracle which is a Python extension module that enables access to Oracle Database.
pip install cx-Oracle
- Microsoft SQL Server: Install the pyodbc which is an open source Python module that makes accessing ODBC databases simple.
pip install pyodbc
Documentation
Classes
-
Blux:
Establishes a connection engine to a database system referenced by the dialect attribute run fast queries.
Note: The dialect
is the system SQLAlchemy uses to communicate with various types of DBAPI implementations and databases.
class Blux:
"""
This class connects to a local database session using the db `dialect` library.
"""
def __init__(self, engine=None,dialect=None):
"""
Args:
engine (str): Database connection engine.
dialect (str): database system name(postgres, oracle, teradata,...)
Note: Database Connection package must be installed in order to use this backend.
"""
self.engine = engine
self._dialect = dialect
self.__errlimit = 1
self.__warnings = []
self.__errors = []
self.__logons = []
Example:
table_list = """SELECT table_schema, table_name FROM information_schema.tables"""
postgres_engine = get_engine(dialect='PG', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
blux= Blux(engine=postgres_engine, dialect='postgres')
dataframe = blux.sql(query=table_list)
table = 'test'
database = 'postgres'
#load dataframe to table
blux.sql(dataframe=dataframe,database=database,table=table, dialect='postgres')
-
Logger:
provides a custom logging handler called logger
. Helps Debug SQL and monitor progress with logging.
class Logger:
"""
This class connects to a local database session using the db `connnection` library.
"""
def __init__(self, logname:str, filename:str, level=logging.INFO, console:bool=True):
"""
Args:
logname (str): Logger Name.
filename (str): log file path
level (str): Logger Level (DEBUG, INFO, WARNING, ERROR)
console (cool): print to console
"""
self._logname = logname
self._filename = filename
self._level = level
self._console = console
Example:
import logging
from pyblux import logger
pyblux_logger = Logger(logname=ETL.NAME, filename=log_file,level=logging.INFO, console=True)
logger=pyblux_logger.logger( verbose=True)
Output from a call for get_engine
will look like:
2021-07-07 15:06:22,411 get_engine:
2021-07-07 15:06:22,413 get_engine:
2021-07-07 15:06:22,416 get_engine:
Methods:
-
get_engine:
Creates a database connection engine.
get_engine(user:str,password:str,host:str,port:int,database:str,dialect:str,verbose:bool=False,parameter:str=None,raw_engine:bool=True,logger:Callable=print)
Database connection details are defined by get_engine
objects (see below).
Example:
import pandas
from pyblux.utils import get_engine
from pyblux.blux import Blux
oracle_engine = get_engine(dialect='oracle', host="localhost", port=1521,database="mydata", user="oracle_user", password="123")
teradata_engine = get_engine(dialect='mssql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_engine = get_engine(dialect='mysql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_engine = get_engine(dialect='teradata', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_engine = get_engine(dialect='postgres', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
Passwords
It is best practice for Database passwords to be stored in environment variables. This can be done on the command line via:
export password=secret-password
on Linuxset password=secret-password
on Windows
Or in a Python terminal via:
import os
os.environ['password'] = 'secret-password'
No password is required for SQLite databases.
When URL that includes the password contains
Password conatains special characters
Example: Connection_String = postgresql+psycopg2://user:p@ssword%to%encode@hosturl/defaultdb The above password encoded using urllib.parse:
pwd=urllib.parse.quote_plus("p@ssword%to%encode")
print(pwd)
Connection_String = postgresql+psycopg2://user:urllib.parse.quote_plus("p@ssword%to%encode")@hosturl/defaultdb
-
get_connection:
Gets a regular database connection.
get_connection(user:str,password:str,host:str,port:int,database:str,dialect:str,verbose:bool=False,parameter:str=None,logger:Callable=print):
"""
Get a regular connection for Teradata , Oracle, Aurora/Postgres, Aurora/MySql/MariaDB, SQLite, and Microsoft SQL Server
Returns
-------
Connection Object
"""
Example:
from pyblux.utils import get_connection
oracle_conn = get_connection(dialect='oracle', host="localhost", port=1521,database="mydata", user="oracle_user", password="123")
teradata_conn = get_connection(dialect='mssql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_conn = get_connection(dialect='mysql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_conn = get_connection(dialect='teradata', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
postgres_conn = get_connection(dialect='postgres', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
-
create_table_from_dataframe:
Creates table from a dataframe attributes and fastload data into in it.
create_table_from_dataframe(dataframe:pd.DataFrame=None,table:str=None,Blux:Blux=None,verbose:bool=False,logger:Callable=print)
-
is_exist:
Checks is a table or view exist.
is_exist(table:str='', Blux:Blux=None,verbose:bool=False,logger:Callable=print)
-
drop_table:
Checks is a table or view exist and then drops it if it exists.
drop_table(table:str=None,Blux:Blux=None,verbose:bool=False,logger:Callable=print)
-
send_teams_notifications:
Send a Card to a MS Teams Channel
send_teams_notification ( hookurl: str, title: str='' , text: str='', message: str ='', status: str ='', error_message: str='', activitySubtitle: str='', activityText: str='')
-
send_email:
Send an HTML formated email that can include a dataframe
send_email(server:str, port:int,sender: str, receivers: list, subject: str, body_text: str, attachment: any = None,df: pd.DataFrame = None)
Maintainers:
- Bertin Nono
Development status
Stable
Licence
MIT
References
Change Log
(09/12/2021)
- Initial Release
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file pyblux-0.0.5.tar.gz
.
File metadata
- Download URL: pyblux-0.0.5.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
3480e1d564f23211cb63049ef033aeaaac13ef493ac5853dce1aec81fc5a8fba
|
|
MD5 |
6c83f37a1d6c627467c59f0e7c7d76c0
|
|
BLAKE2b-256 |
60f5edba9577d0826a5042819f73c1daddaf38f4ca7e4e7a4f40286733e0a79a
|