Extra Useful Utilities.
xutil
This is a Python package that bundles commonly used utility functions and libraries.
Install
pip install xutil
pip install xutil[jdbc] # for JDBC connectivity. Requires JPype1.
pip install xutil[web] # for web scraping. Requires Twisted.
pip install xutil[hive] # for Hive connectivity. Requires SASL libraries.
Windows
If you see the error ‘error: Microsoft Visual C++ 14.0 is required. Get it with “Microsoft Visual C++ Build Tools”’, you can install the build tools quickly with Chocolatey (https://chocolatey.org/):
choco install -y VisualCppBuildTools
CLI
Available commands:
xutil-alias # add useful alias commands, see xutil/alias.sh
xutil-create-profile # creates ~/profile.yaml from template.
exec-etl --help # Execute various ETL operations.
exec-sql --help # Execute SQL from command line
ipy # launch ipython with pre-defined modules/functions imported
ipy-spark --help # launch ipython Spark with pre-defined modules/functions imported
pykill pattern # kill any process whose command string matches pattern
Databases
Why not use SQLAlchemy (SA)? http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
It has been demonstrated that SA is not performant enough for fast ETL.
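To illustrate the performance point, here is a minimal sketch of a batched insert through a plain DB-API connection, which avoids per-row ORM overhead. It uses the standard-library sqlite3 module as a stand-in database. The `bulk_insert` helper, the `events` table and its columns are hypothetical and are not part of xutil.

```python
import sqlite3
from itertools import islice


def bulk_insert(conn, table, fields, rows, batch_size=10_000):
    """Insert rows in batches with executemany; return the number of rows inserted.

    Hypothetical helper for illustration only. `table` and `fields` are
    interpolated into the SQL, so they must come from trusted code.
    """
    placeholders = ", ".join("?" for _ in fields)
    sql = f"INSERT INTO {table} ({', '.join(fields)}) VALUES ({placeholders})"
    iterator = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so memory use stays bounded
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
generated = ((i, f"event-{i}") for i in range(25_000))
inserted = bulk_insert(conn, "events", ["id", "name"], generated)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The same pattern should carry over to other DB-API drivers, although the placeholder style (`?`, `%s`, `:1`) differs between them.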
SQL Server
Installation
Make sure ODBC is installed.
brew install unixodbc
apt-get install unixodbc
Then install the drivers and check the ODBC configuration:
odbcinst -j
Oracle
Install Oracle Client:
brew tap InstantClientTap/instantclient
brew install instantclient-basic
Installing with conda:
conda install oracle-instantclient -y
Spark SQL
It is the user’s responsibility to properly set up the SPARK_HOME environment and configurations. This library uses pyspark and will default to the SPARK_HOME settings.
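Because the library relies on SPARK_HOME, it can help to validate that variable before starting a Spark session. The following is a minimal sketch using only the standard library. The `resolve_spark_home` function is a hypothetical helper, not an xutil API.

```python
import os
from pathlib import Path


def resolve_spark_home(environ=None):
    """Return SPARK_HOME as a Path, or raise EnvironmentError with a readable message.

    Hypothetical helper for illustration; pass a dict as `environ` to test it
    without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError("SPARK_HOME is not set; point it at your Spark installation.")
    path = Path(spark_home)
    if not path.is_dir():
        raise EnvironmentError(f"SPARK_HOME does not point to a directory: {path}")
    return path
```

Calling `resolve_spark_home()` with no arguments reads the real process environment.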
Useful config.py
https://github.com/apache/incubator-airflow/blob/master/setup.py
Dev
pip install -e /path/to/xutil
Testing
python setup.py test
Release
Update version in setup.py.
Draft a new release on GitHub: https://github.com/flarco/xutil/releases/new
git clone https://github.com/flarco/xutil.git
cd xutil
m2r --overwrite README.md
python setup.py sdist && twine upload --skip-existing dist/*
TODO
Revamp database.base methods:
get_conn
DBConn
__init__
_set_variables
_do_execute
_split_schema_table
_concat_fields
_template
connect
check_pk
execute -- straight SA.connection.execute, return "fields, rows"
query -- use SQLAlchemy; replaces self.select, fields = conn._fields
stream
insert
drop_table
create_table
get_cursor_fields -> _get_cursor_fields
get_schemas
get_objects
get_tables
get_views
get_columns
get_primary_keys
get_indexes
get_ddl
get_all_columns
get_all_tables
analyze_fields
analyze_tables
analyze_join_match
remove:
get_cursor: no need for get_cursor with SA
execute_multi
select: use `query` instead, which uses `execute`
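The planned split between `execute` (returns "fields, rows") and `query` (built on `execute`) could look roughly like the sketch below. It swaps in the standard-library sqlite3 module for the SQLAlchemy connection mentioned in the TODO, and the `MiniConn` class is a hypothetical stand-in, not xutil's `DBConn`.

```python
import sqlite3


class MiniConn:
    """Hypothetical stand-in for a DBConn-like class; not xutil's implementation."""

    def __init__(self, conn):
        self.conn = conn
        self._fields = []

    def execute(self, sql, params=()):
        """Run a statement and return (fields, rows)."""
        cursor = self.conn.execute(sql, params)
        # cursor.description is None for statements without a result set
        fields = [col[0] for col in cursor.description] if cursor.description else []
        rows = cursor.fetchall() if fields else []
        self._fields = fields
        return fields, rows

    def query(self, sql, params=()):
        """Return only the rows; field names stay available as self._fields."""
        _, rows = self.execute(sql, params)
        return rows


db = MiniConn(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE t (id INTEGER, name TEXT)")
db.execute("INSERT INTO t VALUES (?, ?)", (1, "a"))
fields, rows = db.execute("SELECT id, name FROM t")
names = db.query("SELECT name FROM t")
```

Keeping `query` as a thin wrapper over `execute` means there is a single code path that touches the cursor, which is one way to make `select` and `get_cursor` redundant.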
Source Distribution
File details
Details for the file xutil-0.2.3.tar.gz.
File metadata
- Download URL: xutil-0.2.3.tar.gz
- Upload date:
- Size: 12.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 030272f343837745440fdace29a7bccd166faea1fbd636e1ec1ee4c76c5442ee
MD5 | b654cd117193ae9c95f63b8b382ce4e8
BLAKE2b-256 | ca1cbfc4e6994eaf76e66eb3727b16b0c5458baa965d9ef7b6a188353116eb53