Extra Useful Utilities.
Project description
xutil
This is a Python package containing all the utility functions and libraries that are commonly used.
Install
pip install xutil
pip install xutil[jdbc] # for JDBC connectivity. Requires JPype1.
pip install xutil[web] # for web scraping. Requires Twisted.
pip install xutil[hive] # for Hive connectivity. Requires SASL libraries.
Windows
If you face the message ‘error: Microsoft Visual C++ 14.0 is required. Get it with “Microsoft Visual C++ Build Tools”’, you can quickly install the build tools with chocolatey (https://chocolatey.org/)
choco install -y VisualCppBuildTools
CLI
Available commands:
xutil-alias # add useful alias commands, see xutil/alias.sh
xutil-create-profile # creates ~/profile.yaml from template.
exec-etl --help # Execute various ETL operations.
exec-sql --help # Execute SQL from command line
ipy # launch ipython with pre-defined modules/functions imported
ipy-spark --help # launch ipython Spark with pre-defined modules/functions imported
pykill pattern # will swiftly kill any process with the command string mathing pattern
Databases
Why not use SQLAlchemy (SA)? http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
It has been demontrated the SA is not performant when it comes to speedy ETL.
SQL Server
Installation
Make sure ODBC is installed.
brew install unixodbc
apt-get install unixodbc
Then, install the drivers
odbcinst -j
Oracle
Install Oracle Client:
brew tap InstantClientTap/instantclient
brew install instantclient-basic
Installing with conda:
conda install oracle-instantclient -y
Spark SQL
It is the user’s responsibility to properly set up the SPARK_HOME environment and configurations. This library uses pyspark and will default to the SPARK_HOME settings.
Useful config.py
https://github.com/apache/incubator-airflow/blob/master/setup.py
Dev
pip install -e /path/to/xutil
Testing
python setup.py test
Release
Update version in setup.py.
Draft new release on Github: https://github.com/flarco/xutil/releases/new
git clone https://github.com/flarco/xutil.git
cd xutil
m2r --overwrite README.md
python setup.py sdist && twine upload --skip-existing dist/*
TODO
Revamp database.base methods:
get_conn
DBConn
__init__
_set_variables
_do_execute
_split_schema_table
_concat_fields
_template
connect
check_pk
execute -- straight SA.connection.execute, return "fields, rows"
query -- use the SQLAlachy and replaces self.select, fields = conn._fields"
stream
insert
drop_table
create_table
get_cursor_fields -> _get_cursor_fields
get_schemas
get_objects
get_tables
get_views
get_columns
get_primary_keys
get_indexes
get_ddl
get_all_columns
get_all_tables
analyze_fields
analyze_tables
analyze_join_match
remove:
get_cursor: no need for get_cursor with SA
execute_multi
select: use `query` instead, which uses `execute`
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.