Extra Useful Utilities.
This will be a Python package containing all the utility functions and libraries that are commonly used.
This link displayed how to do this back with Youtube.
pip install -U git+https://github.com/flarco/xutil.git
pip install -U git+https://github.com/flarco/xutil.git#egg=xutil[jdbc] # for JDBC connectivity. Requires JPype1.
pip install -U git+https://github.com/flarco/xutil.git#egg=xutil[web] # for web scraping. Requires Twisted.
pip install -U git+https://github.com/flarco/xutil.git#egg=xutil[hive] # for Hive connectivity. Requires SASL libraries.
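Each optional extra depends on a third-party package, so it can help to check which of them are importable before relying on an extra. The sketch below is not part of xutil. The `EXTRA_MODULES` mapping and the `missing_extras` helper are hypothetical, and the import names (`jpype`, `twisted`, `sasl`) are assumptions about what each requirement installs.

```python
from importlib.util import find_spec

# Assumed mapping from an xutil extra to the module its requirement provides.
# These import names are assumptions for illustration, not taken from xutil.
EXTRA_MODULES = {
    "jdbc": "jpype",    # JPype1 is assumed to install the `jpype` module
    "web": "twisted",   # Twisted is assumed to install the `twisted` module
    "hive": "sasl",     # assumed import name for the Python SASL bindings
}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return, sorted, the extras whose required module cannot be found."""
    return sorted(
        extra
        for extra, module in extra_modules.items()
        if find_spec(module) is None
    )


if __name__ == "__main__":
    for extra in missing_extras():
        print(f"extra '{extra}' is not usable: missing {EXTRA_MODULES[extra]}")
```

Because `find_spec` only looks a module up without importing it, the check has no side effects even when a dependency is heavy to load.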
On Windows, if you see the message ‘error: Microsoft Visual C++ 14.0 is required. Get it with “Microsoft Visual C++ Build Tools”’, you can quickly install the build tools with Chocolatey (https://chocolatey.org/):
choco install -y VisualCppBuildTools
xutil-alias # add useful alias commands, see xutil/alias.sh
xutil-create-profile # creates ~/profile.yaml from template
exec-etl --help # execute various ETL operations
exec-sql --help # execute SQL from the command line
ipy # launch ipython with pre-defined modules/functions imported
ipy-spark --help # launch ipython Spark with pre-defined modules/functions imported
pykill pattern # swiftly kill any process whose command string matches pattern
It has been demonstrated that SA (SQLAlchemy) is not performant when it comes to fast ETL.
Make sure ODBC is installed.
brew install unixodbc # macOS (Homebrew)
apt-get install unixodbc # Debian/Ubuntu
Then install the ODBC drivers for the databases you need.
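A quick way to confirm that the ODBC driver manager is visible to Python is to look for its shared library and for the `odbcinst` tool that ships with unixODBC. This is a minimal sketch using only the standard library. The `odbc_status` helper is hypothetical and not part of xutil, and `find_library` may miss installations in non-standard locations (for example some Homebrew prefixes).

```python
import shutil
from ctypes.util import find_library


def odbc_status():
    """Report whether the unixODBC library and its CLI tool can be located."""
    return {
        # Name or path of the ODBC shared library; None if it was not found.
        "library": find_library("odbc"),
        # Path to unixODBC's `odbcinst` tool; None if it is not on PATH.
        "odbcinst": shutil.which("odbcinst"),
    }


if __name__ == "__main__":
    for name, location in odbc_status().items():
        print(f"{name}: {location or 'not found'}")
```

If `odbcinst` is found, running `odbcinst -q -d` in a shell lists the drivers that unixODBC has registered.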
It is the user’s responsibility to properly set up the SPARK_HOME environment and configurations. This library uses pyspark and will default to the SPARK_HOME settings.
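Since the library falls back to whatever `SPARK_HOME` points at, a small pre-flight check can surface a missing or wrong setting early. The following is a sketch under the assumption that a valid `SPARK_HOME` is simply an existing directory. `resolve_spark_home` is a hypothetical helper, not a function provided by xutil or pyspark.

```python
import os
from pathlib import Path


def resolve_spark_home(env=os.environ):
    """Return SPARK_HOME as a Path, or raise if it is unset or not a directory."""
    value = env.get("SPARK_HOME")
    if not value:
        raise EnvironmentError("SPARK_HOME is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise EnvironmentError(f"SPARK_HOME does not point to a directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Using Spark installation at {resolve_spark_home()}")
    except EnvironmentError as exc:
        print(f"Spark is not configured: {exc}")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.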
pip install -e /path/to/xutil
Update the version in setup.py.
Draft a new release on GitHub: https://github.com/flarco/xutil/releases/new
git clone https://github.com/flarco/xutil.git
cd xutil
python setup.py sdist
twine upload dist/*