Hive Runner is a Python script that pulls saved queries from Beeswax, runs the queries on Hive, and stores the results in Memcache.
Using Hive Runner
Requirements
Python 2.7
Cloudera Beeswax - Beeswax must be using a MySQL Database for storage.
HiveServer - You must be running HiveServer version 1. Note that Cloudera’s Hadoop distribution only ships with version 2. You can easily install version 1 using Cloudera’s package repositories.
Memcached - You must have Memcached running somewhere.
Pip - Pip is used to install Hive Runner and its Python package dependencies.
Installation
Optionally, create a VirtualEnv: virtualenv environment-name
Optionally, activate your VirtualEnv: source environment-name/bin/activate
Install Hive Runner via pip: pip install hiverunner
Usage
Hive Runner is configured through command-line parameters. Run hiverunner --help to see the available options. The most important parameters to include when running Hive Runner from the command line are the connection settings.
For example, to run all queries in Beeswax whose names are prefixed with _hourly and cache the results in Memcache:
hiverunner --hourly \
    --mysql-host mysql01.example.com \
    --mysql-database beeswax \
    --mysql-user hue \
    --mysql-password secret \
    --hive-host hive01.example.com \
    --memcache-host cache01.example.com
To run all queries whose names are prefixed with _weekly instead, change the --hourly parameter to --weekly:
hiverunner --weekly \
    --mysql-host mysql01.example.com \
    --mysql-database beeswax \
    --mysql-user hue \
    --mysql-password secret \
    --hive-host hive01.example.com \
    --memcache-host cache01.example.com
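The naming convention behind --hourly and --weekly can be pictured with a short Python sketch. This only illustrates name-based selection and is not Hive Runner's actual implementation; the function name select_queries and the sample query names are hypothetical.

```python
def select_queries(names, schedule):
    """Return the saved-query names that start with '_<schedule>'."""
    prefix = "_" + schedule
    return [name for name in names if name.startswith(prefix)]


# Hypothetical saved-query names, for illustration only.
saved = ["_hourly_signups", "_weekly_revenue", "_hourly_errors", "adhoc_test"]

hourly = select_queries(saved, "hourly")
weekly = select_queries(saved, "weekly")
print(hourly)
print(weekly)
```

In this sketch a query without a schedule prefix, such as adhoc_test, is never selected by either schedule.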
If you need to run custom-named queries or only a single query, use the custom parameter and provide the name of the query to run.
For example, to run a single query regardless of its time-based prefix:
hiverunner --custom _daily_custom_query \
    --mysql-host mysql01.example.com \
    --mysql-database beeswax \
    --mysql-user hue \
    --mysql-password secret \
    --hive-host hive01.example.com \
    --memcache-host cache01.example.com
This format makes it easy to schedule cron jobs.
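As a rough sketch of how such cron entries might be produced, the Python helper below assembles a crontab line from the example connection settings. The helper name crontab_line, the SETTINGS list, and the chosen cron timings are assumptions made for illustration; only the hiverunner flags come from the examples above.

```python
try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2.7

# Example connection settings copied from the commands above.
SETTINGS = [
    ("--mysql-host", "mysql01.example.com"),
    ("--mysql-database", "beeswax"),
    ("--mysql-user", "hue"),
    ("--mysql-password", "secret"),
    ("--hive-host", "hive01.example.com"),
    ("--memcache-host", "cache01.example.com"),
]


def crontab_line(cron_expr, schedule_flag, settings=SETTINGS):
    """Build one crontab entry that runs hiverunner with the given flag."""
    args = ["hiverunner", schedule_flag]
    for flag, value in settings:
        # Quote values so spaces or shell metacharacters stay in one argument.
        args.extend([flag, quote(value)])
    return cron_expr + " " + " ".join(args)


# Assumed timings: the top of every hour, and midnight on Sundays.
hourly_entry = crontab_line("0 * * * *", "--hourly")
weekly_entry = crontab_line("0 0 * * 0", "--weekly")
print(hourly_entry)
print(weekly_entry)
```

If Hive Runner is installed in a VirtualEnv, cron will likely need the full path to the hiverunner executable, since cron jobs usually run with a minimal PATH.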
More Information
Hive Runner is open source software and available at https://github.com/bellycard/hiverunner. Bug reports, feature requests, and contributions are welcome!
Contributors
AJ Self <aj@bellycard.com>
Kevin Reedy <kevin@bellycard.com>
License
Apache License, Version 2.0
Source Distribution
File details
Details for the file hiverunner-1.0.1.tar.gz.
File metadata
- Download URL: hiverunner-1.0.1.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | c2293192f860967563c7d2950baaf48b322263cfc29cdcb0fc5bccb9e0a5110e
MD5 | e71fc8a624cfabe8aab8315dba61dabb
BLAKE2b-256 | 293e1c3ec7be3ec778dec825f2e777e8dbc653ed9b0f3758476c3556d43c1601