Hive Runner is a Python script that pulls saved queries from Beeswax, runs them on Hive, and stores the results in Memcached.
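As a rough illustration of that pipeline, the sketch below wires the three steps together through injected callables, so it runs without MySQL, HiveServer, or Memcached. The function name run_saved_queries, its parameters, and the JSON-encoded cache values keyed by query name are hypothetical; this is not Hive Runner's actual code or storage format.

```python
import json


def run_saved_queries(saved_queries, execute_hive, cache_set, select):
    """Run each selected saved query and cache its result rows.

    saved_queries: iterable of (name, hql) pairs, standing in for the saved
        queries Hive Runner reads from Beeswax's MySQL database.
    execute_hive: callable that takes an HQL string and returns a list of rows.
    cache_set: callable (key, value) that stores a value, e.g. in Memcached.
    select: predicate that decides whether a query name should run.
    Returns the names of the queries that were run, in order.
    """
    ran = []
    for name, hql in saved_queries:
        if not select(name):
            continue
        rows = execute_hive(hql)
        # Hypothetical storage format: JSON-encoded rows keyed by query name.
        cache_set(name, json.dumps(rows))
        ran.append(name)
    return ran


if __name__ == "__main__":
    # In-memory stand-ins for MySQL, HiveServer, and Memcached.
    cache = {}
    queries = [
        ("_hourly_signups", "SELECT COUNT(*) FROM signups"),
        ("_weekly_visits", "SELECT COUNT(*) FROM visits"),
    ]
    ran = run_saved_queries(
        queries,
        execute_hive=lambda hql: [[42]],
        cache_set=cache.__setitem__,
        select=lambda name: name.startswith("_hourly"),
    )
    print(ran)    # ['_hourly_signups']
    print(cache)  # {'_hourly_signups': '[[42]]'}
```

Keeping the database, Hive, and cache access behind plain callables makes the control flow easy to test; a real deployment would pass functions backed by a MySQL client, a HiveServer client, and a Memcached client instead.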

Using Hive Runner

Requirements

  • Python 2.7

  • Cloudera Beeswax - Beeswax must use a MySQL database for storage.

  • HiveServer - Hive Runner requires HiveServer version 1. Note that Cloudera’s Hadoop distribution only ships with version 2; version 1 can be installed from Cloudera’s package repositories.

  • Memcached - A Memcached server must be reachable from the machine that runs Hive Runner.

  • Pip - Pip is used to install Hive Runner and its Python package dependencies.
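Before the first run, it can help to confirm that each service is reachable over TCP. Below is a minimal sketch using only the standard library, assuming the usual default ports (MySQL 3306, HiveServer 10000, Memcached 11211). The helper is_reachable and the DEFAULT_PORTS table are hypothetical, and your deployment may use other ports.

```python
import socket

# Common default ports for MySQL, HiveServer, and Memcached.
# Assumption: adjust these to match your deployment.
DEFAULT_PORTS = {"mysql": 3306, "hive": 10000, "memcached": 11211}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        # Covers refused connections, DNS failures, and timeouts.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    hosts = {
        "mysql": "mysql01.example.com",
        "hive": "hive01.example.com",
        "memcached": "cache01.example.com",
    }
    for service, host in sorted(hosts.items()):
        print(service, host, is_reachable(host, DEFAULT_PORTS[service]))
```

This only checks that something is listening on each port; it does not verify credentials, the HiveServer version, or the Beeswax schema.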

Installation

  • Optionally, create a virtualenv: virtualenv environment-name

  • Optionally, activate the virtualenv: source environment-name/bin/activate

  • Install Hive Runner via pip: pip install hiverunner

Usage

Hive Runner is configured with command-line options; run hiverunner --help to list all of them. The most important options are the connection settings for the MySQL database that Beeswax uses, for HiveServer, and for Memcached.
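For illustration only, the options that appear in this README could be declared with Python's argparse module roughly as follows. This is a hypothetical reconstruction, not Hive Runner's real parser: it models only the flags used in the examples, and treating --hourly, --weekly, and --custom as mutually exclusive (with exactly one required) is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: only the flags shown in this README are
    # modeled; the real hiverunner parser may differ (see hiverunner --help).
    parser = argparse.ArgumentParser(prog="hiverunner")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--hourly", action="store_true",
                      help="run saved queries prefixed with _hourly")
    mode.add_argument("--weekly", action="store_true",
                      help="run saved queries prefixed with _weekly")
    mode.add_argument("--custom", metavar="QUERY_NAME",
                      help="run the single saved query with this name")
    # Connection settings for Beeswax's MySQL database, HiveServer,
    # and Memcached; argparse stores them as mysql_host, hive_host, etc.
    parser.add_argument("--mysql-host")
    parser.add_argument("--mysql-database")
    parser.add_argument("--mysql-user")
    parser.add_argument("--mysql-password")
    parser.add_argument("--hive-host")
    parser.add_argument("--memcache-host")
    return parser
```

With this layout, build_parser().parse_args(["--weekly", "--hive-host", "hive01.example.com"]) yields an object whose weekly attribute is True and whose hive_host attribute is "hive01.example.com".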

For example, to run all saved queries in Beeswax whose names are prefixed with _hourly and cache the results in Memcached:

hiverunner --hourly \
--mysql-host mysql01.example.com \
--mysql-database beeswax \
--mysql-user hue \
--mysql-password secret \
--hive-host hive01.example.com \
--memcache-host cache01.example.com

To run all queries prefixed with _weekly instead, change --hourly to --weekly:

hiverunner --weekly \
--mysql-host mysql01.example.com \
--mysql-database beeswax \
--mysql-user hue \
--mysql-password secret \
--hive-host hive01.example.com \
--memcache-host cache01.example.com

To run a query with a custom name, or just a single query, use the --custom option and pass the name of the saved query to run.
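The examples suggest that --hourly and --weekly select saved queries by name prefix, while --custom selects one query by its exact name. A minimal sketch of that selection rule, assuming those semantics; make_selector is a hypothetical helper, not part of Hive Runner:

```python
def make_selector(hourly=False, weekly=False, custom=None):
    """Return a predicate that decides whether a saved query name should run.

    Assumption: --hourly and --weekly match a name prefix, and --custom
    matches exactly one query name.
    """
    if custom is not None:
        return lambda name: name == custom
    if hourly:
        return lambda name: name.startswith("_hourly")
    if weekly:
        return lambda name: name.startswith("_weekly")
    raise ValueError("choose hourly, weekly, or custom")


if __name__ == "__main__":
    names = ["_hourly_signups", "_weekly_visits", "_daily_custom_query"]
    select = make_selector(custom="_daily_custom_query")
    print([name for name in names if select(name)])
```

An exact match for --custom is what lets a query such as _daily_custom_query run even though no --daily option appears in these examples.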

For example, to run a single query regardless of its time-based prefix:

hiverunner --custom _daily_custom_query \
--mysql-host mysql01.example.com \
--mysql-database beeswax \
--mysql-user hue \
--mysql-password secret \
--hive-host hive01.example.com \
--memcache-host cache01.example.com

Because each of these invocations is a single self-contained command, it can be scheduled directly as a cron job.
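One way to produce such crontab entries is sketched below. cron_line and its SCHEDULES table are hypothetical helpers; the schedules use standard five-field cron syntax (the top of every hour, and midnight every Sunday). Because cron runs jobs with a minimal PATH, passing the full path of the hiverunner script (for example, inside your virtualenv's bin directory) as program may be necessary.

```python
try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2.7

# Standard five-field cron schedules: top of every hour, midnight on Sunday.
SCHEDULES = {
    "hourly": "0 * * * *",
    "weekly": "0 0 * * 0",
}


def cron_line(schedule, connection, program="hiverunner"):
    """Build one crontab entry that runs hiverunner on a schedule.

    schedule: a key of SCHEDULES ("hourly" or "weekly").
    connection: maps option names (e.g. "mysql-host") to values; options
        are emitted in alphabetical order and shell-quoted.
    """
    parts = [program, "--" + schedule]
    for option in sorted(connection):
        parts += ["--" + option, quote(connection[option])]
    command = " ".join(parts)
    # cron treats an unescaped % in the command as a newline, so escape it.
    return SCHEDULES[schedule] + " " + command.replace("%", "\\%")


if __name__ == "__main__":
    print(cron_line("hourly", {
        "mysql-host": "mysql01.example.com",
        "hive-host": "hive01.example.com",
        "memcache-host": "cache01.example.com",
    }))
```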

More Information

Hive Runner is open source software and available at https://github.com/bellycard/hiverunner. Bug reports, feature requests, and contributions are welcome!

Contributors

AJ Self <aj@bellycard.com>

Kevin Reedy <kevin@bellycard.com>

License

Apache License, Version 2.0

http://www.apache.org/licenses/LICENSE-2.0

Download files

Source Distribution

hiverunner-1.0.1.tar.gz (8.3 kB)

File details

Details for the file hiverunner-1.0.1.tar.gz.

File metadata

  • Download URL: hiverunner-1.0.1.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source

File hashes

Hashes for hiverunner-1.0.1.tar.gz:

  • SHA256: c2293192f860967563c7d2950baaf48b322263cfc29cdcb0fc5bccb9e0a5110e
  • MD5: e71fc8a624cfabe8aab8315dba61dabb
  • BLAKE2b-256: 293e1c3ec7be3ec778dec825f2e777e8dbc653ed9b0f3758476c3556d43c1601
