Mozilla Bugzilla Bug Version ETL
Python version of Metric’s Bugzilla ETL (https://github.com/mozilla-metrics/bugzilla_etl)
Motivation and Details
Requirements
PyPy 2.1.0 using Python 2.7 (CPython is way too slow)
A MySQL/Maria database with Mozilla’s Bugzilla schema (old public version can be found here)
A timezone database (instructions)
An ElasticSearch (v 0.20.5) cluster to hold the bug version documents
Installation
PyPy and SetupTools are required. If you are installing on Windows please follow instructions to get these installed. When done, installation is easy:
pip install Bugzilla-ETL
Setup
You must prepare a settings.json file to reference the resources, and its filename must be provided as an argument on the command line. Examples of settings files can be found in resources/settings.
Bugzilla-ETL keeps local run state in the form of two files: first_run_time and last_run_time. The paths to both are parameters in the settings.json file.
first_run_time is written only if it does not exist, and triggers a full ETL refresh. Delete this file if you want to create a new ES index and start ETL from the beginning.
last_run_time is recorded whenever there has been a successful ETL. This file will not exist until the initial full ETL has completed successfully. Deleting this file should have no net effect, other than making the program work harder than it should.
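The behaviour of the two run-state files can be sketched as follows. This is only an illustration of the description above, not the program's actual code: the function name plan_run and the labels it returns are hypothetical, and the "rescan" case is an assumption about what "work harder" means when last_run_time is missing.

```python
import os


def plan_run(first_run_file, last_run_file):
    """Decide what kind of ETL run the state files imply (hypothetical helper)."""
    if not os.path.exists(first_run_file):
        # No first_run_time: treated as a brand-new index, full refresh.
        return "full_refresh"
    if not os.path.exists(last_run_file):
        # first_run_time exists but no successful run is recorded (or the
        # file was deleted). Assumption: the work is redone from the start.
        return "rescan"
    # Both files exist: only changes since last_run_time are needed.
    return "incremental"
```

Under this reading, deleting first_run_time is the deliberate way to start over, while deleting last_run_time only costs extra time.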
Running bz_etl.py
Assuming your settings.json file is in ~/Bugzilla_ETL:
cd ~/Bugzilla_ETL
bzetl --settings=settings.json
Use --help for more options, and see example command line script
Got it working?
The initial ETL will take over two hours. If you want something quicker to confirm your configuration is correct, use the --reset --quick arguments on the command line. This will limit ETL to the first 1000 and the last 1000 bugs.
bzetl --settings=settings.json --reset --quick
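To make the effect of --quick concrete, here is a minimal sketch of how "the first 1000 and the last 1000 bugs" could be expressed as id ranges. The function quick_bug_ranges is hypothetical and is not part of Bugzilla-ETL; it assumes bugs are selected by bug id, that ids start at 1, and that the highest id is known.

```python
def quick_bug_ranges(max_bug_id, size=1000):
    """Return half-open (start, end) bug id ranges for a quick run.

    Hypothetical helper: covers the first `size` ids and the last `size`
    ids, or everything when the two ranges would overlap.
    """
    if max_bug_id <= 2 * size:
        # Too few bugs for two separate ranges; take them all.
        return [(1, max_bug_id + 1)]
    return [(1, size + 1), (max_bug_id - size + 1, max_bug_id + 1)]
```

For example, with a highest bug id of 5000 this yields ids 1 to 1000 and 4001 to 5000, which is enough to exercise both very old and very recent bug data.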
Developer Installation
If you plan to help improve this software, or if you enjoy working from source, you can clone from GitHub:
git clone https://github.com/klahnakoski/Bugzilla-ETL.git
Install requirements:
pip install -e
It is best you install on Linux, but if you do install on Windows you can find further Windows-specific Python installation instructions at one of my other projects: https://github.com/klahnakoski/pyLibrary/blob/master/README.md
Running Tests
The Git clone will include test code. You can run those tests, but you must…
Have MySQL installed (no Bugzilla schema required)
Have timezone database installed (instructions)
Have a complete test_settings.json file that points to the resources (example)
Use pypy for 4x the speed: pypy .\tests\test_etl.py --settings=test_settings.json
More on ElasticSearch
If you are new to ElasticSearch, I recommend using ElasticSearch Head for getting cluster status, current schema definitions, viewing individual records, and more. Clone it from GitHub and open the index.html file in your browser. Here are some alternate instructions.
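If you only need a quick cluster status check without a browser, the ElasticSearch REST endpoint _cluster/health can be queried directly. The sketch below is not part of Bugzilla-ETL; the helper names are hypothetical, and the host and port (localhost, 9200) are assumptions to be replaced with the values from your settings file.

```python
import json


def health_url(host="localhost", port=9200):
    """Build the URL of the cluster health endpoint (host/port are assumptions)."""
    return "http://{0}:{1}/_cluster/health".format(host, port)


def summarize_health(body):
    """Turn the JSON text returned by _cluster/health into a one-line summary."""
    doc = json.loads(body)
    return "{0}: status={1}, nodes={2}".format(
        doc.get("cluster_name"), doc.get("status"), doc.get("number_of_nodes")
    )


def fetch_health(host="localhost", port=9200):
    """Fetch and summarize cluster health; requires a reachable cluster."""
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2.7 / PyPy 2.x
    response = urlopen(health_url(host, port))
    try:
        return summarize_health(response.read().decode("utf-8"))
    finally:
        response.close()
```

Calling fetch_health() against a running cluster returns a single line with the cluster name, its status (green, yellow, or red), and the node count.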