
Mozilla Bugzilla Bug Version ETL

Project description

Python version of Mozilla Metrics’ Bugzilla ETL (https://github.com/mozilla-metrics/bugzilla_etl)

Motivation and Details

https://wiki.mozilla.org/Auto-tools/Projects/PublicES

Requirements

  • PyPy 2.1.0 using Python 2.7 (CPython is way too slow)

  • A MySQL/MariaDB database with Mozilla’s Bugzilla schema (an old public version can be found here)

  • A timezone database (instructions)

  • An ElasticSearch (v 0.20.5) cluster to hold the bug version documents

Installation

PyPy and SetupTools are required. If you are installing on Windows, please follow these instructions to get them installed. When done, installation is easy:

pip install Bugzilla-ETL

Setup

You must prepare a settings.json file that references the resources, and its filename must be provided as an argument on the command line. Examples of settings files can be found in resources/settings.
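The settings file is ordinary JSON whose name arrives as a --settings argument. A minimal sketch of that pattern, assuming nothing about the keys inside the file; load_settings is a hypothetical helper, not Bugzilla-ETL's own code.

```python
import argparse
import json


def load_settings(argv=None):
    """Parse --settings=<file> from argv and return the decoded JSON (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Bugzilla-ETL settings loader (sketch)")
    parser.add_argument("--settings", required=True, help="path to settings.json")
    # parse_known_args leaves other flags (e.g. --reset, --quick) for other code to handle
    args, _unknown = parser.parse_known_args(argv)
    with open(args.settings) as f:
        return json.load(f)
```

Both --settings=settings.json and --settings settings.json are accepted by argparse, and the code runs unchanged on Python 2.7 (including PyPy) and Python 3.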

Bugzilla-ETL keeps local run state in two files: first_run_time and last_run_time. Both file names are parameters in the settings.json file.

  • first_run_time is written only if it does not exist, and triggers a full ETL refresh. Delete this file if you want to create a new ES index and start ETL from the beginning.

  • last_run_time is recorded whenever an ETL run succeeds. This file will not exist until the initial full ETL has completed successfully. Deleting this file should have no net effect, other than making the program work harder than it needs to.
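The rules above amount to a small decision table. A sketch of it, assuming the two files simply hold a timestamp as text; the function names, the action labels and the file format are hypothetical and not taken from Bugzilla-ETL's source.

```python
import os
import time


def plan_run(first_run_path, last_run_path):
    """Choose the kind of ETL pass implied by the two run-state files.

    Hypothetical sketch: action names and the timestamp format
    (seconds since the epoch, stored as text) are assumptions.
    """
    if not os.path.exists(first_run_path):
        # No first_run_time: write it now and rebuild a new ES index from scratch.
        with open(first_run_path, "w") as f:
            f.write(str(time.time()))
        return ("new_index", None)
    if not os.path.exists(last_run_path):
        # No last_run_time: keep the existing index but process every bug again,
        # which is correct but slower than an incremental pass.
        return ("reprocess_all", None)
    with open(last_run_path) as f:
        # Both files exist: only bugs changed since the last success need work.
        return ("incremental", float(f.read().strip()))


def record_success(last_run_path, started_at):
    """Write last_run_time only after an ETL pass has completed successfully."""
    with open(last_run_path, "w") as f:
        f.write(str(started_at))
```

Passing the time the pass started (rather than finished) to record_success is a conservative choice in this sketch: bugs modified during a long run are picked up again on the next one.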

Running bz_etl.py

Assuming your settings.json file is in ~/Bugzilla_ETL:

cd ~/Bugzilla_ETL
bzetl --settings=settings.json

Use --help for more options, and see the example command line script.

Got it working?

The initial ETL will take over two hours. If you want something quicker to confirm your configuration is correct, use the --reset --quick arguments on the command line. This will limit the ETL to the first 1000 and last 1000 bugs.

bzetl --settings=settings.json --reset --quick
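The quick mode's selection can be pictured as taking both ends of the bug list. A minimal sketch, assuming "first" and "last" mean the lowest and highest bug IDs (the tool may order bugs differently); quick_subset is a hypothetical helper, not a Bugzilla-ETL function.

```python
def quick_subset(bug_ids, n=1000):
    """Return the first n and last n bug IDs, without duplicates, in ascending order."""
    ordered = sorted(set(bug_ids))
    if len(ordered) <= 2 * n:
        # Small databases: the two ends overlap, so just take everything.
        return ordered
    return ordered[:n] + ordered[-n:]
```

For bug IDs 1 through 10000 this yields 2000 IDs: 1 to 1000 and 9001 to 10000.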

Developer Installation

If you plan to help improve this software, or if you enjoy working from source, you can clone from GitHub:

git clone https://github.com/klahnakoski/Bugzilla-ETL.git

Install requirements:

cd Bugzilla-ETL
pip install -e .

It is best to install on Linux, but if you do install on Windows, you can find further Windows-specific Python installation instructions at one of my other projects: https://github.com/klahnakoski/pyLibrary/blob/master/README.md

Running Tests

The Git clone will include test code. You can run those tests, but you must…

  • Have MySQL installed (no Bugzilla schema required)

  • Have timezone database installed (instructions)

  • Have a complete test_settings.json file that points to the resources (example)

  • Use PyPy for 4x the speed: pypy .\tests\test_etl.py --settings=test_settings.json
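Both the requirements and the test prerequisites call for a timezone database, presumably MySQL's named time-zone tables. One way to check whether they are loaded: MySQL's CONVERT_TZ() returns NULL when it cannot resolve a named zone, so a NULL result suggests the tables are still empty. This sketch assumes a DB-API cursor from any MySQL driver (for example MySQLdb); timezone_tables_loaded is a hypothetical helper.

```python
def timezone_tables_loaded(cursor):
    """Return True if MySQL can convert between named time zones.

    CONVERT_TZ() yields NULL when a named zone cannot be resolved, which is
    what happens while the mysql.time_zone* tables are still empty.
    'US/Pacific' is an arbitrary choice of named zone.
    """
    cursor.execute(
        "SELECT CONVERT_TZ('2013-01-01 00:00:00', 'UTC', 'US/Pacific')")
    row = cursor.fetchone()
    return row is not None and row[0] is not None
```

Pass it the cursor() of an open connection to the MySQL server the tests will use; False means the timezone database still needs to be installed.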

More on ElasticSearch

If you are new to ElasticSearch, I recommend using ElasticSearch Head for getting cluster status, current schema definitions, viewing individual records, and more. Clone it from GitHub, and open the index.html file in your browser. Here are some alternate instructions.
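If you only need the cluster status from a script rather than a browser, ElasticSearch's /_cluster/health endpoint returns it as JSON. A minimal standard-library sketch; it assumes the cluster listens on the default http://localhost:9200, and both function names are hypothetical.

```python
import json

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.7 / PyPy 2.1
    from urllib2 import urlopen


def fetch_cluster_health(base_url="http://localhost:9200", timeout=10):
    """GET /_cluster/health and return the decoded JSON document."""
    response = urlopen(base_url.rstrip("/") + "/_cluster/health", timeout=timeout)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def summarize_health(health):
    """Build a one-line summary from a _cluster/health response."""
    return "%s: status=%s, nodes=%s" % (
        health.get("cluster_name"), health.get("status"), health.get("number_of_nodes"))
```

A status of green means all shards are allocated, yellow means some replicas are not, and red means some primary shards are missing.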


Download files

Source Distribution

Bugzilla-ETL-0.3.13353.zip (195.0 kB)

Built Distributions

Bugzilla_ETL-0.3.13353-py2.7.egg (286.2 kB)

Bugzilla-ETL-0.3.13353.win32-py2.7.exe (463.0 kB)

File hashes

Hashes for Bugzilla-ETL-0.3.13353.zip:

Algorithm     Hash digest
SHA256        b986dd4ac612cee23dcda67e9352df87f19b5f72846f2128d0106a780d8844dd
MD5           fb9412db7c39115e884f713d8c7772a7
BLAKE2b-256   aa448f837b53346a7df52e1d75af7be63a1b8df77f36565028adfb5d948db1b7

Hashes for Bugzilla_ETL-0.3.13353-py2.7.egg:

Algorithm     Hash digest
SHA256        c4f6b061e28a94c33eb3148482ff2a9d808fa0d08ffa27e7f9669673b728150e
MD5           1995a55d12e67dd85f4ae7e6649a1579
BLAKE2b-256   0e651cd7b31c7c1b55c83fb84fe98740b08ee49112ad672e8dc90fefd0a1d1e1

Hashes for Bugzilla-ETL-0.3.13353.win32-py2.7.exe:

Algorithm     Hash digest
SHA256        ff26a0a0e063f04d56434741e76dcab80617f8f5c5e5504c4ceada002cd94d71
MD5           d09bf3e886b56c1552d70244aa42c69a
BLAKE2b-256   c9473bf7ba2b4958d42960dc1c521c70639a3cc7d0f85dd37955f9402baec9aa
