GrimoireELK processes software development data and stores it in ElasticSearch

Welcome to GrimoireELK

GrimoireELK is the component of GrimoireLab that interacts with the ElasticSearch database. Its goal is two-fold: first, it offers a convenient way to store the data coming from Perceval; second, it processes and enriches the data into a format that can be consumed by Kibiter.

The Perceval data is stored in ElasticSearch indexes as raw documents (one per item extracted by Perceval). Those raw documents, which will be referred to as "raw data" in this documentation, include all information coming from the original data source, which allows the platform to perform multiple analyses without downloading the same data over and over again. Once raw data is retrieved, a new phase starts where data is enriched according to the data source from which it was collected and stored in ElasticSearch indexes. The enrichment removes information not needed by Kibiter and adds information that is not directly available within the raw data, for instance pair programming information for Git data, time to solve (i.e., close or merge) issues and pull requests for GitHub data, and identities and organization information coming from SortingHat. The enriched data is stored as JSON documents, which embed information linked to the corresponding raw documents to ease debugging and guarantee traceability.
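
To make the raw-to-enriched step more concrete, here is a minimal sketch of how a "time to close" value could be derived for an issue. It is not GrimoireELK's actual enricher: the function names, the created_at and closed_at keys inside data, and the time_to_close_days output field are all assumptions chosen for illustration.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def days_between(start_iso, end_iso):
    """Return the number of days between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return round((end - start).total_seconds() / SECONDS_PER_DAY, 2)


def enrich_issue(raw_doc):
    """Build a small enriched document from a raw one (illustrative only)."""
    item = raw_doc["data"]
    closed_at = item.get("closed_at")  # assumed key; None while the issue is open
    return {
        # Keep a link back to the raw document for traceability.
        "uuid": raw_doc["uuid"],
        "origin": raw_doc["origin"],
        # Derived value that is not directly present in the raw data.
        "time_to_close_days": (
            days_between(item["created_at"], closed_at) if closed_at else None
        ),
    }


# Hypothetical raw document; all values are invented.
example_raw_issue = {
    "uuid": "0123456789abcdef",
    "origin": "https://example.org/owner/repo",
    "data": {
        "created_at": "2024-01-01T00:00:00+00:00",
        "closed_at": "2024-01-02T12:00:00+00:00",
    },
}

print(enrich_issue(example_raw_issue))
```

The enriched document keeps uuid and origin so that it can be traced back to the raw document it was built from.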

Raw data

Each raw document stored in an ElasticSearch index contains a set of common first-level fields, regardless of the data source:

backend (string): Name of the Perceval backend used to retrieve the information.
backend_version (string): Version of the abovementioned backend.
perceval_version (string): Perceval version.
timestamp (long): When the item was retrieved by Perceval (in epoch format).
origin (string): Where the item was retrieved from.
uuid (string): Item unique identifier.
updated_on (long): When the item was updated in the original source (in epoch format).
classified_fields_filtered (list): List of data field names (strings) which contained classified information and that were removed from the original item. Depends on activating the --filter-classified flag in Perceval.
category (string): Type of the items to fetch (commit, pull request, etc.) depending on the data source.
tag (string): Custom label that can be set in Perceval for each retrieval.
data (object): This field contains a copy in JSON format of the original data as it is retrieved from the data source. The following sections describe where GrimoireLab gets this information from.
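
As a rough illustration of the list above, the following sketch checks that a raw document carries the common first-level fields. The helper name and the sample document are hypothetical, and two choices are assumptions made here: classified_fields_filtered is treated as optional, and the epoch fields are accepted as either integers or floats.

```python
# Common first-level fields from the list above, with the Python type
# expected for each. "classified_fields_filtered" is left out on purpose
# (assumption: it may be missing when the Perceval flag is not used).
REQUIRED_RAW_FIELDS = {
    "backend": str,
    "backend_version": str,
    "perceval_version": str,
    "timestamp": (int, float),
    "origin": str,
    "uuid": str,
    "updated_on": (int, float),
    "category": str,
    "tag": str,
    "data": dict,
}


def missing_raw_fields(doc):
    """Return the sorted names of required fields that are absent or mistyped."""
    return sorted(
        name
        for name, expected in REQUIRED_RAW_FIELDS.items()
        if not isinstance(doc.get(name), expected)
    )


# Hypothetical raw document; every value is invented for illustration.
example_raw = {
    "backend": "Git",
    "backend_version": "0.0.0",
    "perceval_version": "0.0.0",
    "timestamp": 1700000000.0,
    "origin": "https://example.org/repo.git",
    "uuid": "0123456789abcdef",
    "updated_on": 1690000000.0,
    "category": "commit",
    "tag": "https://example.org/repo.git",
    "data": {"message": "Initial commit"},
}

print(missing_raw_fields(example_raw))
```

A check like this can be handy when debugging an index, since it points at the exact field that is missing rather than failing later during enrichment.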

Enriched data

Each enriched index includes one or more types of documents, which are summarized below.

Askbot: each document can be either a question, an answer or a comment on an answer.
Bugzilla: each document corresponds to a single issue (fetched using CGI calls).
Bugzillarest: each document corresponds to a single issue (fetched using Bugzilla REST API).
Cocom: each document corresponds to a single file in a commit, with code complexity information.
Colic: each document corresponds to a single file in a commit, with license information.
Confluence: each document can be either a new page, a page edit, a comment or an attachment.
Crates: each document corresponds to an event.
Discourse: each document can be either a question or an answer.
Dockerhub: each document corresponds to an image.
Finosmeetings: each document corresponds to details about a meeting.
Functest: each document corresponds to details about a test.
Gerrit: each document can be either a changeset, a comment, a patchset or a patchset approval.
Git: each document corresponds to a single commit.
Git Areas of Code: each document corresponds to one single file.
GitHub issues: each document corresponds to an issue.
GitHub pull requests: each document corresponds to a pull request.
GitHub repo statistics: each document includes repo statistics (e.g., forks, watchers).
GitLab issues: each document corresponds to an issue.
GitLab merge requests: each document corresponds to a merge request.
Gitter: each document corresponds to a message.
Googlehits: each document contains hits information derived from Google.
Groupsio: each document corresponds to a message.
Hyperkitty: each document corresponds to a message.
Jenkins: each document corresponds to a single build.
Jira: each document corresponds to an issue or a comment. To simplify counting user activities, issues are duplicated and they can include assignee, reporter and creator data respectively.
Kitsune: each document can be either a question or an answer.
Launchpad: each document corresponds to a bug.
Mattermost: each document corresponds to a message.
Mbox: each document corresponds to a message.
Mediawiki: each document corresponds to a review.
Meetup: each document can be either an event, an RSVP or a comment.
Mozillaclub: each document includes event information.
Nntp: each document corresponds to a message.
Onion Study/Community Structure: each document corresponds to an author in a specific quarter, split by organization and project. That means there is one entry for the author’s overall contributions in a given quarter, one entry for the author in each of the projects they contributed to in that quarter, and one entry for the author in each of the organizations they were affiliated with in that quarter. This way, the results of the onion analysis are stored overall, by project and by organization.
Pagure: each document corresponds to an issue.
Phabricator: each document corresponds to a task.
Pipermail: each document corresponds to a message.
Puppetforge: each document corresponds to a module.
Rocketchat: each document corresponds to a message.
Redmine: each document corresponds to an issue.
Remo activities: each document corresponds to an activity.
Remo events: each document corresponds to an event.
Remo users: each document corresponds to a user.
Rss: each document corresponds to an entry.
Slack: each document corresponds to a message.
Stackexchange: each document can be either a question or an answer.
Supybot: each document corresponds to a message.
Telegram: each document corresponds to a message.
Twitter: each document corresponds to a tweet.

Fields

Each enriched document contains a set of fields, which can be (i) common to all data sources (e.g., metadata fields, time field), (ii) specific to the data source, (iii) related to the contributor’s profile information (i.e., identity fields) or (iv) related to the project listed in the Mordred projects.json (i.e., project fields).

Metadata fields

metadata__timestamp (date): Date when the item was retrieved from the original data source and stored in the index with raw documents.
metadata__updated_on (date): Date when the item was updated in its original data source.
metadata__enriched_on (date): Date when the item was enriched and stored in the index with enriched documents.
metadata__gelk_backend_name (string): Name of the backend used to enrich information.
metadata__gelk_version (string): Version of the backend used to enrich information.
origin (string): Original URL where the repository was retrieved from.
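
The relation between the epoch values of a raw document and the date-typed metadata fields can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the helper names are hypothetical, and ISO 8601 strings in UTC are used here as one possible date representation.

```python
from datetime import datetime, timezone


def epoch_to_iso(epoch_seconds):
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def build_metadata(raw_doc, enriched_on=None):
    """Derive metadata-style fields from a raw document (illustrative only)."""
    enriched_on = enriched_on or datetime.now(timezone.utc)
    return {
        # When Perceval retrieved the item.
        "metadata__timestamp": epoch_to_iso(raw_doc["timestamp"]),
        # When the item was last updated in its original data source.
        "metadata__updated_on": epoch_to_iso(raw_doc["updated_on"]),
        # When this enriched document was produced.
        "metadata__enriched_on": enriched_on.isoformat(),
        "origin": raw_doc["origin"],
    }


# Hypothetical raw values, invented for illustration.
print(build_metadata({
    "timestamp": 1700000000,
    "updated_on": 1690000000,
    "origin": "https://example.org/repo.git",
}))
```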

Identity fields

author_uuid (string): Author profile unique identifier. Used for counting authors and cross-referencing data among data sources in ElasticSearch and between ElasticSearch, SortingHat and Hatstall.
author_org_name (string): Name of the organization the author is affiliated with. The same author can have different affiliations in non-overlapping time periods. Used for aggregating contributors and contributions by organization.
author_name (string): Similar to author_uuid, but less useful for unique counts as different profiles could share the same name. Nevertheless, it is more appropriate to show this field when aggregating data by author, as it is usually nicer to see a name than a hash value.
author_bot (boolean): True if the given author is identified as a bot.
author_domain (string): Domain associated with the author in the SortingHat profile.
author_id (string): Author identifier. This id comes from SortingHat and identifies each different identity provided by SortingHat. These identifiers are grouped under a single author_uuid, so this field is not commonly used unless data needs to be debugged.
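
The difference between counting by author_uuid and displaying author_name can be seen in a small sketch. It works on plain Python dictionaries standing in for enriched documents; the function name and the sample values are hypothetical.

```python
from collections import Counter


def contributions_by_author(docs, include_bots=False):
    """Count documents per author_uuid and attach a display name."""
    counts = Counter()
    names = {}
    for doc in docs:
        if doc.get("author_bot") and not include_bots:
            continue  # skip authors flagged as bots
        uuid = doc["author_uuid"]
        counts[uuid] += 1
        names[uuid] = doc.get("author_name", "Unknown")
    # Group by the unique identifier, but show the readable name next to it.
    return [(uuid, names[uuid], total) for uuid, total in counts.most_common()]


# Hypothetical enriched documents: two different profiles share the name "Alex".
example_docs = [
    {"author_uuid": "aaa111", "author_name": "Alex", "author_bot": False},
    {"author_uuid": "aaa111", "author_name": "Alex", "author_bot": False},
    {"author_uuid": "bbb222", "author_name": "Alex", "author_bot": False},
    {"author_uuid": "ccc333", "author_name": "ci-bot", "author_bot": True},
]

for uuid, name, total in contributions_by_author(example_docs):
    print(uuid, name, total)
```

Grouping by author_name alone would merge the two "Alex" profiles into one, which is why the unique identifier drives the count.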

Project fields

project (string): Project name as defined in the JSON file where repositories are grouped by project.
project_1 (string): Project (if more than one level is allowed in project hierarchy).

Time field:

grimoire_creation_date (date): Date when the item was created upstream. Used by default to represent data in time series on the dashboards.

Demography fields:

author_max_date (date): Date of most recent commit made by this author.
author_min_date (date): Date of the first commit made by this author.
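
A minimal sketch of how the first and most recent commit dates per author could be computed from a set of commit documents. The function name is hypothetical, and reading the commit date from grimoire_creation_date as an ISO 8601 string is an assumption made for this example.

```python
from datetime import datetime


def author_date_ranges(commit_docs):
    """Map each author_uuid to the dates of its first and latest commit."""
    ranges = {}
    for doc in commit_docs:
        uuid = doc["author_uuid"]
        when = datetime.fromisoformat(doc["grimoire_creation_date"])
        first, last = ranges.get(uuid, (when, when))
        ranges[uuid] = (min(first, when), max(last, when))
    return {
        uuid: {
            "author_min_date": first.isoformat(),
            "author_max_date": last.isoformat(),
        }
        for uuid, (first, last) in ranges.items()
    }


# Hypothetical commit documents, invented for illustration.
example_commits = [
    {"author_uuid": "aaa111", "grimoire_creation_date": "2023-03-01T09:00:00+00:00"},
    {"author_uuid": "aaa111", "grimoire_creation_date": "2021-06-15T12:30:00+00:00"},
    {"author_uuid": "bbb222", "grimoire_creation_date": "2022-11-20T08:00:00+00:00"},
]

print(author_date_ranges(example_commits))
```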

Extra fields:

extra_ (anything): Extra fields added using the enrich_extra_data study.

Data source specific fields

Details of the fields of each data source are available in the Schema folder.

Installation

There are several ways to install GrimoireELK on your system: from packages, or from source code using Poetry or pip.

PyPI

GrimoireELK can be installed using pip, a tool for installing Python packages. To do so, run the following command:

$ pip install grimoire-elk

Source code

To install from the source code you will need to clone the repository first:

$ git clone https://github.com/chaoss/grimoirelab-elk
$ cd grimoirelab-elk

Then use pip or Poetry to install the package along with its dependencies.

Pip

To install the package from local directory run the following command:

$ pip install .

In case you are a developer, you should install GrimoireELK in editable mode:

$ pip install -e .

Poetry

We use Poetry for dependency management and packaging. You can install it following its documentation. Once you have installed it, you can install GrimoireELK and its dependencies in a project-isolated environment using:

$ poetry install

To spawn a new shell within the virtual environment use:

$ poetry shell

Running tests

Tests are located in the tests folder. In order to run them, you need to have instances (or Docker containers) of ElasticSearch and MySQL on your machine.

Then you need to:

update the tests.conf file:
- if your ElasticSearch instance isn't available at http://localhost:9200, point the file to its actual URL. For example, if you are using the secure edition of ElasticSearch, it will be located at https://admin:admin@localhost:9200
- if you are using non-default credentials for your SortingHat database, you will need to include the [Database] section of the file with both user and password parameters
create the databases test_sh and test_projects in your MySQL instance (e.g., mysql -u root -e "create database test_sh"; if you are running MySQL in a container, use docker exec -i <container id> mysql -u root -e "create database test_sh")
populate the database test_projects with the SQL file test_projects.sql (e.g., mysql -u root test_projects < tests/test_projects.sql)
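
For the [Database] section mentioned in the steps above, the sketch below shows what such a section might look like and how it can be read with Python's standard configparser. Only the section name and the user and password parameters come from the text; the sample values and the helper name are hypothetical, and the rest of the tests.conf layout is not shown here.

```python
import configparser

# Hypothetical excerpt of a tests.conf file; the credentials are placeholders.
SAMPLE_TESTS_CONF = """
[Database]
user = root
password = example
"""


def read_database_credentials(text):
    """Return (user, password) from the [Database] section, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("Database"):
        return None  # no custom credentials were configured
    section = parser["Database"]
    return section.get("user"), section.get("password")


print(read_database_credentials(SAMPLE_TESTS_CONF))
```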

The full battery of tests can be executed with run_tests.py. However, it is also possible to execute a subset of tests by running the individual test files (test_* files in the tests folder).

The tests can be run in combination with the Python package coverage. The steps below show how to do it:

$ pip3 install coveralls
$ cd <path-to-ELK>/tests
$ python3 -m coverage run --source=grimoire_elk run_tests.py

Coverage will generate a file .coverage in the tests folder, which can be inspected with the following command:

$ cd <path-to-ELK>/tests
$ python3 -m coverage report -m

The output will be similar to the following one:

Name                                                                                                                Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------------------------------------------------
.../ELK/grimoire_elk/__init__.py                                                                                       4      0   100%
.../ELK/grimoire_elk/_version.py                                                                                       1      0   100%

Release history

This version

1.7.4

May 6, 2026

1.7.4rc1 pre-release

May 6, 2026

1.7.3

Mar 5, 2026

1.7.2

Mar 3, 2026

1.7.1

Mar 3, 2026

1.7.1rc1 pre-release

Mar 2, 2026

1.7.0

Jan 21, 2026

1.7.0rc1 pre-release

Jan 21, 2026

1.6.0

Dec 12, 2025

1.6.0rc1 pre-release

Dec 12, 2025

1.5.4rc1 pre-release

Dec 9, 2025

1.5.3

Nov 25, 2025

1.5.3rc1 pre-release

Nov 25, 2025

1.5.2

Nov 11, 2025

1.5.2rc1 pre-release

Nov 11, 2025

1.5.1

Oct 31, 2025

1.5.1rc2 pre-release

Oct 30, 2025

1.5.1rc1 pre-release

Oct 30, 2025

1.5.0

Oct 10, 2025

1.5.0rc1 pre-release

Oct 10, 2025

1.4.0

Sep 23, 2025

1.4.0rc2 pre-release

Sep 15, 2025

1.4.0rc1 pre-release

Sep 10, 2025

1.3.13rc1 pre-release

Aug 28, 2025

1.3.12

Aug 21, 2025

1.3.11

Aug 18, 2025

1.3.11rc1 pre-release

Aug 18, 2025

1.3.10

Jun 19, 2025

1.3.10rc1 pre-release

Jun 19, 2025

1.3.9

Jun 18, 2025

1.3.9rc1 pre-release

Jun 18, 2025

1.3.8

Jun 3, 2025

1.3.8rc1 pre-release

Jun 3, 2025

1.3.7

May 20, 2025

1.3.7rc1 pre-release

May 20, 2025

1.3.6

May 9, 2025

1.3.6rc1 pre-release

May 9, 2025

1.3.5

Apr 9, 2025

1.3.5rc2 pre-release

Apr 9, 2025

1.3.5rc1 pre-release

Apr 8, 2025

1.3.4

Jan 16, 2025

1.3.4rc1 pre-release

Jan 16, 2025

1.3.3

Jan 15, 2025

1.3.3rc1 pre-release

Jan 15, 2025

1.3.2

Dec 11, 2024

1.3.2rc1 pre-release

Dec 11, 2024

1.3.1

Nov 13, 2024

1.3.1rc1 pre-release

Nov 13, 2024

1.3.0

Oct 15, 2024

1.3.0rc1 pre-release

Oct 14, 2024

1.2.0

Sep 23, 2024

1.2.0rc1 pre-release

Sep 20, 2024

1.1.5

Aug 30, 2024

1.1.5rc1 pre-release

Aug 30, 2024

1.1.4

Aug 13, 2024

1.1.4rc1 pre-release

Aug 13, 2024

1.1.3

Aug 9, 2024

1.1.3rc1 pre-release

Aug 9, 2024

1.1.2

Aug 2, 2024

1.1.2rc1 pre-release

Aug 2, 2024

1.1.1

Jun 21, 2024

1.1.1rc1 pre-release

Jun 21, 2024

1.1.0

May 9, 2024

1.1.0rc1 pre-release

May 9, 2024

1.0.0

Apr 13, 2024

1.0.0rc3 pre-release

Apr 12, 2024

1.0.0rc2 pre-release

Apr 11, 2024

1.0.0rc1 pre-release

Apr 9, 2024

0.111.1

Mar 27, 2024

0.111.1rc1 pre-release

Mar 27, 2024

0.111.0

Mar 12, 2024

0.111.0rc1 pre-release

Mar 12, 2024

0.110.0

Mar 1, 2024

0.110.0rc1 pre-release

Mar 1, 2024

0.109.8

Feb 19, 2024

0.109.8rc1 pre-release

Feb 19, 2024

0.109.7

Feb 8, 2024

0.109.7rc1 pre-release

Feb 8, 2024

0.109.6

Feb 1, 2024

0.109.6rc1 pre-release

Feb 1, 2024

0.109.5

Jan 30, 2024

0.109.5rc1 pre-release

Jan 30, 2024

0.109.4

Dec 19, 2023

0.109.4rc1 pre-release

Dec 19, 2023

0.109.3

Nov 28, 2023

0.109.3rc1 pre-release

Nov 28, 2023

0.109.2

Nov 17, 2023

0.109.2rc1 pre-release

Nov 14, 2023

0.109.1

Nov 3, 2023

0.109.1rc1 pre-release

Nov 3, 2023

0.109.0

Oct 20, 2023

0.109.0rc1 pre-release

Oct 20, 2023

0.108.1

Aug 6, 2023

0.108.1rc1 pre-release

Aug 6, 2023

0.108.0

Jul 23, 2023

0.108.0rc1 pre-release

Jul 23, 2023

0.107.0

Jul 11, 2023

0.107.0rc1 pre-release

Jul 11, 2023

0.106.0

Jun 28, 2023

0.106.0rc2 pre-release

Jun 23, 2023

0.106.0rc1 pre-release

Jun 22, 2023

0.105.0

May 17, 2023

0.105.0rc2 pre-release

May 17, 2023

0.105.0rc1 pre-release

May 17, 2023

0.104.6

Apr 28, 2023

0.104.5

Apr 27, 2023

0.104.4

Apr 26, 2023

0.104.3

Apr 21, 2023

0.104.3rc2 pre-release

Apr 21, 2023

0.104.3rc1 pre-release

Apr 21, 2023

0.104.2

Feb 3, 2023

0.104.1

Feb 1, 2023

0.104.0

Feb 1, 2023

0.104.0rc8 pre-release

Feb 1, 2023

0.104.0rc7 pre-release

Jan 23, 2023

0.104.0rc6 pre-release

Jan 20, 2023

0.104.0rc5 pre-release

Jan 20, 2023

0.104.0rc4 pre-release

Jan 20, 2023

0.104.0rc3 pre-release

Jan 10, 2023

0.104.0rc2 pre-release

Jan 10, 2023

0.104.0rc1 pre-release

Jan 10, 2023

0.103.3

Nov 7, 2022

0.103.2

Oct 31, 2022

0.103.2rc2 pre-release

Oct 26, 2022

0.103.2rc1 pre-release

Oct 26, 2022

0.103.1

Sep 27, 2022

0.103.0

Sep 26, 2022

0.103.0rc10 pre-release

Sep 26, 2022

0.103.0rc9 pre-release

Sep 26, 2022

0.103.0rc8 pre-release

Sep 26, 2022

0.103.0rc7 pre-release

Sep 23, 2022

0.103.0rc6 pre-release

Sep 23, 2022

0.103.0rc5 pre-release

Sep 23, 2022

0.103.0rc4 pre-release

Sep 7, 2022

0.103.0rc3 pre-release

Aug 23, 2022

0.103.0rc2 pre-release

Jul 22, 2022

0.103.0rc1 pre-release

Jul 21, 2022

0.102.0

Jun 24, 2022

0.101.1

Jun 3, 2022

0.101.0

Jun 3, 2022

0.100.0

Mar 18, 2022

0.99.0

Jan 27, 2022

0.98.0

Jan 13, 2022

0.97.0

Jan 11, 2022

0.96.0

Nov 19, 2021

0.95.0

Nov 5, 2021

0.94.0

Oct 25, 2021

0.93.0

Sep 17, 2021

0.92.0

Sep 7, 2021

0.91.0

Aug 31, 2021

0.90.0

Aug 23, 2021

0.89.0

Aug 17, 2021

0.87.0

Jun 9, 2021

0.86.0

Mar 15, 2021

0.85.0

Feb 11, 2021

0.84.0

Jan 26, 2021

0.83.0

Dec 9, 2020

0.63.0

Oct 29, 2019

0.62.0

Oct 1, 2019

0.58.0

Jul 9, 2019

0.55.0

Jun 5, 2019

0.47.0

Mar 28, 2019

0.36.0

Jan 15, 2019

0.32.0

Nov 21, 2018

0.31.4

Nov 9, 2018

0.31.0

Oct 19, 2018

0.30.53

Oct 5, 2018

0.30.51

Sep 11, 2018

0.30.48

Aug 24, 2018

0.30.39

Jun 8, 2018

0.30.37

May 17, 2018

0.30.33

Apr 15, 2018

0.30.30

Apr 8, 2018

0.30.27

Mar 22, 2018

0.30.24

Mar 13, 2018

0.30.23

Jan 31, 2018

0.30.22

Jan 23, 2018

0.30.18

Dec 29, 2017

0.30.13

Nov 27, 2017

0.30.11

Nov 14, 2017

0.30.9

Nov 2, 2017

0.30.8

Oct 26, 2017

0.30.7

Oct 20, 2017

0.30.4

Jul 18, 2017

0.26.5

Apr 16, 2017

0.22.1

Jan 28, 2017

0.22

Jan 28, 2017

0.20rc1 pre-release

Dec 30, 2016

Download files

Source Distribution

grimoire_elk-1.7.4.tar.gz (893.7 kB)

Uploaded May 6, 2026 Source

Built Distribution

grimoire_elk-1.7.4-py3-none-any.whl (300.2 kB)

Uploaded May 6, 2026 Python 3

File details

Details for the file grimoire_elk-1.7.4.tar.gz.

File metadata

Download URL: grimoire_elk-1.7.4.tar.gz
Upload date: May 6, 2026
Size: 893.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.0 CPython/3.12.3 Linux/6.17.0-1010-azure

File hashes

Hashes for grimoire_elk-1.7.4.tar.gz
Algorithm	Hash digest
SHA256	`7d665b43156b594b36f4f75b2578d9d1842cf610ed9a86f0c5bbe07b5083d25f`
MD5	`b071f8ea435a9743dc6acf47259970b8`
BLAKE2b-256	`a9e1fc4766c1fb71dfd1b05344326fccf6bd59e939d4d78639f929c5c4634764`
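
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a generic sketch: the helper names are hypothetical, and the path passed to them is assumed to point at wherever the archive was saved.

```python
import hashlib

# SHA256 digest published above for grimoire_elk-1.7.4.tar.gz.
EXPECTED_SHA256 = "7d665b43156b594b36f4f75b2578d9d1842cf610ed9a86f0c5bbe07b5083d25f"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_digest("grimoire_elk-1.7.4.tar.gz", EXPECTED_SHA256) should return True for an intact download saved under that name in the current directory.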

File details

Details for the file grimoire_elk-1.7.4-py3-none-any.whl.

File metadata

Download URL: grimoire_elk-1.7.4-py3-none-any.whl
Upload date: May 6, 2026
Size: 300.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.0 CPython/3.12.3 Linux/6.17.0-1010-azure

File hashes

Hashes for grimoire_elk-1.7.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`253a50211f326b0b5340c005a7ca42ef0fc2f32c9428779d29ee685b477731bb`
MD5	`03fdc776dc16a3b03b8e636a2733948d`
BLAKE2b-256	`91360aa772431fab6ebe2e440260553dff612654fa892fe63754b707db5100b1`
