Jenkins build data ETL with DuckDB

Duck Jenkins: loading Jenkins build info into DuckDB

What is it?

ETL (Extract, Transform, Load) for Jenkins build data.

Installation

pip install duck-jenkins

Main features

Jenkins build extractor

  • Extracts and serializes Jenkins build information, along with artifact metadata, into files.
  • Uses a fixed file structure that supports multiple Jenkins servers.
  • Supports multi-branch pipeline structures.

└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │    └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
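
As a rough illustration of the layout above, the following sketch walks a data directory and lists the serialized files it finds. The function name `list_build_files` is hypothetical and not part of duck-jenkins; it only assumes the `<server>/<job path>/<n>_info.json` and `<n>_artifact.csv` naming shown in the tree.

```python
from pathlib import Path


def list_build_files(data_directory):
    """Return (server, job, build_number, kind) tuples for each serialized file.

    Assumes the layout shown above: <data>/<server>/<job path>/<n>_info.json
    and <n>_artifact.csv, where the job path may include a branch folder.
    """
    root = Path(data_directory)
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # "1_info.json" -> number "1", kind "info"
        number, _, kind = path.stem.partition("_")
        if not number.isdigit() or kind not in ("info", "artifact"):
            continue
        parts = path.relative_to(root).parent.parts
        if len(parts) < 2:
            # A file needs at least a server folder and a job folder above it.
            continue
        server, job_parts = parts[0], parts[1:]
        found.append((server, "/".join(job_parts), int(number), kind))
    return found
```

For a multi-branch job, the branch folder becomes part of the job name (for example `pipeline2/master`), which matches the `project_name` values used in the examples further down.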

DuckDB transformer

Transforms all of the serialized data above into a relational DuckDB database.

Database ER diagram

erDiagram
    Jenkins ||--o{ Job: has
    Job ||--o{ Build: has
    Build ||--o{ Artifact: has
    Build ||--o| Jenkins_User: has
    Build ||--o{ Cause: has
    Build ||--o{ Parameter: has
    Build ||--|| Result: has
    Parameter ||--|| ParameterDictionary: has
    Jenkins{
        int id PK
        str domain_name
    }
    Job{
        int id  PK
        str name
        int jenkins_id FK
    }
    Result{
        int id PK
        str name
    }
    Jenkins_User{
        int id PK
        str name
        str lan_id
    }
    Cause{
        int id PK
        str category
    }
    Build{
        int id               PK
        int job_id           FK
        int build_number
        int result_id        FK
        int user_id          FK 
        int trigger_type     FK "Cause table's PK"
        int duration
        datetime timestamp
        int upstream_job_id FK
        int upstream_build_number
        int upstream_type   FK "Cause table's PK"
        int previous_build_number
    }
    ParameterDictionary{
        int id PK
        str name
    }
    Parameter{
        int build_id FK
        int name_id  FK
        str value
    }
    Artifact{
        int id        PK
        int build_id  FK
        str file_name
        str dir
        int size
        datetime timestamp
    }
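
To show how the relationships in the diagram might be used, here is a minimal query sketch that counts builds per job and result. It assumes the physical table and column names match the diagram exactly (`Build`, `Job`, `Result`, `job_id`, `result_id`); the real schema created by duck-jenkins may differ, so check it before relying on this. The helper name `builds_per_result` is hypothetical.

```python
# Assumption: table and column names are taken verbatim from the ER diagram.
BUILDS_PER_RESULT = """
SELECT j.name AS job, r.name AS result, COUNT(*) AS builds
FROM Build AS b
JOIN Job AS j ON b.job_id = j.id
JOIN Result AS r ON b.result_id = r.id
GROUP BY j.name, r.name
ORDER BY j.name, r.name
"""


def builds_per_result(connection):
    """Run the query on any object exposing execute(...).fetchall().

    A DuckDB connection or cursor fits this shape, as do most DB-API style
    connections.
    """
    return connection.execute(BUILDS_PER_RESULT).fetchall()
```

With a populated database, `builds_per_result(duckdb.connect('1.ddb'))` would return one row per job and result combination.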

Example

Jenkins build extractor

The following examples reproduce the file structure shown above.

1. Extract build

Extracting a multi-branch pipeline

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull(
    project_name='pipeline2/master',
    build_number=1,
    artifact=True
)
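
The example above hard-codes the user ID and secret for brevity. One possible alternative, sketched below, reads them from environment variables and returns keyword arguments matching the `JenkinsData` parameters shown above. The variable names `JENKINS_USER_ID` and `JENKINS_SECRET` and the helper `jenkins_kwargs` are assumptions made for this sketch, not something duck-jenkins defines.

```python
import os


def jenkins_kwargs(domain_name, data_directory="data", verify_ssl=True, env=os.environ):
    """Build JenkinsData keyword arguments without hard-coding the secret.

    JENKINS_USER_ID and JENKINS_SECRET are hypothetical variable names chosen
    for this sketch; they are not defined by duck-jenkins.
    """
    required = ("JENKINS_USER_ID", "JENKINS_SECRET")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "domain_name": domain_name,
        "verify_ssl": verify_ssl,
        "user_id": env["JENKINS_USER_ID"],
        "secret": env["JENKINS_SECRET"],
        "data_directory": data_directory,
    }


# Usage (requires duck-jenkins to be installed):
# from duck_jenkins import JenkinsData
# jd = JenkinsData(**jenkins_kwargs("jenkins1.example.io"))
```

The sketch defaults `verify_ssl` to `True`; pass `verify_ssl=False` to match the examples here if the server uses a certificate that cannot be verified.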

2. Extract upstream build

Assume the upstream build of pipeline2/master/1 is pipeline1/1.

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)

3. Extract previous build

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # starting point; build 2 itself is excluded from the extraction
    artifact=True,
    overwrite=True,
    size=1  # number of previous builds to extract (here, only one)
)
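
The two comments above describe a simple rule: the starting build is excluded, and at most `size` earlier builds are extracted. The sketch below only illustrates that rule. It assumes previous builds are numbered consecutively, which a real Jenkins job does not guarantee (builds can be deleted), and it is not how duck-jenkins is known to be implemented. The function name is hypothetical.

```python
def previous_build_numbers(build_number, size):
    """List the build numbers covered by a "previous builds" walk.

    Assumption for this sketch: previous builds are numbered consecutively.
    The starting build is excluded and at most `size` builds are returned,
    newest first, never going below build 1.
    """
    first = max(build_number - size, 1)
    return list(range(build_number - 1, first - 1, -1))
```

Under that assumption, `previous_build_numbers(2, 1)` covers only build 1, which corresponds to the call shown above.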

DuckDB transformation

The serialized files are of little use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.

from duck_jenkins import DuckLoader
import duckdb

db = duckdb.connect('1.ddb')
cursor = db.cursor()

dl = DuckLoader(cursor, 'data')
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io', 
    overwrite=False  # False skips inserting records that already exist
)

cursor.commit()
cursor.close()
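
Because the data directory can hold several Jenkins servers, the import step can be repeated per server. The sketch below is one way to do that; it assumes every top-level folder under the data directory is named after a Jenkins domain, and that `loader` behaves like the `DuckLoader` instance above. The helper name `import_all_servers` is hypothetical.

```python
from pathlib import Path


def import_all_servers(loader, data_directory, overwrite=False):
    """Call loader.import_into_db once per server folder under data_directory.

    Assumption: each top-level folder is named after a Jenkins domain, as in
    the file structure shown earlier. Returns the domains that were imported.
    """
    domains = sorted(p.name for p in Path(data_directory).iterdir() if p.is_dir())
    for domain in domains:
        loader.import_into_db(jenkins_domain_name=domain, overwrite=overwrite)
    return domains
```

With the loader from the example above, `import_all_servers(dl, 'data')` would be called before `cursor.commit()`.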

For more on using DuckDB, see the official documentation: https://duckdb.org/docs/
