Jenkins build data ETL with DuckDB

Duck Jenkins: loading Jenkins build info into DuckDB

What is it?

ETL (Extract, Transform, Load) for Jenkins data.

Installation

pip install duck-jenkins

Main features

Jenkins build extractor

  • Extracts and serializes Jenkins build information, along with artifact metadata, into files.
  • A fixed file structure supports multiple Jenkins servers.
  • Supports multi-branch pipeline structures.
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │   └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
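
The layout above can be read back with standard-library Python. The sketch below is not part of duck-jenkins: the helper name `list_build_files` is hypothetical, and it assumes every file is named `<build_number>_info.json` or `<build_number>_artifact.csv`, as in the tree.

```python
from pathlib import Path


def list_build_files(data_directory):
    """Yield (server, job, build_number, kind) for each serialized build file.

    Assumes the layout <data>/<server>/<job path>/<build>_<kind>.<ext>,
    where the job path may include a branch (multi-branch pipelines).
    """
    root = Path(data_directory)
    for path in sorted(root.rglob("*_*.*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # need at least server/job/file
        number, _, kind = path.stem.partition("_")
        if not number.isdigit():
            continue  # not a <build_number>_<kind> file
        server = parts[0]
        job = "/".join(parts[1:-1])
        yield server, job, int(number), kind
```

For the tree shown, `pipeline2/master/1_info.json` under `jenkins2.example.io` would be reported as job `pipeline2/master`, build `1`, kind `info`.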

DuckDB transformer

Transforms all of the serialized data above into a relational database, DuckDB.

Database ER diagram

erDiagram
    Jenkins ||--o{ Job: has
    Job ||--o{ Build: has
    Build ||--o{ Artifact: has
    Build ||--o| Jenkins_User: has
    Build ||--o{ Cause: has
    Build ||--o{ Parameter: has
    Build ||--|| Result: has
    Parameter ||--|| ParameterDictionary: has
    Jenkins{
        int id PK
        str domain_name
    }
    Job{
        int id  PK
        str name
        int jenkins_id FK
    }
    Result{
        int id PK
        str name
    }
    Jenkins_User{
        int id PK
        str name
        str lan_id
    }
    Cause{
        int id PK
        str category
    }
    Build{
        int id               PK
        int job_id           FK
        int build_number
        int result_id        FK
        int user_id          FK 
        int trigger_type     FK "Cause table's PK"
        int duration
        datetime timestamp
        int upstream_job_id FK
        int upstream_build_number
        int upstream_type   FK "Cause table's PK"
        int previous_build_number
    }
    ParameterDictionary{
        int id PK
        str name
    }
    Parameter{
        int build_id FK
        int name_id  FK
        str value
    }
    Artifact{
        int id        PK
        int build_id  FK
        str file_name
        str dir
        int size
        datetime timestamp
    }
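
To illustrate how the foreign keys in the diagram fit together, here is a minimal sketch of a join from Build to Job and Result. It uses Python's built-in `sqlite3` purely as a dependency-free stand-in for DuckDB, declares only a subset of the columns, and inserts made-up sample rows. It assumes the table and column names in the real database match the diagram, which is not verified here.

```python
import sqlite3

# Subset of the schema from the diagram (assumed names, stand-in engine).
SCHEMA = """
CREATE TABLE Job (id INTEGER PRIMARY KEY, name TEXT, jenkins_id INTEGER);
CREATE TABLE Result (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Build (
    id INTEGER PRIMARY KEY,
    job_id INTEGER REFERENCES Job(id),
    build_number INTEGER,
    result_id INTEGER REFERENCES Result(id),
    duration INTEGER
);
"""

# Follow Build.job_id -> Job.id and Build.result_id -> Result.id.
BUILDS_WITH_RESULT = """
SELECT j.name, b.build_number, r.name, b.duration
FROM Build AS b
JOIN Job AS j ON j.id = b.job_id
JOIN Result AS r ON r.id = b.result_id
ORDER BY j.name, b.build_number
"""


def demo_rows():
    """Create the tables in memory, add one sample build, run the join."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO Job VALUES (1, 'pipeline2/master', 1)")
    con.execute("INSERT INTO Result VALUES (1, 'SUCCESS')")
    con.execute("INSERT INTO Build VALUES (1, 1, 1, 1, 1200)")
    rows = con.execute(BUILDS_WITH_RESULT).fetchall()
    con.close()
    return rows
```

The same `SELECT` should carry over to a DuckDB connection if the names match; the sample values are illustrative only.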

Example

Jenkins build extractor

The following examples reproduce the file structure shown above.

1. Extract build

Extracting a multi-branch pipeline

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull(
    project_name='pipeline2/master',
    build_number=1,
    artifact=True
)
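
The example above hard-codes the user ID and secret. One way to keep them out of source files is to read them from environment variables. This is only a sketch: the helper and the variable names (`JENKINS_DOMAIN`, `JENKINS_USER_ID`, and so on) are hypothetical, and the keyword names are taken from the `JenkinsData` call shown above.

```python
import os


def jenkins_settings_from_env(environ=os.environ):
    """Build keyword arguments for JenkinsData from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    return {
        "domain_name": environ["JENKINS_DOMAIN"],
        "user_id": environ["JENKINS_USER_ID"],
        "secret": environ["JENKINS_SECRET"],
        # SSL verification stays on unless explicitly set to "false".
        "verify_ssl": environ.get("JENKINS_VERIFY_SSL", "true").lower() != "false",
        "data_directory": environ.get("JENKINS_DATA_DIR", "data"),
    }


# Intended use (requires duck-jenkins to be installed):
#   from duck_jenkins import JenkinsData
#   jd = JenkinsData(**jenkins_settings_from_env())
```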

2. Extract upstream build

Assume the upstream build of pipeline2/master/1 is pipeline1/1.

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)
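
To check which upstream build a serialized build points to, the build info can be inspected directly. The sketch below assumes that `1_info.json` keeps the shape of the Jenkins build JSON API, where an upstream trigger appears under `actions` → `causes` with `upstreamProject` and `upstreamBuild` keys. Whether duck-jenkins stores the file in exactly that shape is an assumption, and `find_upstream` is a hypothetical helper.

```python
def find_upstream(build_info):
    """Return (upstream_project, upstream_build), or None if not found.

    Assumes build_info follows the Jenkins build JSON API layout
    (actions -> causes -> upstreamProject / upstreamBuild).
    """
    for action in build_info.get("actions", []):
        # Jenkins often emits empty action objects; treat them as no causes.
        for cause in (action or {}).get("causes", []):
            if "upstreamProject" in cause and "upstreamBuild" in cause:
                return cause["upstreamProject"], cause["upstreamBuild"]
    return None
```

With the scenario above, a build info for pipeline2/master/1 that was triggered by pipeline1/1 would be expected to yield `("pipeline1", 1)`.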

3. Extract previous build

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # build 2 itself is excluded from the extraction
    artifact=True,
    overwrite=True,
    size=1  # number of previous builds to extract; here, only one
)
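
The relationship between `build_number` and `size` can be sketched as follows, under the simplifying assumption that build numbers are consecutive. Real Jenkins jobs can have gaps (for example, deleted builds), and how duck-jenkins walks backwards is not documented here, so `previous_build_numbers` is only a hypothetical illustration of the "starting build excluded" rule.

```python
def previous_build_numbers(build_number, size):
    """Build numbers covered by a look-back of `size`, newest first.

    Assumes consecutive numbering; the starting build itself is excluded.
    """
    first = max(build_number - size, 1)  # build numbers start at 1
    return list(range(build_number - 1, first - 1, -1))
```

Under that assumption, `build_number=2, size=1` covers only build 1, which matches the call above.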

DuckDB transformation

The serialized files are of little use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.

from duck_jenkins import DuckLoader
import duckdb

db = duckdb.connect('1.ddb')
cursor = db.cursor()

dl = DuckLoader(cursor, 'data')
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io',
    overwrite=False  # False skips inserting records that already exist
)

cursor.commit()
cursor.close()
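
Because the data directory holds one sub-directory per Jenkins server, the import can be repeated for each of them. The sketch below is an assumption-laden convenience wrapper, not part of duck-jenkins: `import_all_servers` is a hypothetical helper that accepts any object exposing `import_into_db(jenkins_domain_name=..., overwrite=...)`, as `DuckLoader` does in the example above.

```python
from pathlib import Path


def import_all_servers(loader, data_directory, overwrite=False):
    """Call loader.import_into_db once per server directory.

    Returns the server names that were processed, in sorted order.
    """
    domains = sorted(
        p.name for p in Path(data_directory).iterdir() if p.is_dir()
    )
    for domain in domains:
        loader.import_into_db(jenkins_domain_name=domain, overwrite=overwrite)
    return domains


# Intended use with the objects from the example above:
#   import_all_servers(dl, 'data')
```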

For more on using DuckDB, see the official documentation: https://duckdb.org/docs/
