Jenkins build data ETL with DuckDB
Duck Jenkins: loading Jenkins build info into DuckDB
What is it?
ETL (Extract, Transform, Load) for Jenkins data.
Installation
pip install duck-jenkins
Main features
Jenkins build extractor
- Extracts and serializes Jenkins build information, along with artifact metadata, into files.
- Uses a fixed file structure that supports multiple Jenkins servers.
- Supports multi-branch pipelines.
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │   └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
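As a rough illustration of how this layout can be consumed, the sketch below walks a data directory and lists every serialized build file it finds. It assumes files are named `<build_number>_info.json` and `<build_number>_artifact.csv` as in the tree above, and that the first directory level is the Jenkins domain. `list_build_files` is a hypothetical helper written for this example, not part of duck-jenkins.

```python
import tempfile
from pathlib import Path


def list_build_files(data_directory):
    """Return (domain, project, build_number, kind) tuples for serialized files.

    Hypothetical helper: assumes the layout
    <data>/<domain>/<project path>/<build_number>_<kind>.<ext>
    """
    root = Path(data_directory)
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        number, _, kind = path.stem.partition("_")
        if not number.isdigit() or not kind:
            continue  # not a "<build_number>_<kind>" file
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # need at least domain/project/file
        domain = parts[0]
        project = "/".join(parts[1:-1])  # e.g. "pipeline2/master"
        found.append((domain, project, int(number), kind))
    return found


if __name__ == "__main__":
    # Recreate part of the example tree in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        branch = Path(tmp) / "jenkins2.example.io" / "pipeline2" / "master"
        branch.mkdir(parents=True)
        (branch / "1_info.json").write_text("{}")
        (branch / "1_artifact.csv").write_text("")
        for entry in list_build_files(tmp):
            print(entry)
```

Joining the intermediate directories with `/` keeps multi-branch projects such as `pipeline2/master` in the same form used for `project_name` in the examples further down.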
DuckDB transformer
Transforms all of the serialized data above into a relational DuckDB database.
Database ER diagram
erDiagram
Jenkins ||--o{ Job: has
Job ||--o{ Build: has
Build ||--o{ Artifact: has
Build ||--o| Jenkins_User: has
Build ||--o{ Cause: has
Build ||--o{ Parameter: has
Build ||--|| Result: has
Parameter ||--|| ParameterDictionary: has
Jenkins{
int id PK
str domain_name
}
Job{
int id PK
str name
int jenkins_id FK
}
Result{
int id PK
str name
}
Jenkins_User{
int id PK
str name
str lan_id
}
Cause{
int id PK
str category
}
Build{
int id PK
int job_id FK
int build_number
int result_id FK
int user_id FK
int trigger_type FK "Cause table's PK"
int duration
datetime timestamp
int upstream_job_id FK
int upstream_build_number
int upstream_type FK "Cause table's PK"
int previous_build_number
}
ParameterDictionary{
int id PK
str name
}
Parameter{
int build_id FK
int name_id FK
str value
}
Artifact{
int id PK
int build_id FK
str file_name
str dir
int size
datetime timestamp
}
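To show how the relationships in the diagram can be used, here is a minimal query sketch that counts builds per job for a given result and averages their duration. It assumes the generated tables and columns are named exactly as in the diagram (`Build`, `Job`, `Result`), which this README does not confirm. The demo data is invented, and the demo uses Python's built-in `sqlite3` only so that it runs without a generated database. `builds_by_result` is a hypothetical helper that needs a connection offering `execute(sql, params).fetchall()`, which a `duckdb.connect(...)` connection also provides.

```python
import sqlite3

# Assumption: table and column names follow the ER diagram above.
BUILDS_BY_RESULT_SQL = """
SELECT j.name AS job_name,
       COUNT(*) AS builds,
       AVG(b.duration) AS avg_duration
FROM Build AS b
JOIN Job AS j ON j.id = b.job_id
JOIN Result AS r ON r.id = b.result_id
WHERE r.name = ?
GROUP BY j.name
ORDER BY j.name
"""


def builds_by_result(conn, result_name):
    """Return (job_name, builds, avg_duration) rows for one result name."""
    return conn.execute(BUILDS_BY_RESULT_SQL, [result_name]).fetchall()


def make_demo_connection():
    """Create an in-memory stand-in with only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Job (id INTEGER PRIMARY KEY, name TEXT, jenkins_id INTEGER);
        CREATE TABLE Result (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Build (
            id INTEGER PRIMARY KEY,
            job_id INTEGER,
            build_number INTEGER,
            result_id INTEGER,
            duration INTEGER
        );
        INSERT INTO Job VALUES (1, 'pipeline1', 1), (2, 'pipeline2/master', 1);
        INSERT INTO Result VALUES (1, 'SUCCESS'), (2, 'FAILURE');
        INSERT INTO Build VALUES (1, 1, 1, 1, 10), (2, 1, 2, 1, 20), (3, 2, 1, 2, 30);
        """
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_connection()
    for row in builds_by_result(demo, "SUCCESS"):
        print(row)
```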
Example
Jenkins Build extractor
The following examples reproduce the file structure shown above.
1. Extract build
Extracting a multi-branch pipeline
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull(
    project_name='pipeline2/master',
    build_number=1,
    artifact=True
)
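The examples in this section hard-code `user_id` and `secret`. One possible alternative, sketched below, reads them from environment variables and builds the keyword arguments for `JenkinsData`. The variable names `JENKINS_USER_ID`, `JENKINS_SECRET` and `JENKINS_VERIFY_SSL`, and the helper `jenkins_settings_from_env`, are assumptions made up for this illustration. Only the keyword names come from the examples in this README.

```python
import os


def jenkins_settings_from_env(domain_name, data_directory="data", environ=None):
    """Build JenkinsData keyword arguments from environment variables.

    Hypothetical helper: the environment variable names are illustrative.
    """
    environ = os.environ if environ is None else environ
    missing = [
        name for name in ("JENKINS_USER_ID", "JENKINS_SECRET")
        if not environ.get(name)
    ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # SSL verification stays on unless explicitly disabled.
    verify_ssl = environ.get("JENKINS_VERIFY_SSL", "true").lower() != "false"
    return {
        "domain_name": domain_name,
        "verify_ssl": verify_ssl,
        "user_id": environ["JENKINS_USER_ID"],
        "secret": environ["JENKINS_SECRET"],
        "data_directory": data_directory,
    }


# Intended usage (requires duck-jenkins and a reachable Jenkins server):
#   from duck_jenkins import JenkinsData
#   jd = JenkinsData(**jenkins_settings_from_env("jenkins1.example.io"))
```

Passing the mapping in through `environ` keeps the helper easy to exercise without touching the real process environment.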
2. Extract upstream build
Assume the upstream of pipeline2/master/1 is pipeline1/1.
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)
3. Extract previous build
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # build 2 itself is excluded from the extraction
    artifact=True,
    overwrite=True,
    size=1  # extract only 1 previous build
)
DuckDB transformation
The serialized files are of limited use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.
from duck_jenkins import DuckLoader
import duckdb
db = duckdb.connect('1.ddb')
cursor = db.cursor()
dl = DuckLoader(cursor, 'data')
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io',
    overwrite=False  # False skips inserting records that already exist
)
cursor.commit()
cursor.close()
For more on using DuckDB, see the official documentation: https://duckdb.org/docs/