Jenkins build data ETL with DuckDB
Duck Jenkins: loading Jenkins build info into DuckDB
What is it?
An ETL (Extract, Transform, Load) tool for Jenkins build data.
Installation
pip install duck-jenkins
Main features
Jenkins build extractor
- Extracts Jenkins build information, along with artifact metadata, and serializes it into files.
- A fixed file structure supports multiple Jenkins servers.
- Supports multi-branch pipelines.
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │   └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
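The layout above encodes the server, the job path and the build number in directory and file names. A minimal sketch of reading that layout back, assuming files are always named `<build_number>_info.json` or `<build_number>_artifact.csv` and that every directory between the server directory and the file belongs to the job path (e.g. `pipeline2/master`). `scan_data_dir` and `BuildFile` are hypothetical names, not part of duck-jenkins:

```python
import re
import tempfile
from pathlib import Path
from typing import List, NamedTuple

# Assumed naming convention: "<build_number>_info.json" or "<build_number>_artifact.csv".
FILE_PATTERN = re.compile(r"^(\d+)_(info\.json|artifact\.csv)$")


class BuildFile(NamedTuple):
    server: str        # e.g. "jenkins1.example.io"
    job: str           # e.g. "pipeline1", or "pipeline2/master" for a multi-branch job
    build_number: int
    kind: str          # "info" or "artifact"
    path: Path


def scan_data_dir(data_dir: Path) -> List[BuildFile]:
    """Hypothetical helper: list every build file under <data_dir>/<server>/<job path>/."""
    found = []
    for path in sorted(data_dir.rglob("*")):
        match = FILE_PATTERN.match(path.name)
        parts = path.relative_to(data_dir).parts
        # Need at least: server directory, one job directory, file name.
        if match is None or not path.is_file() or len(parts) < 3:
            continue
        server, job_parts = parts[0], parts[1:-1]
        kind = "info" if match.group(2) == "info.json" else "artifact"
        found.append(BuildFile(server, "/".join(job_parts), int(match.group(1)), kind, path))
    return found


# Recreate part of the tree above with empty files and scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    for rel in [
        "jenkins2.example.io/pipeline1/1_info.json",
        "jenkins2.example.io/pipeline2/master/1_info.json",
        "jenkins2.example.io/pipeline2/master/1_artifact.csv",
    ]:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    for f in scan_data_dir(root):
        print(f.server, f.job, f.build_number, f.kind)
```

Files that do not match the naming pattern (or sit directly under a server directory) are simply skipped rather than treated as errors.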
DuckDB transformer
Transforms all of the serialized data above into a relational DuckDB database.
Database ER diagram
erDiagram
    Jenkins ||--o{ Job: has
    Job ||--o{ Build: has
    Build ||--o{ Artifact: has
    Build ||--o| Jenkins_User: has
    Build ||--o{ Cause: has
    Build ||--o{ Parameter: has
    Build ||--|| Result: has
    Parameter ||--|| ParameterDictionary: has
    Jenkins{
        int id PK
        str domain_name
    }
    Job{
        int id PK
        str name
        int jenkins_id FK
    }
    Result{
        int id PK
        str name
    }
    Jenkins_User{
        int id PK
        str name
        str lan_id
    }
    Cause{
        int id PK
        str category
    }
    Build{
        int id PK
        int job_id FK
        int build_number
        int result_id FK
        int user_id FK
        int trigger_type FK "Cause table's PK"
        int duration
        datetime timestamp
        int upstream_job_id FK
        int upstream_build_number
        int upstream_type FK "Cause table's PK"
        int previous_build_number
    }
    ParameterDictionary{
        int id PK
        str name
    }
    Parameter{
        int build_id FK
        int name_id FK
        str value
    }
    Artifact{
        int id PK
        int build_id FK
        str file_name
        str dir
        int size
        datetime timestamp
    }
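To make the relationships concrete, here is a minimal sketch of part of the diagram as SQL tables, with two example queries: one following `Build.upstream_job_id` back to `Job`, one resolving parameter names through `ParameterDictionary`. It uses Python's built-in `sqlite3` so it runs without DuckDB installed. Table and column names follow the diagram, but the column types, the subset of columns, and the exact schema duck-jenkins creates are assumptions; the sample rows are invented:

```python
import sqlite3

# Subset of the ER diagram above; names follow the diagram, types are assumptions.
SCHEMA = """
CREATE TABLE Jenkins (id INTEGER PRIMARY KEY, domain_name TEXT);
CREATE TABLE Job (id INTEGER PRIMARY KEY, name TEXT, jenkins_id INTEGER REFERENCES Jenkins(id));
CREATE TABLE Result (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Build (
    id INTEGER PRIMARY KEY,
    job_id INTEGER REFERENCES Job(id),
    build_number INTEGER,
    result_id INTEGER REFERENCES Result(id),
    duration INTEGER,
    upstream_job_id INTEGER REFERENCES Job(id),
    upstream_build_number INTEGER
);
CREATE TABLE ParameterDictionary (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Parameter (
    build_id INTEGER REFERENCES Build(id),
    name_id INTEGER REFERENCES ParameterDictionary(id),
    value TEXT
);
"""

# Invented sample rows: pipeline2/master #1 was triggered by pipeline1 #1.
SAMPLE = """
INSERT INTO Jenkins VALUES (1, 'jenkins1.example.io');
INSERT INTO Job VALUES (1, 'pipeline1', 1), (2, 'pipeline2/master', 1);
INSERT INTO Result VALUES (1, 'SUCCESS'), (2, 'FAILURE');
INSERT INTO Build VALUES (1, 1, 1, 1, 1200, NULL, NULL), (2, 2, 1, 2, 3400, 1, 1);
INSERT INTO ParameterDictionary VALUES (1, 'BRANCH');
INSERT INTO Parameter VALUES (2, 1, 'master');
"""


def builds_with_upstream(conn):
    """Each build with its job name, result and (optional) upstream job and build number."""
    return conn.execute("""
        SELECT j.name, b.build_number, r.name, uj.name, b.upstream_build_number
        FROM Build b
        JOIN Job j ON j.id = b.job_id
        JOIN Result r ON r.id = b.result_id
        LEFT JOIN Job uj ON uj.id = b.upstream_job_id
        ORDER BY b.id
    """).fetchall()


def build_parameters(conn, build_id):
    """Resolve one build's parameters through the ParameterDictionary name table."""
    return conn.execute("""
        SELECT d.name, p.value
        FROM Parameter p
        JOIN ParameterDictionary d ON d.id = p.name_id
        WHERE p.build_id = ?
        ORDER BY d.name
    """, (build_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + SAMPLE)
for row in builds_with_upstream(conn):
    print(row)
print(build_parameters(conn, 2))
```

Storing parameter names once in `ParameterDictionary` and referencing them by `name_id` avoids repeating the same name string for every build.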
Examples
Jenkins build extractor
The following examples reproduce the file structure shown above.
1. Extract a build
Extracting a build of a multi-branch pipeline:
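from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',  # Jenkins server to extract from
    verify_ssl=False,                   # skip TLS certificate verification
    user_id='C001',                     # Jenkins credentials
    secret='elwerqoqiweucv',
    data_directory='data'               # root of the file structure above
)
jd.pull(
    project_name='pipeline2/master',    # multi-branch job: <pipeline>/<branch>
    build_number=1,
    artifact=True                       # also extract artifact metadata
)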
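Judging from the file structure above, this call should write its output under `data/jenkins1.example.io/pipeline2/master/`. A minimal sketch of that mapping, assuming the `<build_number>_info.json` / `<build_number>_artifact.csv` naming shown in the tree and that the artifact file is only written when `artifact=True`; `expected_files` is a hypothetical helper, not part of duck-jenkins:

```python
from pathlib import Path
from typing import List


def expected_files(data_directory: str, domain_name: str, project_name: str,
                   build_number: int, artifact: bool) -> List[Path]:
    """Hypothetical: where a pull() call is assumed to store one build's files."""
    # A multi-branch project name such as "pipeline2/master" becomes nested directories.
    build_dir = Path(data_directory, domain_name, *project_name.split("/"))
    files = [build_dir / f"{build_number}_info.json"]
    if artifact:
        # Artifact metadata is assumed to be stored only when artifact=True.
        files.append(build_dir / f"{build_number}_artifact.csv")
    return files


for path in expected_files("data", "jenkins1.example.io", "pipeline2/master", 1, artifact=True):
    print(path.as_posix())
```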
2. Extract the upstream build
Let's assume the upstream build of pipeline2/master/1 is pipeline1/1.
from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
# Extract the build that triggered pipeline2/master #1 (pipeline1/1 here).
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)
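Jenkins itself records what triggered a build in the build's JSON (`/job/<name>/<number>/api/json`): an upstream trigger appears as a `hudson.model.Cause$UpstreamCause` entry under `actions[].causes[]`, carrying `upstreamProject` and `upstreamBuild`. A minimal sketch of finding the upstream build that way, assuming the extracted `*_info.json` keeps this Jenkins JSON shape (not confirmed here). `find_upstream` is hypothetical and the sample JSON is trimmed and invented:

```python
import json
from typing import Optional, Tuple


def find_upstream(build_json: dict) -> Optional[Tuple[str, int]]:
    """Hypothetical: return (upstream project, upstream build number), or None."""
    for action in build_json.get("actions", []):
        # Many actions are empty objects or have no "causes"; skip those.
        for cause in (action or {}).get("causes", []):
            # Match on the fields rather than _class, so cause subclasses also match.
            if "upstreamProject" in cause and "upstreamBuild" in cause:
                return cause["upstreamProject"], int(cause["upstreamBuild"])
    return None


# Trimmed, invented example in the shape of Jenkins' build API JSON.
sample = json.loads("""
{
  "number": 1,
  "actions": [
    {},
    {"_class": "hudson.model.CauseAction",
     "causes": [{"_class": "hudson.model.Cause$UpstreamCause",
                 "upstreamProject": "pipeline1",
                 "upstreamBuild": 1,
                 "upstreamUrl": "job/pipeline1/"}]}
  ]
}
""")
print(find_upstream(sample))  # ('pipeline1', 1)
```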
3. Extract previous builds
from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # build 2 itself is not extracted, only builds before it
    artifact=True,
    overwrite=True,
    size=1           # e.g. extract only one previous build
)
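Jenkins' build JSON also links each build to its predecessor through a `previousBuild` object with a `number` field (`null` for the first build), which is presumably what `previous_build_number` in the Build table reflects. A minimal sketch of collecting up to `size` previous builds while excluding the starting build, mirroring the comments in the example above; `previous_builds` is hypothetical and the in-memory dict stands in for API calls:

```python
from typing import Dict, List


def previous_builds(builds: Dict[int, dict], build_number: int, size: int) -> List[int]:
    """Hypothetical: numbers of up to `size` builds before `build_number` (itself excluded)."""
    result = []
    current = builds.get(build_number)
    while current is not None and len(result) < size:
        prev = current.get("previousBuild")  # null in Jenkins JSON for the first build
        if prev is None:
            break
        result.append(prev["number"])
        current = builds.get(prev["number"])
    return result


# Invented data: build 3 was deleted, so build 4's predecessor is build 2.
builds = {
    1: {"number": 1, "previousBuild": None},
    2: {"number": 2, "previousBuild": {"number": 1}},
    4: {"number": 4, "previousBuild": {"number": 2}},
}
print(previous_builds(builds, 2, size=1))  # [1]
print(previous_builds(builds, 4, size=5))  # [2, 1]
```

Following the `previousBuild` link instead of counting down from `build_number - 1` tolerates gaps left by deleted builds.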
DuckDB transformation
The extracted files are of little use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.
from duck_jenkins import DuckLoader
import duckdb

db = duckdb.connect('1.ddb')     # creates the database file if it does not exist
cursor = db.cursor()
dl = DuckLoader(cursor, 'data')  # 'data' is the extractor's data_directory
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io',
    overwrite=False  # False skips inserting records that already exist
)
cursor.commit()
cursor.close()
db.close()
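The `overwrite` flag decides what happens to records that already exist. How `DuckLoader` implements it is not documented here; as an illustration only, the two behaviours map naturally onto `INSERT OR IGNORE` (skip) and `INSERT OR REPLACE` (overwrite), which SQLite and recent DuckDB versions both accept for tables with a primary key. The sketch uses Python's built-in `sqlite3` so it runs without DuckDB, and `insert_jobs` is a hypothetical helper:

```python
import sqlite3
from typing import Iterable, Tuple


def insert_jobs(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str, int]],
                overwrite: bool) -> None:
    """Hypothetical: insert Job rows, skipping or replacing rows whose id already exists."""
    verb = "INSERT OR REPLACE" if overwrite else "INSERT OR IGNORE"
    # The conflict is detected through the PRIMARY KEY on id.
    conn.executemany(f"{verb} INTO Job (id, name, jenkins_id) VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Job (id INTEGER PRIMARY KEY, name TEXT, jenkins_id INTEGER)")
insert_jobs(conn, [(1, "pipeline1", 1)], overwrite=False)

insert_jobs(conn, [(1, "pipeline1-renamed", 1)], overwrite=False)  # id 1 exists: skipped
print(conn.execute("SELECT name FROM Job WHERE id = 1").fetchone())  # ('pipeline1',)

insert_jobs(conn, [(1, "pipeline1-renamed", 1)], overwrite=True)   # id 1 exists: replaced
print(conn.execute("SELECT name FROM Job WHERE id = 1").fetchone())  # ('pipeline1-renamed',)
```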
For more on using DuckDB, see the official documentation:
https://duckdb.org/docs/
Download files
Source Distribution
duck-jenkins-0.0.25.tar.gz (12.9 kB)
Built Distribution
duck_jenkins-0.0.25-py3-none-any.whl (12.5 kB)
File details
Details for the file duck-jenkins-0.0.25.tar.gz.
File metadata
- Download URL: duck-jenkins-0.0.25.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | d079fdd23ae9a41b2206d79471bb835175b1145e141f5a1761ed3e548972ad28
MD5 | 3271cd7a6e93b5e250492875074d4923
BLAKE2b-256 | 31fbc70254767d20fb85cc0341bd80aeb6fbf103276cde03808f55eb9692ccf5
File details
Details for the file duck_jenkins-0.0.25-py3-none-any.whl.
File metadata
- Download URL: duck_jenkins-0.0.25-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | b6015281b5334825778f0911eb0965ef0242bd55ccb522b5630e17e90e51a5a7
MD5 | 4865cacfea812f2f7e360467ea4c2256
BLAKE2b-256 | eb58c7702801ae7c4e52f9933f8dab735fc8aec931c285499ff81e1821a2e379