Jenkins build data ETL with DuckDB
Duck Jenkins: loading Jenkins build info into DuckDB
What is it?
ETL (Extract, Transform, Load) for Jenkins build data.
Installation
pip install duck-jenkins
Main features
Jenkins build extractor
- Extracts Jenkins build information, along with artifact metadata, and serializes it into files.
- Uses a fixed file structure that can hold data from multiple Jenkins servers.
- Supports multi-branch pipelines.
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │   └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
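The layout can be read as data/<domain>/<job path>/<build number>_info.json. The sketch below walks such a directory and lists the builds it finds. It is only an illustration of the layout: the helper name iter_build_info and the BuildFile tuple are hypothetical and are not part of duck-jenkins.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BuildFile(NamedTuple):
    """Hypothetical record describing one serialized build (not a duck-jenkins type)."""
    domain: str        # e.g. "jenkins2.example.io"
    job: str           # e.g. "pipeline2/master"
    build_number: int  # e.g. 1
    path: Path


def iter_build_info(data_directory: str) -> Iterator[BuildFile]:
    """Yield one entry per <build_number>_info.json found under data_directory."""
    root = Path(data_directory)
    for path in sorted(root.rglob("*_info.json")):
        rel = path.relative_to(root)
        if len(rel.parts) < 3:
            continue  # need at least <domain>/<job>/<file>
        number = path.name.split("_", 1)[0]
        if not number.isdigit():
            continue  # skip files that do not start with a build number
        domain = rel.parts[0]
        job = "/".join(rel.parts[1:-1])  # multi-branch jobs span several folders
        yield BuildFile(domain, job, int(number), path)


if __name__ == "__main__":
    for build in iter_build_info("data"):
        print(build.domain, build.job, build.build_number)
```

Because the job path is everything between the domain folder and the file name, a multi-branch job such as pipeline2/master needs no special handling.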
DuckDB transformer
Transforms all of the serialized data above into a relational DuckDB database.
Database ER diagram
erDiagram
Jenkins ||--o{ Job: has
Job ||--o{ Build: has
Build ||--o{ Artifact: has
Build ||--o| Jenkins_User: has
Build ||--o{ Cause: has
Build ||--o{ Parameter: has
Build ||--|| Result: has
Parameter ||--|| ParameterDictionary: has
Jenkins{
int id PK
str domain_name
}
Job{
int id PK
str name
int jenkins_id FK
}
Result{
int id PK
str name
}
Jenkins_User{
int id PK
str name
str lan_id
}
Cause{
int id PK
str category
}
Build{
int id PK
int job_id FK
int build_number
int result_id FK
int user_id FK
int trigger_type FK "Cause table's PK"
int duration
datetime timestamp
int upstream_job_id FK
int upstream_build_number
int upstream_type FK "Cause table's PK"
int previous_build_number
}
ParameterDictionary{
int id PK
str name
}
Parameter{
int build_id FK
int name_id FK
str value
}
Artifact{
int id PK
int build_id FK
str file_name
str dir
int size
datetime timestamp
}
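To make the relationships concrete, the sketch below builds a small subset of the diagram (Jenkins, Job, Result, Build) and joins it back together. It uses Python's built-in sqlite3 purely as a stand-in so that it runs without any extracted data; the real tool targets DuckDB. The table and column names are taken from the diagram, but the names duck-jenkins actually creates may differ, and all rows are made-up sample values.

```python
import sqlite3

# In-memory stand-in database; duck-jenkins itself writes to DuckDB.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Jenkins (id INTEGER PRIMARY KEY, domain_name TEXT);
CREATE TABLE Job (
    id INTEGER PRIMARY KEY,
    name TEXT,
    jenkins_id INTEGER REFERENCES Jenkins(id)
);
CREATE TABLE Result (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Build (
    id INTEGER PRIMARY KEY,
    job_id INTEGER REFERENCES Job(id),
    build_number INTEGER,
    result_id INTEGER REFERENCES Result(id),
    duration INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO Jenkins VALUES (1, 'jenkins1.example.io');
INSERT INTO Job VALUES (1, 'pipeline2/master', 1);
INSERT INTO Result VALUES (1, 'SUCCESS');
INSERT INTO Build VALUES (1, 1, 1, 1, 1200);
""")

# Resolve each build to its server, job, and result name.
rows = con.execute("""
SELECT j.domain_name, job.name, b.build_number, r.name, b.duration
FROM Build AS b
JOIN Job AS job ON job.id = b.job_id
JOIN Jenkins AS j ON j.id = job.jenkins_id
JOIN Result AS r ON r.id = b.result_id
ORDER BY b.build_number
""").fetchall()
print(rows)
```

The SELECT statement uses only standard joins, so a similar query should be usable against the DuckDB file once the actual table names are confirmed.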
Example
Jenkins build extractor
The following examples reproduce the file structure shown above.
1. Extract build
Extracting a multi-branch pipeline
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull(
project_name='pipeline2/master',
build_number=1,
artifact=True
)
2. Extract upstream build
Assume that the upstream build of pipeline2/master/1 is pipeline1/1.
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull_upstream(
project_name='pipeline2/master',
build_number=1,
artifact=False
)
3. Extract previous build
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull_previous(
project_name='pipeline2/master',
build_number=2, # build 2 itself is not extracted; only its predecessors are.
artifact=True,
overwrite=True,
size=1 # number of previous builds to extract; here, only one.
)
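The interplay of build_number and size can be pictured as walking backwards through the build history. The sketch below is an assumption about that behaviour, not the library's implementation: it follows the previousBuild field that Jenkins includes in a build's JSON, and the helper name previous_builds is hypothetical.

```python
from typing import Dict, List, Optional


def previous_builds(
    info_by_number: Dict[int, dict], build_number: int, size: int
) -> List[int]:
    """Return up to `size` build numbers preceding `build_number`, newest first.

    `info_by_number` maps a build number to its Jenkins build JSON.
    The starting build itself is never included in the result.
    """
    found: List[int] = []
    current: Optional[int] = build_number
    while current is not None and len(found) < size:
        info = info_by_number.get(current)
        if info is None:
            break  # no data for this build, so the chain cannot be followed
        prev = info.get("previousBuild")  # {"number": N, ...} or None
        current = prev["number"] if prev else None
        if current is not None:
            found.append(current)
    return found


# Made-up history: build 3 follows 2, which follows 1.
history = {
    3: {"previousBuild": {"number": 2}},
    2: {"previousBuild": {"number": 1}},
    1: {"previousBuild": None},
}
print(previous_builds(history, build_number=2, size=1))
```

Following the links rather than subtracting from the build number keeps the walk correct when build numbers have gaps, for example after builds were deleted.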
DuckDB transformation
The extracted files are of little use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.
from duck_jenkins import DuckLoader
import duckdb
db = duckdb.connect('1.ddb')
cursor = db.cursor()
dl = DuckLoader(cursor, 'data')
dl.import_into_db(
jenkins_domain_name='jenkins1.example.io',
overwrite=False # False skips the insert for records that already exist.
)
cursor.commit()
cursor.close()
For more on using DuckDB, see the official documentation:
https://duckdb.org/docs/