Jenkins build data ETL with DuckDB
Duck Jenkins: loading Jenkins build info into DuckDB
What is it?
ETL (Extract, Transform, Load) for Jenkins build data.
Installation
pip install duck-jenkins
Main features
Jenkins build extractor
- Extracts and serializes Jenkins build information, along with artifact metadata, into files.
- A fixed file structure supports multiple Jenkins servers.
- Supports multi-branch pipeline structures:
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │   └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
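As a rough illustration of the layout above, the location of a build's files can be derived from the data directory, the server domain, the job path, and the build number. The helper `build_file_path` below is hypothetical (it is not part of duck-jenkins) and only mirrors the naming visible in the tree: `<build_number>_info.json` and `<build_number>_artifact.csv`.

```python
from pathlib import PurePosixPath


def build_file_path(data_directory, domain_name, project_name, build_number, kind="info"):
    """Return the expected location of a serialized build file.

    Hypothetical helper: it mirrors the directory tree shown above and is
    not taken from the duck-jenkins source.
    """
    # Map the kind of file to the suffix used in the tree.
    suffix = {"info": "info.json", "artifact": "artifact.csv"}[kind]
    # project_name may include a branch, e.g. 'pipeline2/master'.
    return PurePosixPath(data_directory, domain_name, project_name, f"{build_number}_{suffix}")


print(build_file_path("data", "jenkins2.example.io", "pipeline2/master", 1))
print(build_file_path("data", "jenkins2.example.io", "pipeline2/master", 1, kind="artifact"))
```

Because the server domain is the first directory level, data from several Jenkins servers can share one data directory without their job names colliding.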
DuckDB transformer
Transforms all of the serialized data above into a relational DuckDB database.
Database Schemas
- Jenkins
  - id: int
  - domain_name: str
- Job
  - id: int
  - name: str
  - jenkins_id: int
- Result
  - id: int
  - name: str
- User
  - id: int
  - name: str
  - lan_id: str
- Cause
  - id: int
  - category: str
- Build
  - id: int
  - job_id: int
  - build_number: int
  - result_id: int
  - user_id: int
  - trigger_type: int
  - duration: int
  - timestamp: datetime
  - parameter_id: int
  - upstream_job_id: int
  - upstream_build_number: int
  - upstream_type: int
  - previous_build_number: int
- ParameterDictionary
  - id: int
  - name: str
- Parameter
  - build_id: int
  - name_id: int
  - value: str
- Artifact
  - id: int
  - build_id: int
  - file_name: str
  - dir: str
  - size: int
  - timestamp: datetime
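To show how these tables relate, the sketch below builds a query that resolves a build's `job_id` and `result_id` into readable names. The table and column names are copied from the schema listing above; it is an assumption that the loaded database uses exactly these names, and `build_summary_query` is a hypothetical helper, not part of duck-jenkins.

```python
def build_summary_query(job_name=None):
    """Return (sql, params) listing builds with their job and result names.

    Table and column names come from the schema listing above and are
    assumed to match the names in the actual database file.
    """
    sql = (
        "SELECT j.name AS job, b.build_number, r.name AS result, b.duration "
        "FROM Build AS b "
        "JOIN Job AS j ON b.job_id = j.id "
        "JOIN Result AS r ON b.result_id = r.id"
    )
    params = []
    if job_name is not None:
        # Optional filter on a single job, passed as a bound parameter.
        sql += " WHERE j.name = ?"
        params.append(job_name)
    sql += " ORDER BY j.name, b.build_number"
    return sql, params


sql, params = build_summary_query("pipeline2/master")
print(sql)
print(params)
```

With an open DuckDB connection, the pair could be run as `cursor.execute(sql, params).fetchall()`.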
Example
Jenkins build extractor
The following examples reproduce the file structure shown above.
1. Extract build
Extracting a multi-branch pipeline
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull(
    project_name='pipeline2/master',
    build_number=1,
    artifact=True
)
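After an extraction, it can be handy to check which builds already exist on disk. The sketch below scans a project directory for files that follow the `<build_number>_info.json` naming from the file structure shown earlier. `extracted_build_numbers` is a hypothetical helper that relies only on that naming convention, not on any duck-jenkins API.

```python
import re
from pathlib import Path

# Matches files such as '1_info.json' and captures the build number.
INFO_FILE = re.compile(r"^(\d+)_info\.json$")


def extracted_build_numbers(project_dir):
    """Return the sorted build numbers that have an info file in project_dir."""
    numbers = []
    for path in Path(project_dir).glob("*_info.json"):
        match = INFO_FILE.match(path.name)
        if match:  # skip files whose prefix is not a plain number
            numbers.append(int(match.group(1)))
    return sorted(numbers)


print(extracted_build_numbers("data/jenkins1.example.io/pipeline2/master"))
```

An empty list means no info files were found under that directory, for example because nothing has been pulled yet.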
2. Extract upstream build
Let's assume the upstream of pipeline2/master/1 is pipeline1/1.
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)
3. Extract previous build
from duck_jenkins import JenkinsData
jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # build 2 itself is excluded from the extraction in this function.
    artifact=True,
    overwrite=True,
    size=1  # only the 1 build before build 2 is extracted.
)
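The interplay of `build_number` and `size` can be pictured with a small sketch. It assumes that previous builds are numbered consecutively and that the walk stops at build 1; how duck-jenkins actually locates previous builds is not stated here, so `previous_build_numbers` is a hypothetical illustration only.

```python
def previous_build_numbers(build_number, size):
    """List the build numbers reached by stepping back `size` builds.

    Assumption: builds are numbered consecutively. The starting build
    itself is excluded, and the walk never goes below build 1.
    """
    first = max(build_number - size, 1)
    return list(range(build_number - 1, first - 1, -1))


print(previous_build_numbers(2, 1))
print(previous_build_numbers(5, 3))
```

Under these assumptions, the call in the example above would cover only build 1.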
DuckDB transformation
The extracted files only become useful once they are loaded into a database. The following steps demonstrate how to import them into DuckDB.
from duck_jenkins import DuckLoader
import duckdb
db = duckdb.connect('1.ddb')
cursor = db.cursor()
dl = DuckLoader(cursor, 'data')
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io',
    overwrite=False  # False skips inserting records that already exist.
)
cursor.commit()
cursor.close()
For more on using DuckDB, see the official documentation: https://duckdb.org/docs/