deped-hr
deped-hr cleans personnel CSV exports into a standalone SQLite database. It depends on a lookup artifact from deped-dcp-template and an entity artifact from deped-entity.
What This Package Owns
- Personnel row cleaning and normalization
- Entity loading for the personnel domain
- Position title seeding from lookups.db
- Personnel-focused views and audit outputs
This package is the source of truth for the personnel database contract only.
Inputs And Outputs
Required inputs:
- a personnel CSV export, placed under /data
- a lookup database produced by deped-dcp-template
- an entities.db artifact produced by deped-entity
Primary outputs:
- personnel.db
- unmapped_positions.txt
personnel.db includes normalized personnel rows, entity rows for the covered scope, date parse errors, seeded position titles, and personnel-owned views such as school/personnel summaries and quality issue rollups.
When artifacts/entities.db contains multiple rows with the same natural_key, deped-hr keeps the most recent row by date_time_submitted. If timestamps tie, the higher entity_id wins.
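The tie-breaking rule above can be sketched as a single query. This is a minimal illustration, not the package's actual implementation: the table name entities and the helper latest_entities are assumptions, only the three column names come from the text above, and date_time_submitted is assumed to be stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

# Keep a row only if no other row with the same natural_key is newer,
# or equally new with a higher entity_id.
# Assumption: the table is called "entities" (hypothetical name).
DEDUPE_SQL = """
SELECT e.entity_id, e.natural_key, e.date_time_submitted
FROM entities AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM entities AS o
    WHERE o.natural_key = e.natural_key
      AND (
          o.date_time_submitted > e.date_time_submitted
          OR (
              o.date_time_submitted = e.date_time_submitted
              AND o.entity_id > e.entity_id
          )
      )
)
ORDER BY e.natural_key
"""


def latest_entities(conn):
    """Return one (entity_id, natural_key, date_time_submitted) row per natural_key."""
    return conn.execute(DEDUPE_SQL).fetchall()
```

The NOT EXISTS form is used here instead of a window function only so the sketch runs on older SQLite builds; either approach expresses the same rule.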
CLI
Build the lookup database:
uv run deped-dcp-template extract \
--templates-dir templates \
--output artifacts/lookups.db
Build the entity database:
uv run entity build \
--input data/entities.xlsx \
--db artifacts/entities.db
Build the personnel database from artifacts/lookups.db, artifacts/entities.db, and a personnel CSV file such as data/2026-03-31-personnel.csv:
uv run hr build \
--personnel data/2026-03-31-personnel.csv \
--lookups artifacts/lookups.db \
--entities artifacts/entities.db \
--db artifacts/personnel.db
Run the lightweight audit command:
uv run hr audit \
--db artifacts/personnel.db
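Because personnel.db is an ordinary SQLite file, it can also be inspected directly after a build. The sketch below lists the views present in a database using only the Python standard library; the helper name list_views is illustrative and is not part of deped-hr.

```python
import sqlite3


def list_views(conn):
    """Return the names of all views in the connected SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For example, list_views(sqlite3.connect("artifacts/personnel.db")) would return the personnel-owned views created by the build, assuming the file already exists at that path.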
Nuances
- Position normalization is intentionally lookup-driven. If a raw title fails to map cleanly, it is recorded in unmapped_positions.txt rather than silently coerced.
- The build also applies cleaning beyond normalization: it parses dates, repairs or blanks invalid emails, normalizes phone numbers, reconstructs missing full names, normalizes employee identifiers, and emits quality flags for invalid or suspicious source values.
- Many of these rules exist because personnel sheets are manually encoded and routinely contain wrong-column values, malformed DepEd domains, spreadsheet formatting corruption, and misfiled separation or position text. The detailed rationale and examples live in docs/data-contract.md.
- Name and ID handling is also normalization-driven: person-name parts are cleaned before storage, full_name is rebuilt from cleaned fields rather than trusting the raw CSV full-name column, and malformed-but-present employee IDs are preserved but flagged while only usable numeric suffixes are allowed into full_name.
- Email normalization also reconciles DepEd addresses across deped_email and personal_email: matching DepEd values collapse to one canonical deped_email, personal-field DepEd addresses can be promoted into deped_email, malformed DepEd-like domains are repaired when the intent is clear, and conflicting DepEd addresses are flagged for audit while personal_email is cleared so DepEd addresses do not remain in that column.
- Several high-noise personnel fields are normalized to closed canonical sets: oic_designation, source_of_funds, ro_office, and sdo_office. Unknown or polluted values are blanked rather than preserved as free text.
- Office-unit reporting should come from normalized personnel data and v_office_personnel_summary, not from the entity dimension.
- Entity derivation and natural-key generation belong to deped-entity. This package imports entities.db, collapses duplicate natural_key rows to the most recent submitted value, and links personnel rows against the imported entities.
- entities.db is treated as the source of entity metadata, but not as a uniqueness guarantee on natural_key. If upstream emits repeated keys, the newest timestamped row becomes the local entity record for that key.
- The package creates only personnel-owned views. Cross-domain analytical surfaces from the retired monolith are not part of this contract.
- Equipment-specific assumptions do not belong here. If another package needs personnel data, it should consume the personnel.db artifact instead of importing this package.
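The lookup-driven position mapping described in the first point above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the function name map_positions, the in-memory dictionary standing in for lookups.db, and the matching rule (collapse whitespace, compare case-insensitively) are all hypothetical.

```python
def map_positions(raw_titles, lookup):
    """Map raw titles to canonical titles; collect anything that does not map.

    `lookup` maps a normalized key (whitespace-collapsed, casefolded) to a
    canonical title. The normalization rule is an assumption for this sketch.
    Returns (mapped, unmapped): a dict of raw -> canonical, and a sorted list
    of raw titles with no lookup entry.
    """
    mapped = {}
    unmapped = set()
    for raw in raw_titles:
        key = " ".join(raw.split()).casefold()
        canonical = lookup.get(key)
        if canonical is None:
            # No coercion: the raw value is reported instead of guessed at.
            unmapped.add(raw)
        else:
            mapped[raw] = canonical
    return mapped, sorted(unmapped)
```

In a build along these lines, the unmapped list is what would be written out for review, so that new titles are added to the lookup rather than silently absorbed.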
Tests
Run the package tests from this directory:
uv run pytest -q