Cleans personnel CSV exports into a standalone SQLite database.

deped-hr

deped-hr cleans personnel CSV exports into a standalone SQLite database. It depends on a lookup artifact from deped-dcp-template and an entity artifact from deped-entity.

What This Package Owns

  • Personnel row cleaning and normalization
  • Entity loading for the personnel domain
  • Position title seeding from lookups.db
  • Personnel-focused views and audit outputs

This package is the source of truth for the personnel database contract only.

Inputs And Outputs

Required inputs:

  • a personnel CSV export, placed under the data/ directory
  • a lookups.db lookup database produced by deped-dcp-template
  • an entities.db artifact produced by deped-entity

Primary outputs:

  • personnel.db
  • unmapped_positions.txt

personnel.db includes normalized personnel rows, entity rows for the covered scope, date parse errors, seeded position titles, and personnel-owned views such as school/personnel summaries and quality issue rollups.
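Because personnel.db is a plain SQLite file, a consumer can inspect it with Python's standard sqlite3 module. The sketch below lists the tables and views in a database. To stay runnable on its own it uses an in-memory stand-in whose personnel table and columns are hypothetical; only the view name v_office_personnel_summary appears in this README, and its real definition may differ.

```python
import sqlite3


def list_objects(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return the table and view names in a SQLite database, grouped by type."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY type, name"
    ).fetchall()
    found: dict[str, list[str]] = {"table": [], "view": []}
    for kind, name in rows:
        found[kind].append(name)
    return found


# Stand-in database so the sketch runs without a built artifact.
# The table schema and the view body below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute(
    "CREATE VIEW v_office_personnel_summary AS "
    "SELECT COUNT(*) AS personnel_count FROM personnel"
)

objects = list_objects(conn)
print(objects)
conn.close()
```

Against a real build, the same function would take sqlite3.connect("artifacts/personnel.db") instead of the in-memory connection.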

When artifacts/entities.db contains multiple rows with the same natural_key, deped-hr keeps the most recent row by date_time_submitted. If timestamps tie, the higher entity_id wins.
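The duplicate rule above can be sketched in a few lines of Python. This illustrates the stated ordering only and is not the package's implementation: the function name is hypothetical, and it assumes date_time_submitted is an ISO 8601 string and entity_id is an integer.

```python
from datetime import datetime


def collapse_by_natural_key(rows: list[dict]) -> list[dict]:
    """Keep one row per natural_key: newest date_time_submitted, then highest entity_id."""
    best: dict[str, tuple[tuple[datetime, int], dict]] = {}
    for row in rows:
        # Tuples compare element by element, so the timestamp decides first
        # and entity_id only breaks ties.
        rank = (datetime.fromisoformat(row["date_time_submitted"]), row["entity_id"])
        key = row["natural_key"]
        if key not in best or rank > best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]


# Hypothetical sample rows: key "A" has a newer row, key "B" has a timestamp tie.
rows = [
    {"entity_id": 1, "natural_key": "A", "date_time_submitted": "2026-01-05T08:00:00"},
    {"entity_id": 2, "natural_key": "A", "date_time_submitted": "2026-02-01T09:30:00"},
    {"entity_id": 3, "natural_key": "B", "date_time_submitted": "2026-01-10T10:00:00"},
    {"entity_id": 4, "natural_key": "B", "date_time_submitted": "2026-01-10T10:00:00"},
]

kept = collapse_by_natural_key(rows)
kept_ids = sorted(row["entity_id"] for row in kept)
print(kept_ids)
```

For key "A" the later timestamp wins (entity_id 2). For key "B" the timestamps tie, so the higher entity_id (4) is kept.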

CLI

Build the lookup database:

uv run deped-dcp-template extract \
  --templates-dir templates \
  --output artifacts/lookups.db

Build the entity database:

uv run entity build \
  --input data/entities.xlsx \
  --db artifacts/entities.db

Build the personnel database from artifacts/lookups.db, artifacts/entities.db, and the personnel CSV file (for example, data/2026-03-31-personnel.csv):

uv run hr build \
  --personnel data/2026-03-31-personnel.csv \
  --lookups artifacts/lookups.db \
  --entities artifacts/entities.db \
  --db artifacts/personnel.db

Run the lightweight audit command:

uv run hr audit \
  --db artifacts/personnel.db

Nuances

  • Position normalization is intentionally lookup-driven. If a raw title fails to map cleanly, it is recorded in unmapped_positions.txt rather than silently coerced.
  • The build also applies cleaning beyond normalization: it parses dates, repairs or blanks invalid emails, normalizes phone numbers, reconstructs missing full names, normalizes employee identifiers, and emits quality flags for invalid or suspicious source values.
  • Many of these rules exist because personnel sheets are manually encoded and routinely contain wrong-column values, malformed DepEd domains, spreadsheet formatting corruption, and misfiled separation or position text. The detailed rationale and examples live in docs/data-contract.md.
  • Name and ID handling is also normalization-driven. Person-name parts are cleaned before storage, and full_name is rebuilt from the cleaned fields rather than taken from the raw CSV full-name column. Malformed-but-present employee IDs are preserved but flagged, and only usable numeric suffixes are allowed into full_name.
  • Email normalization also reconciles DepEd addresses across deped_email and personal_email. Matching DepEd values collapse to one canonical deped_email, personal-field DepEd addresses can be promoted into deped_email, and malformed DepEd-like domains are repaired when the intent is clear. Conflicting DepEd addresses are flagged for audit, and personal_email is cleared so that DepEd addresses do not remain in that column.
  • Several high-noise personnel fields are normalized to closed canonical sets: oic_designation, source_of_funds, ro_office, and sdo_office. Unknown or polluted values are blanked rather than preserved as free text.
  • Office-unit reporting should come from normalized personnel data and v_office_personnel_summary, not from the entity dimension.
  • Entity derivation and natural-key generation belong to deped-entity. This package imports entities.db, collapses duplicate natural_key rows to the most recent submitted value, and links personnel rows against the imported entities.
  • entities.db is treated as the source of entity metadata, but not as a uniqueness guarantee on natural_key. If upstream emits repeated keys, the newest timestamped row becomes the local entity record for that key.
  • The package creates only personnel-owned views. Cross-domain analytical surfaces from the retired monolith are not part of this contract.
  • Equipment-specific assumptions do not belong here. If another package needs personnel data, it should consume the personnel.db artifact instead of importing this package.

Tests

Run the package tests from this directory:

uv run pytest -q
