
Shared runtime helpers for DepEd DCP data-cleaning packages.


deped-dcp-template

deped-dcp-template is the shared template and normalization layer extracted from the older DepEd DCP monolith pipeline.

That earlier pipeline ingested three operational CSV sources (personnel, equipment, and connectivity) into one combined SQLite database. To make that work, it had to repair the messy freeform values submitted by schools, divisions, and regions. The repair work covered entity extraction, position normalization, equipment dimension canonicalization, person-link resolution, and connectivity staging/promotion, and it produced audit outputs such as unmapped_positions.txt and equipment_dimension_issues.txt.

The root cause lies in the Excel collection workbooks that DepEd distributed. This repository ships the three canonical v1.16 templates (school, division, and region) in templates/. They define the official value universe through sheets such as List of Positions, Referential Data, Regions, and SDOs, plus a Read Me sheet with fill-in instructions. In practice, many submitters typed freeform values instead of choosing from the provided dropdowns, so downstream systems still need a shared cleanup layer anchored to the template catalogs.
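As a rough illustration of what "anchored to the template catalogs" can mean, here is a minimal sketch that maps freeform entries onto a canonical list and collects the leftovers for an audit file. The names fold_key and match_canonical are hypothetical, not this package's API, and the folding rule (trim, collapse whitespace, case-fold) is an assumption; real position normalization likely needs aliases beyond it.

```python
import re


def fold_key(value: str) -> str:
    """Hypothetical folding rule: collapse whitespace runs, trim, case-fold."""
    return re.sub(r"\s+", " ", value).strip().casefold()


def match_canonical(raw_values, canonical):
    """Map raw values to canonical spellings; collect the ones that do not match.

    Returns (mapped, unmapped): mapped is {raw: canonical}, unmapped is a
    sorted list of raw values, the kind of list an audit file such as
    unmapped_positions.txt could be written from.
    """
    index = {fold_key(c): c for c in canonical}
    mapped = {}
    unmapped = set()
    for raw in raw_values:
        hit = index.get(fold_key(raw))
        if hit is None:
            unmapped.add(raw)
        else:
            mapped[raw] = hit
    return mapped, sorted(unmapped)


# Illustrative values only; not read from the real List of Positions sheet.
canonical = ["Teacher I", "Teacher II", "Principal I"]
mapped, unmapped = match_canonical(
    ["  teacher i", "PRINCIPAL   I", "Head Teacher X"], canonical
)
print(mapped)    # {'  teacher i': 'Teacher I', 'PRINCIPAL   I': 'Principal I'}
print(unmapped)  # ['Head Teacher X']
```

Exact matching on a folded key keeps the mapping predictable and auditable; anything fuzzier is better handled as an explicit alias table reviewed by a person.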

Why This Package Exists

Downstream apps should not each:

  • copy the same template files
  • parse the same workbook sheets independently
  • carry slightly different position and equipment normalization rules
  • drift on what counts as the canonical DepEd value set

This package provides one reusable base layer so consumer packages can share the same template-derived lookups and baseline cleaning behavior.

What Happens Here

This repository currently does four things:

  • ships the canonical school, division, and region Excel templates
  • extracts their canonical lists into a SQLite lookup database
  • exposes shared normalization helpers for identifiers, dates, phone numbers, emails, positions, and selected equipment dimensions
  • exposes shared CSV/entity/personnel helper code used by downstream loaders
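To give a feel for the baseline cleaning the identifier, date, phone, and email helpers perform, here is a standard-library sketch. Every function name, the accepted date formats (including month-first ordering), the 6-digit school ID rule, and the Philippine mobile rule (09XXXXXXXXX locally, +639XXXXXXXXX internationally) are assumptions for illustration; check the package source for its actual rules.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical accepted formats; month-first slash dates are an assumption.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

# Deliberately loose check: one "@", no spaces, and a dot in the domain.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_school_id(raw: str) -> Optional[str]:
    """Assumes a 6-digit ID; drops a trailing ".0" left by float-typed exports."""
    text = raw.strip()
    if text.endswith(".0"):
        text = text[:-2]
    return text if re.fullmatch(r"\d{6}", text) else None


def normalize_date(raw: str) -> Optional[date]:
    """Try each known format in order and return the first that parses."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize_email(raw: str) -> Optional[str]:
    """Trim and lowercase (an assumption: local parts can be case-sensitive)."""
    text = raw.strip().lower()
    return text if EMAIL_RE.fullmatch(text) else None


def normalize_ph_mobile(raw: str) -> Optional[str]:
    """Map 09XXXXXXXXX or +63 9XX XXX XXXX style input to +639XXXXXXXXX."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 12 and digits.startswith("639"):
        digits = "0" + digits[2:]  # 639... -> 09...
    if len(digits) == 11 and digits.startswith("09"):
        return "+63" + digits[1:]
    return None
```

Returning None for anything unparseable, rather than guessing, lets a loader route those rows to an audit output instead of silently storing a wrong value.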

Running the extractor on the bundled v1.16 templates currently yields a lookup database with:

  • 314 canonical positions
  • 65 equipment items
  • 76 equipment brands
  • 18 regions
  • 222 unique region-division pairs
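One way to picture how repeated sheet rows reduce to "unique region-division pairs" is a composite primary key that simply ignores duplicates. The table and column names below are hypothetical, not the schema the extractor actually writes; the show command reports what a real build contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regions (name TEXT PRIMARY KEY);
    CREATE TABLE region_divisions (
        region   TEXT NOT NULL REFERENCES regions (name),
        division TEXT NOT NULL,
        PRIMARY KEY (region, division)  -- one row per unique pair
    );
    """
)

# Illustrative rows only: a sheet can repeat the same pair on several rows.
sdo_rows = [
    ("Region I", "Ilocos Norte"),
    ("Region I", "Ilocos Norte"),
    ("Region I", "Pangasinan I"),
    ("NCR", "Manila"),
]
regions = sorted({region for region, _ in sdo_rows})
conn.executemany("INSERT INTO regions (name) VALUES (?)", [(r,) for r in regions])
# INSERT OR IGNORE skips rows that would violate the composite primary key.
conn.executemany(
    "INSERT OR IGNORE INTO region_divisions (region, division) VALUES (?, ?)",
    sdo_rows,
)

region_count = conn.execute("SELECT COUNT(*) FROM regions").fetchone()[0]
pair_count = conn.execute("SELECT COUNT(*) FROM region_divisions").fetchone()[0]
print(region_count, pair_count)  # 2 3
conn.close()
```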

What Downstream Packages Still Own

This package is not the full monolith rebuild. Consumer packages still own:

  • end-to-end ingestion of the three operational source CSVs
  • domain-specific SQLite schemas, indexes, and views
  • person-link resolution policy beyond the shared helpers
  • connectivity staging and promotion workflows
  • final artifact contracts and audit-file emission

The shared helpers in this repository are the base layer those downstream flows build on.

Install

Published dependency:

uv add deped-dcp-template

Local development against a sibling checkout:

[tool.uv.sources]
deped-dcp-template = { path = "../deped-dcp-template" }

Build the Template Lookup DB

uv run deped-dcp-template extract \
  --templates-dir templates \
  --output artifacts/base.db

When the package is installed as a dependency, the templates path resolves to the bundled workbook directory shipped inside the package, so downstream projects do not need to copy the Excel templates into their own repository just to run the extractor.
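The fallback described above can be sketched with importlib.resources, assuming the workbooks ship as package data in a templates directory inside deped_dcp_template. The helper resolve_templates_dir, the "prefer an existing local directory" rule, and the package layout are all assumptions; the extractor's real resolution logic may differ.

```python
from importlib.resources import files
from pathlib import Path


def resolve_templates_dir(arg: str, package: str = "deped_dcp_template"):
    """Return arg as a Path if it is an existing directory, else the bundled copy.

    Hypothetical helper: assumes the workbooks live under <package>/templates/.
    """
    local = Path(arg)
    if local.is_dir():
        return local
    # files() returns a Traversable; for a normal on-disk install it is a Path.
    return files(package) / "templates"
```

Preferring an existing local directory keeps a repository checkout (which has its own templates/) working unchanged, while installed consumers fall back to the copy inside the wheel.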

Inspect the generated database summary:

uv run deped-dcp-template show \
  --db artifacts/base.db
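For a similar summary from Python (in a notebook, say), the standard sqlite3 module is enough. This is a generic sketch, not the show command's implementation, and it assumes nothing about which tables the extractor creates.

```python
import sqlite3


def summarize(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table, ordered by name."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Identifiers cannot be bound as parameters, so quote and escape them.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, summarize(sqlite3.connect("file:artifacts/base.db?mode=ro", uri=True)). The read-only URI raises an error when the path is wrong instead of silently creating an empty database.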

Profile a Table

profile-table inspects any SQLite table and produces a Markdown report covering type inference, null rates, dirty values, numeric statistics, and outlier detection (Tukey IQR fences).
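Tukey's fences flag a value as an outlier when it falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1. A minimal sketch with the standard statistics module follows. The quartile method (method="inclusive") and the conventional k = 1.5 multiplier are assumptions; profile-table may compute quartiles differently, so its bounds can differ slightly.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (low, high) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the full population; needs >= 2 values.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Values strictly outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(tukey_fences(data))    # (9.375, 16.375)
print(tukey_outliers(data))  # [102]
```

Quartile-based fences are preferred over mean and standard deviation here because a single extreme value like 102 barely moves Q1 and Q3, while it would inflate the mean and standard deviation enough to hide itself.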

CLI (installed package):

# all columns
profile-table path/to/data.db my_table

# selected columns only
profile-table path/to/data.db my_table --columns col_a,col_b

# save report
profile-table path/to/data.db my_table > report.md

CLI (local dev with uv):

uv run profile-table path/to/data.db my_table

justfile recipe (this repo):

just profile path/to/data.db my_table
just profile path/to/data.db my_table col_a,col_b

Python API:

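import sqlite3

# Underscore-prefixed (internal) helpers: signatures may change between releases.
from deped_dcp_template.profile_table import _profile_column, _render_markdown

conn = sqlite3.connect("data.db")
conn.row_factory = sqlite3.Row  # rows addressable by column name
try:
    # Total row count, passed to each per-column profile.
    total = conn.execute('SELECT COUNT(*) FROM "my_table"').fetchone()[0]
    profiles = [_profile_column(conn, "my_table", col, total) for col in ["col_a", "col_b"]]
    print(_render_markdown("my_table", profiles))
finally:
    conn.close()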

The report includes:

  • Summary table: type guess, average/maximum length, null %, dirty %, and outlier % per column
  • Column detail: per-column statistics; numeric columns also get min/max/mean/median/Q1/Q3 and IQR fence bounds with extreme examples
  • Analysis: prose flags for high null rates, dirty values, outliers ≥ 5%, very long text, mixed types, and low-cardinality text
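The summary columns can be approximated in plain Python on already-fetched values. This sketch assumes "dirty" means empty strings or values with leading or trailing whitespace, and guesses a type by trying int, then float; the function names and both definitions are hypothetical rather than profile-table's own.

```python
from collections import Counter


def classify(value):
    """Guess a scalar type for one non-null value (hypothetical rule)."""
    if isinstance(value, (int, float)):
        return "integer" if isinstance(value, int) else "real"
    text = str(value).strip()
    for name, parse in (("integer", int), ("real", float)):
        try:
            parse(text)
            return name
        except ValueError:
            pass
    return "text"


def summarize_column(values):
    """Null %, dirty %, dominant type, and a mixed-type flag for one column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    # Assumed definition of "dirty": empty or padded with whitespace.
    dirty = [v for v in non_null if isinstance(v, str) and (v == "" or v != v.strip())]
    # Blank strings carry no type information, so leave them out of the vote.
    types = Counter(
        classify(v) for v in non_null if not (isinstance(v, str) and v.strip() == "")
    )
    return {
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "dirty_pct": 100.0 * len(dirty) / total if total else 0.0,
        "type_guess": types.most_common(1)[0][0] if types else "empty",
        "mixed_types": len(types) > 1,
    }


summary = summarize_column(["12", " 13", None, "abc", "", "14"])
```

Here summary["type_guess"] is "integer" and summary["mixed_types"] is True, because "abc" sits in an otherwise numeric column.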

Tests

uv run pytest -q


Download files

Source Distribution

deped_dcp_template-0.2.2.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

deped_dcp_template-0.2.2-py3-none-any.whl (1.2 MB)

Uploaded Python 3

File details

Details for the file deped_dcp_template-0.2.2.tar.gz.

File metadata

  • Download URL: deped_dcp_template-0.2.2.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for deped_dcp_template-0.2.2.tar.gz
  • SHA256: 897cd9144b0d05be0825d528795f8ceadb92e3c9ff5b0586b8519395d5c1752f
  • MD5: 86bd601fe4ddfffb4ff153b70540863d
  • BLAKE2b-256: ffefba8453a68456d964e22053e6709028d55d38469ce280563bb9ec71bd0fec

File details

Details for the file deped_dcp_template-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: deped_dcp_template-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for deped_dcp_template-0.2.2-py3-none-any.whl
  • SHA256: 2da64940c280e0653d01e9bb9a8050a1e84b7b0d36c7037c4982059497c03f83
  • MD5: 40292babaa34ddcd268bfe45d3b068ca
  • BLAKE2b-256: 35f974ba7e8bed31d3d9bf46c9f1ec7c950e0de8c82319a41dc586fff407ed69
