
Pythonic data engine for Texas education datasets


TEA Data Engine (teadata)

teadata is a snapshot-first Python engine for Texas education data. It provides:

  • District and Campus domain models
  • a fluent query DSL using >>
  • geospatial lookups (nearest charter, campuses in district boundaries, private-school overlap)
  • config-driven enrichment from TAPR, accountability, transfers, PEIMS financials, and closure datasets
  • sidecar sqlite stores for fast boundary/map/entity lookup

Installation

PyPI

pip install teadata

Development (recommended)

git clone https://github.com/adpena/teadata.git
cd teadata
uv sync --all-extras

Quick Start

from teadata import DataEngine

# Preferred runtime path: load the latest discovered snapshot.
engine = DataEngine.from_snapshot(search=True)

# District lookup by district number, campus number, or name.
aldine = engine.get_district("101902")
print(aldine.name)

# Campuses physically inside district boundaries.
for campus in aldine.campuses[:5]:
    print(campus.name, campus.campus_number)

Public API Surface

Primary imports:

from teadata import DataEngine, District, Campus

Core behaviors:

  • DataEngine.from_snapshot(...) supports .pkl and .pkl.gz snapshots and multiple payload shapes.
  • Snapshot discovery checks explicit paths, env vars, package .cache, and parent .cache directories.
  • District and Campus support dynamic metadata attributes through meta.
  • Campus.to_dict() always includes percent_enrollment_change (numeric when available, otherwise "N/A").
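The `percent_enrollment_change` invariant can be illustrated with a minimal stand-in (a sketch of the documented guarantee, not the actual `Campus` implementation):

```python
# Sketch of the serialization invariant: the key is always present in the
# output dict, numeric when a value exists, otherwise the literal string "N/A".
def serialize_campus(name, pct_change=None):
    """Hypothetical helper mirroring Campus.to_dict()'s documented guarantee."""
    return {
        "name": name,
        "percent_enrollment_change": pct_change if pct_change is not None else "N/A",
    }

print(serialize_campus("ALDINE H S", 3.2))   # numeric value preserved
print(serialize_campus("NEW CAMPUS"))        # falls back to "N/A"
```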

Snapshot and Asset Behavior

teadata is intentionally cache-first.

Artifacts typically used at runtime:

  • repo_*.pkl / repo_*.pkl.gz (engine snapshot)
  • boundaries_*.sqlite (boundary WKB sidecar)
  • map_payloads_*.sqlite (map payload sidecar)
  • entities_*.sqlite (entity lookup sidecar)

If snapshot/store files are Git LFS pointers or missing locally, runtime asset resolvers can fetch real files when URL env vars are provided.
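Detecting an LFS pointer is straightforward because pointer files are small text files that begin with the LFS spec line, while real snapshots are binary pickles. A sketch of that check (illustrative, not teadata's actual resolver):

```python
# Git LFS pointer files start with this well-known spec line; a real
# snapshot (.pkl/.pkl.gz) is binary and will never match it.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def looks_like_lfs_pointer(head: bytes) -> bool:
    """Return True if the first bytes of a file match the LFS pointer header."""
    return head.startswith(LFS_PREFIX)

def is_lfs_pointer(path: str) -> bool:
    """Read just enough of the file to classify it; missing files are not pointers."""
    try:
        with open(path, "rb") as f:
            return looks_like_lfs_pointer(f.read(len(LFS_PREFIX)))
    except OSError:
        return False
```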

Environment Variables

  • TEADATA_SNAPSHOT: explicit snapshot path.
  • TEADATA_SNAPSHOT_URL: URL used when snapshot candidate is missing or a Git LFS pointer.
  • TEADATA_BOUNDARY_STORE: explicit boundary sqlite path.
  • TEADATA_BOUNDARY_STORE_URL: URL fallback for boundary store.
  • TEADATA_MAP_STORE: explicit map sqlite path.
  • TEADATA_MAP_STORE_URL: URL fallback for map store.
  • TEADATA_ENTITY_STORE: explicit entity sqlite path.
  • TEADATA_ENTITY_STORE_URL: URL fallback for entity store.
  • TEADATA_ASSET_CACHE_DIR: override cache directory used for downloaded assets.
  • TEADATA_DISABLE_INDEXES: disable default spatial acceleration indexes.
  • TEADATA_LOG_MEMORY: enable memory snapshot logging.

Query DSL

DataEngine and Query objects chain operations with the >> operator; each step is a verb tuple such as ("filter", ...) or ("take", n).

# Resolve district then expand to district-operated campuses.
q = engine >> ("district", "ALDINE ISD") >> ("campuses_in",)

# Filter, sort, and take.
top = (
    q
    >> ("filter", lambda c: (c.enrollment or 0) > 1000)
    >> ("sort", lambda c: c.enrollment or 0, True)
    >> ("take", 10)
)

rows = top.to_df(columns=["name", "campus_number", "enrollment"])
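The chaining above relies on operator overloading: each `>>` step receives a verb tuple and returns a new query. A minimal sketch of that dispatch pattern (an illustration of the technique, not teadata's actual `Query` class):

```python
# Toy query object: __rshift__ dispatches on the verb in a ("verb", *args)
# tuple and returns a fresh Query, so steps compose left to right.
class Query:
    def __init__(self, items):
        self.items = list(items)

    def __rshift__(self, op):
        verb, *args = op
        if verb == "filter":
            return Query(x for x in self.items if args[0](x))
        if verb == "sort":
            reverse = args[1] if len(args) > 1 else False
            return Query(sorted(self.items, key=args[0], reverse=reverse))
        if verb == "take":
            return Query(self.items[: args[0]])
        raise ValueError(f"unknown verb: {verb}")

q = Query([3, 10, 7]) >> ("filter", lambda n: n > 4) >> ("sort", lambda n: n, True) >> ("take", 1)
print(q.items)  # [10]
```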

Supported lookup semantics include:

  • case-insensitive district and campus name matching
  • wildcard patterns (*, ?, SQL-like %/_)
  • normalized district number handling (for example "123" and "'000123")
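Normalized district-number handling can be sketched as follows, assuming the canonical form is the six-digit TEA district number and that leading apostrophes (a common Excel-export artifact) and zero-padding variants should all resolve to the same key (illustrative, not teadata's actual code):

```python
# Canonicalize district-number variants: strip a leading apostrophe,
# drop zero-padding, then re-pad to the six-digit TEA form.
def normalize_district_number(raw: str) -> str:
    digits = raw.strip().lstrip("'").lstrip("0") or "0"
    return digits.zfill(6)

print(normalize_district_number("123"))      # 000123
print(normalize_district_number("'000123"))  # 000123
print(normalize_district_number("101902"))   # 101902
```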

Spatial and transfer helpers include:

  • nearest-campus/nearest-charter queries
  • nearest_charter_same_type(...)
  • transfer graph methods such as transfers_out(...) / transfers_in(...)

Enrichment Pipeline

teadata/enrichment provides registered enrichers for district and campus datasets.

Included enrichers cover:

  • district accountability and district TAPR profile data
  • campus accountability, TAPR profile/historical enrollment, PEIMS financials
  • planned closure overlays
  • charter network augmentation

Pipeline behavior is fault-tolerant by design: dataset-level failures are generally logged and do not hard-stop the full build.
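That contract can be sketched as a loop that isolates each enricher, logs failures, and keeps going (enricher names here are placeholders, not the real registry):

```python
import logging

# Fault-tolerant enrichment sketch: a failing dataset is logged and skipped
# rather than aborting the whole build.
def run_enrichers(engine, enrichers):
    applied, failed = [], []
    for name, fn in enrichers:
        try:
            fn(engine)
            applied.append(name)
        except Exception:
            logging.exception("enricher %s failed; continuing build", name)
            failed.append(name)
    return applied, failed

def ok(engine):
    pass

def boom(engine):
    raise RuntimeError("bad source file")

applied, failed = run_enrichers(object(), [("tapr", ok), ("peims", boom), ("closures", ok)])
print(applied, failed)  # ['tapr', 'closures'] ['peims']
```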

Data Build Pipeline

teadata/load_data.py builds a full DataEngine and updates cached artifacts.

uv run python -m teadata.load_data

At a high level, it:

  1. resolves year-aware source paths from teadata/teadata_sources.yaml
  2. warm-loads compatible snapshot cache when signatures match
  3. otherwise builds districts/campuses from spatial files
  4. applies enrichment datasets
  5. writes snapshot + sqlite sidecars back to .cache/
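Step 2's warm-load decision can be sketched as a signature comparison: hash the resolved source inputs into a digest and rebuild only when it differs from the digest stored with the cached snapshot (the field layout here is an assumption, not `load_data`'s actual format):

```python
import hashlib
import json

# Deterministic signature over resolved sources (e.g. path -> mtime):
# identical inputs yield an identical digest, so a match means warm-load.
def build_signature(sources: dict) -> str:
    payload = json.dumps(sources, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

cached = build_signature({"districts.shp": 1700000000, "tapr.csv": 1700000500})
current = build_signature({"districts.shp": 1700000000, "tapr.csv": 1700000500})
print("warm-load" if cached == current else "rebuild")  # warm-load
```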

Config and CLI (teadata-config)

teadata/teadata_config.py provides YAML/TOML config loading, year resolution, schema checks, and dataset joins.

CLI entrypoint:

uv run teadata-config --help

Subcommands:

  • init <out.yaml>
  • resolve <cfg> <section> <dataset> <year>
  • report <cfg> [--json] [--min N] [--max N]
  • join <cfg> <year> [--datasets a,b,c] [--parquet out.parquet] [--duckdb out.duckdb --table t]

Testing

uv run pytest

Current tests cover:

  • snapshot gzip and fallback loading
  • query DSL semantics and chaining
  • nearest charter behavior and transfer grouping
  • store discovery and asset-cache behavior
  • entity serialization invariants (percent_enrollment_change)

PyPI Size Limits and Current Packaging Status

PyPI's currently documented default limits are:

  • per-file upload limit: 100 MB
  • total project limit: 10 GB

Reference: https://docs.pypi.org/project-management/storage-limits/

Current teadata release artifacts for 0.0.118 are above the per-file limit:

  • wheel: dist/teadata-0.0.118-py3-none-any.whl about 448 MB
  • sdist: dist/teadata-0.0.118.tar.gz about 446 MB

These exceed the default 100 MB file cap because large .cache snapshot/store artifacts are packaged into both distributions.

Release Policy

  • Versioning uses thousandths-place tags (v0.0.101, v0.0.102, ...).
  • Keep only the 3 most recent release tags/assets.

License

Business Source License 1.1. See LICENSE.

Download files


Source Distribution

teadata-0.0.118.tar.gz (75.7 MB)

Built Distribution


teadata-0.0.118-py3-none-any.whl (77.6 MB)

File details

Details for the file teadata-0.0.118.tar.gz.

File metadata

  • Download URL: teadata-0.0.118.tar.gz
  • Size: 75.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for teadata-0.0.118.tar.gz:

  • SHA256: 24e765ab055ef1f13eb8acdea1a66fc4fcf219a64fb5d6ac2dc64a1acde66acf
  • MD5: 869f1a31fdac35a62bb5e7d7c860cfb9
  • BLAKE2b-256: 90f8b5daa3a9f36991070b60913edc3e92622f0559b30461a8cac1bde604443d


File details

Details for the file teadata-0.0.118-py3-none-any.whl.

File metadata

  • Download URL: teadata-0.0.118-py3-none-any.whl
  • Size: 77.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for teadata-0.0.118-py3-none-any.whl:

  • SHA256: 3f730148bf6e2c026163055b10f100e24eeb5c40516bdeed30e0d90581eeb336
  • MD5: d762c5c9153762a22bca0518de77a973
  • BLAKE2b-256: b4b56274de11e1264bb46e84b3cc12cf0044e962ac341c341246981139fcfb08

