A comprehensive Python utilities package with enhanced auto-discovery

Siege Utilities

Python 3.11–3.14 | License: AGPLv3 or Commercial | Documentation

siege_utilities is the shared utilities library behind Siege Analytics workflows:

  • Geospatial + GeoDjango boundary/data services (tiered: [geo-lite] / [geo] / [geodjango])
  • Google Workspace write APIs (Sheets, Docs, Slides, Drive) with multi-account management
  • Census API/data selection/crosswalk tooling
  • Isochrone analysis with configurable CRS and domain exceptions
  • Configuration and profile management
  • Distributed processing helpers (Spark/HDFS/Databricks)
  • Reporting and chart generation

Related: Siege Analytics ZSH Configuration

siege_utilities works alongside siege_analytics_zshrc — a modular ZSH configuration system for data engineering environments. Together they form a two-part toolchain:

  • siege_utilities — Python library: geospatial, reporting, analytics, Django models, distributed computing
  • siege_analytics_zshrc — Shell environment: Java/Spark/Hadoop/Python version management, credential handling, cluster connectivity

They are designed to work together (the ZSH config sets up SPARK_HOME, JAVA_HOME, pyenv, and credential paths that siege_utilities expects) but each can be used independently. You don't need the ZSH config to use the Python library, and vice versa.
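
If you are not using the ZSH config, a quick check like the one below shows whether your shell provides the variables named above. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, not part of siege_utilities, and the list of variables is an assumption limited to the two named in this section.

```python
import os
from collections.abc import Iterable, Mapping
from typing import Optional

# Variables named in this README; the library may consult others as well.
EXPECTED_VARS = ("SPARK_HOME", "JAVA_HOME")


def missing_env_vars(
    names: Iterable[str] = EXPECTED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Spark/Java environment variables are set.")
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.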

Python Version Support

| Version | Status | CI |
|---|---|---|
| 3.11 | Fully supported (floor) | Required pass |
| 3.12 | Fully supported | Required pass |
| 3.13 | Supported | Allow-failure while stabilizing |
| 3.14 | Experimental | Not yet in CI (awaiting ecosystem wheels) |

The library requires Python 3.11+. Geospatial extras ([geo], [geodjango]) depend on C-extension packages (GDAL/GEOS/PROJ bindings) whose wheel availability varies by Python version — check PyPI for your target version before installing.
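
A script that depends on the library can check the interpreter up front. The sketch below is illustrative only: `python_is_supported` is a hypothetical helper, and the only fact it encodes is the 3.11 floor stated above.

```python
import sys

MIN_VERSION = (3, 11)  # the floor listed in the support table


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {found} is below the supported floor of 3.11")
```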

Install

See Installation Options for all supported install commands (base, geo-lite, geo, geodjango, all).

Quick Usage

import siege_utilities as su

su.log_info("Ready.")
recommendations = su.select_census_datasets("demographics", "tract")

Error Handling

siege_utilities follows a fail-loud-over-silent-swallow policy: when a function cannot deliver its documented output, it raises a typed exception rather than returning None, False, or an empty container that looks like a legitimate "no result." This distinguishes real failures from expected empty-input paths and prevents silent data corruption in downstream pipelines.

Key exception types by subsystem:

| Subsystem | Exception | Parent | Raised When |
|---|---|---|---|
| reporting (top-level) | ReportingConfigError | RuntimeError | export / import config fails |
| reporting.chart_types | UnknownChartTypeError | LookupError | chart type not in registry |
| reporting.chart_types | ChartParameterError | ValueError | required params missing / bad |
| reporting.chart_types | ChartCreationError | RuntimeError | create_function failed |
| reporting.client_branding | ClientBrandingNotFoundError | LookupError | named client has no config |
| reporting.client_branding | ClientBrandingError | RuntimeError | I/O or YAML parse failure |
| geo.census_geocoder | CensusGeocodeError | RuntimeError | Census API/network failure |
| geo.spatial_data | SpatialDataError | RuntimeError | portal dataset / OSM failure |
| geo.boundary_result | BoundaryRetrievalError (+ 6 subclasses) | Exception | boundary lookup problems |

All new exceptions use raise ... from e chaining — inspect exc.__cause__ to see the underlying error. Because the exception types subclass standard Python exceptions, broad existing except ValueError: / except LookupError: handlers continue to work.

See docs/FAILURE_MODES.md for the catalog of silent-swallow patterns this library has eliminated and the migration guidance for callers that relied on the old return-value behavior.
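
The chaining and subclassing behavior described above can be seen in a small standalone sketch. `DemoParameterError` and `parse_bar_width` are hypothetical stand-ins defined here for illustration; they are not the library's classes or functions, and only mirror the documented pattern of subclassing a standard exception and raising with `from e`.

```python
class DemoParameterError(ValueError):
    """Local stand-in for a typed exception whose parent is ValueError."""


def parse_bar_width(raw: str) -> float:
    """Hypothetical helper: convert a chart parameter or fail loudly."""
    try:
        return float(raw)
    except ValueError as e:
        # Chain the original error so callers can inspect exc.__cause__.
        raise DemoParameterError(f"bad bar width: {raw!r}") from e


try:
    parse_bar_width("wide")
except ValueError as exc:  # a broad, pre-existing handler still catches it
    print(type(exc).__name__)            # DemoParameterError
    print(type(exc.__cause__).__name__)  # ValueError
```

A caller that wants to treat the typed failure differently can catch the subclass first and keep the broad handler as a fallback.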

Import Philosophy

This project intentionally favors convenience access patterns, including broad function availability from the package surface. That is a design choice, not an accident.

Contributor rule: convenience imports are acceptable in explicit API-aggregation surfaces, but implementation modules should prefer explicit imports to avoid hidden collisions and reduce regression risk.
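
One way to see the "hidden collisions" risk is to check which public names are exported by more than one module before aggregating them. The sketch below is an assumption-laden illustration: `find_name_collisions` is hypothetical, and the module and function names in the sample data are made up for the demo rather than taken from the package.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def find_name_collisions(exports: Mapping[str, Iterable[str]]) -> dict[str, list[str]]:
    """Map each public name exported by more than one module to those modules."""
    owners: dict[str, list[str]] = defaultdict(list)
    for module_name, names in exports.items():
        for name in set(names):  # ignore duplicates within one module
            owners[name].append(module_name)
    return {name: sorted(mods) for name, mods in owners.items() if len(mods) > 1}


# Hypothetical export lists; the second module deliberately reuses a name.
exports = {
    "core.logging": ["log_info", "log_error"],
    "files.paths": ["ensure_path", "log_info"],
}
print(find_name_collisions(exports))  # {'log_info': ['core.logging', 'files.paths']}
```

With a wildcard import, whichever module is imported last silently wins such a clash, which is why implementation modules are asked to import explicitly.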

Contributor Requirements

Every PR must include:

  • Tests for changed behavior (and a regression test for bug fixes)
  • Documentation updates
  • Notebook updates when user-facing workflows or APIs change
  • CodeRabbit feedback addressed for correctness/regression/API-risk findings
  • Required CI/PR checks green (including CodeRabbit status once enabled)

Pre-PR Validation Commands

# Test naming/location hygiene
python scripts/check_test_file_hygiene.py

# API contract tooling regression check
# (replace <patch|minor|major> below with the single value that matches your change)
python scripts/contracts/generate_public_api_contract.py --output /tmp/contract_candidate.json
python scripts/contracts/compare_public_api_contracts.py \
  --baseline /tmp/contract_baseline.json \
  --candidate /tmp/contract_candidate.json \
  --release-impact <patch|minor|major> \
  --allowlist scripts/contracts/contract_allowlist.json

# Contract-tool unit tests
python -m pytest -q --no-cov tests/test_api_contract_tools.py

If a PR intentionally adds public API symbols, classify as minor and update scripts/contracts/contract_allowlist.json in the same PR.
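
The idea behind a contract comparison can be sketched as a set difference over public symbol names. This is not the logic of the scripts above, whose JSON format and rules are not shown here; `classify_release_impact` is a hypothetical function that assumes the usual semantic-versioning convention (removals are major, additions are minor).

```python
def classify_release_impact(baseline: set[str], candidate: set[str]) -> str:
    """Return 'major', 'minor', or 'patch' from a symbol-set comparison."""
    removed = baseline - candidate
    added = candidate - baseline
    if removed:
        return "major"  # callers of a removed symbol would break
    if added:
        return "minor"  # purely additive change to the public surface
    return "patch"      # public surface unchanged


# Hypothetical symbol names, for illustration only.
baseline = {"log_info", "select_census_datasets"}
candidate = baseline | {"new_helper"}
print(classify_release_impact(baseline, candidate))  # minor
```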

See:

  • docs/policies/CODING_STYLE.md
  • docs/policies/PR_REVIEW_RUBRIC.md
  • docs/policies/CHANGE_CLASSIFICATION_AND_RELEASE_POLICY.md
  • docs/policies/CONTRIBUTOR_GOVERNANCE.md
  • docs/RELEASE_LINEAGE.md
  • docs/EXAMPLES.md
  • docs/ISOCHRONES_AND_WKLS.md
  • docs/MANAGED_ENVIRONMENTS.md
  • docs/INTENT.md — per-module purpose + divergence catalog (ELE-2416)
  • docs/FAILURE_MODES.md — cross-cutting anti-pattern catalog (ELE-2418)
  • docs/TEST_UPGRADES.md — test-quality patterns and coverage scorecard (ELE-2419)
  • docs/ARCHITECTURE.md + docs/adr/ — three-layer model and ADRs (ELE-2417)
  • docs/NOTEBOOKS.md — notebook inventory and consolidation plan (ELE-2421)

External Contributor Workflow

Use this path when contributing from a fork:

  1. Fork this repository on GitHub, then clone your fork:
git clone https://github.com/<your-user>/siege_utilities.git
cd siege_utilities
git remote add upstream https://github.com/siege-analytics/siege_utilities.git
  2. Create and activate a local virtual environment, then install from the cloned repo:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
  3. Validate notebooks and notebook outputs:
python -m pytest -q --no-cov tests/test_notebooks_output_policy.py

If your change updates user-facing workflows or APIs, update the impacted notebooks and ensure notebooks/output/ artifacts remain reviewable.

  4. Open an issue in siege-analytics/siege_utilities describing the change for merge review, link your fork branch/PR, and include:
  • Reproduction or motivation
  • Proposed change scope
  • Test evidence
  • Documentation and notebook updates

GeoDjango Integration

Full spatial data platform with 37 concrete models, 9 population services, and 7 management commands.

Model Hierarchy

TemporalGeographicFeature (abstract — no geometry)
├── TemporalBoundary (abstract — MultiPolygon)
│   ├── CensusTIGERBoundary (abstract — GEOID + TIGER metadata)
│   │   ├── State, County, Tract, BlockGroup, Block, Place, ZCTA
│   │   ├── CongressionalDistrict, CBSA, UrbanArea
│   │   ├── StateLegislativeUpper, StateLegislativeLower, VTD, Precinct
│   │   └── SchoolDistrictElementary, Secondary, Unified
│   ├── GADMBoundary → GADMCountry, GADMAdmin1-5
│   ├── NLRBRegion, FederalJudicialDistrict
│   ├── NCESLocaleBoundary, TimezoneGeometry
│   └── Intersections (County×CD, VTD×CD, Tract×CD)
├── TemporalLinearFeature (abstract — MultiLineString)
└── TemporalPointFeature (abstract — Point)
    └── SchoolLocation

Spatial Queries

from django.contrib.gis.geos import Point
from siege_utilities.geo.django.models import Tract, County, State

# Find tract containing a point
point = Point(-122.4194, 37.7749, srid=4326)
tract = Tract.objects.containing_point(point).for_year(2020).first()

# Nearest boundaries within distance (meters)
nearby = County.objects.nearest(point, max_distance_m=50_000)

# Temporal + spatial filtering
counties_2020 = County.objects.for_state("06").for_year(2020)
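
To make the meters-based distance filter concrete, here is a pure-Python illustration using the haversine formula on longitude/latitude pairs. It is a swapped-in technique for explanation only: how the library's `nearest` computes distance is not shown in this README, and `haversine_m` and `within_distance` are hypothetical names. The city coordinates are approximate.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(origin, candidates, max_distance_m):
    """Return (name, distance_m) pairs within max_distance_m, nearest first."""
    lon0, lat0 = origin
    hits = [(name, haversine_m(lon0, lat0, lon, lat)) for name, (lon, lat) in candidates.items()]
    return sorted((h for h in hits if h[1] <= max_distance_m), key=lambda h: h[1])


origin = (-122.4194, 37.7749)  # same lon/lat order as Point(x, y) above
places = {"Oakland": (-122.2712, 37.8044), "Sacramento": (-121.4944, 38.5816)}
print([name for name, _ in within_distance(origin, places, 50_000)])  # ['Oakland']
```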

Management Commands

# Census TIGER/Line boundaries
python manage.py populate_boundaries --year 2020 --type county --state CA

# Demographics from ACS
python manage.py populate_demographics --year 2020 --dataset acs5 --variables B19013_001

# PL 94-171 redistricting data
python manage.py populate_pl_demographics --year 2020 --state CA

# Boundary crosswalks (2010 → 2020)
python manage.py populate_crosswalks --source-year 2010 --target-year 2020

# NCES school district + locale data
python manage.py populate_nces --year 2020

# NLRB region boundaries
python manage.py populate_nlrb_regions --year 2024

# Timezone boundaries (from timezone-boundary-builder)
python manage.py populate_timezones --file timezones.geojson --year 2024

Services

| Service | Purpose |
|---|---|
| BoundaryPopulationService | Load TIGER/Line shapefiles into boundary models |
| DemographicPopulationService | Fetch ACS/Decennial data into DemographicSnapshot |
| CrosswalkPopulationService | Build boundary change crosswalks between vintages |
| TimeseriesService | Auto-populate DemographicTimeSeries from snapshots |
| DemographicRollupService | Aggregate child geographies to parents (GEOID or crosswalk) |
| UrbanicityClassificationService | Classify tracts by NCES urbanicity codes |
| NCESPopulationService | Load school districts, locales, and school locations |
| NLRBPopulationService | Populate NLRB region boundaries |
| TimezonePopulationService | Load IANA timezone geometries from GeoJSON |

Demographics & Rollups

from siege_utilities.geo.django.models import DemographicSnapshot, DemographicTimeSeries
from siege_utilities.geo.django.services import DemographicRollupService

# Query demographics
snapshots = DemographicSnapshot.objects.filter(
    content_type__model='tract',
    dataset='acs5',
    year=2020,
)

# Roll up tract data to county level
svc = DemographicRollupService()
results = svc.rollup(
    source_level='tract',
    target_level='county',
    year=2020,
    variables=['B19013_001', 'B01003_001'],
    state_fips='06',
    min_coverage=0.8,  # warn if <80% of child geographies have data
)

# Crosswalk-aware rollup (handles boundary changes)
results = svc.rollup(
    source_level='tract',
    target_level='county',
    year=2020,
    variables=['B01003_001'],
    crosswalk_year=2010,  # map 2010 tracts to 2020 counties via crosswalk
)
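
The GEOID-based rollup and the `min_coverage` warning can be pictured with a small standalone sketch. It is not the service's implementation: `rollup_sum` is a hypothetical function, and it relies only on the fact that a tract GEOID starts with its county's 5-digit GEOID (2-digit state FIPS plus 3-digit county FIPS). Summing is only meaningful for count variables such as B01003_001 (total population); a median such as B19013_001 cannot be aggregated this way.

```python
import warnings
from collections import defaultdict


def rollup_sum(tract_values, expected_tracts, min_coverage=0.8):
    """Sum tract values per county and warn when too few tracts have data.

    tract_values: {tract_geoid: number or None}
    expected_tracts: every tract GEOID that should report a value
    """
    totals = defaultdict(float)
    have = defaultdict(int)
    expected = defaultdict(int)
    for geoid in expected_tracts:
        county = geoid[:5]  # state FIPS (2 digits) + county FIPS (3 digits)
        expected[county] += 1
        value = tract_values.get(geoid)
        if value is not None:
            totals[county] += value
            have[county] += 1
    result = {}
    for county, n in expected.items():
        coverage = have[county] / n
        if coverage < min_coverage:
            warnings.warn(f"{county}: only {coverage:.0%} of tracts have data")
        result[county] = {"total": totals[county], "coverage": coverage}
    return result


# Made-up values: one of the two tracts in county 06001 has no data.
tracts = ["06001400100", "06001400200", "06075010100"]
values = {"06001400100": 100, "06001400200": None, "06075010100": 50}
print(rollup_sum(values, tracts))
```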

Census Data Intelligence

Consolidated Census metadata registry with intelligent dataset selection.

from siege_utilities.config.census_registry import (
    SurveyType, GeographyLevel, resolve_geographic_level,
    VARIABLE_GROUPS, CANONICAL_GEOGRAPHIC_LEVELS,
)
from siege_utilities.geo import quick_census_selection

# Resolve geography aliases
level = GeographyLevel("congressional_district")  # resolves alias → "cd"

# Quick selection for analysis
result = quick_census_selection("business", "county")
print(f"Use {result['recommendations']['primary_recommendation']['dataset']}")

# Census API with caching
from siege_utilities.geo import CensusAPIClient

client = CensusAPIClient(cache_backend='django')  # or 'sqlite', 'memory'
data = client.get_acs5(
    year=2020,
    variables=['B19013_001'],
    geography='tract',
    state='06',
)
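
What an in-memory cache in front of a rate-limited API buys you can be sketched without any network access. The class below is a hypothetical illustration, not `CensusAPIClient`: its name, the `min_interval_s` parameter, and the injected `fetch`, `clock`, and `sleep` callables are all assumptions made for the example.

```python
import time


class CachedThrottledFetcher:
    """Memoize results by request key and space out real fetches."""

    def __init__(self, fetch, min_interval_s=0.5, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, year, variables, geography, state):
        key = (year, tuple(sorted(variables)), geography, state)
        if key in self._cache:
            return self._cache[key]  # cache hit: no fetch, no waiting
        if self._last_call is not None:
            wait = self._min_interval_s - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)  # keep at least min_interval_s between fetches
        self._last_call = self._clock()
        result = self._fetch(year, variables, geography, state)
        self._cache[key] = result
        return result


def fake_fetch(year, variables, geography, state):
    """Stand-in for a network call; returns a small placeholder payload."""
    return {"year": year, "geography": geography, "state": state}


fetcher = CachedThrottledFetcher(fake_fetch, min_interval_s=0.0)
first = fetcher.get(2020, ["B19013_001"], "tract", "06")
second = fetcher.get(2020, ["B19013_001"], "tract", "06")
print(first is second)  # True
```

Injecting `clock` and `sleep` lets tests verify the throttling without actually waiting.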

Census API Client

Direct access to Census Bureau data with built-in caching and rate limiting.

from siege_utilities.geo import CensusAPIClient

client = CensusAPIClient(api_key="your-key")

# ACS 5-Year estimates
median_income = client.get_acs5(
    year=2020,
    variables=['B19013_001', 'B01003_001'],
    geography='county',
    state='06',
)

# PL 94-171 redistricting data
from siege_utilities.geo.census_files.pl_downloader import PLFileDownloader

downloader = PLFileDownloader()
pl_data = downloader.download_state("CA", year=2020)
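
For orientation, the county-level ACS request above corresponds roughly to a Census Bureau API URL like the one built below. This is a sketch under assumptions: `acs5_url` is a hypothetical helper, and whether the client builds its requests this way, or appends the `E` (estimate) suffix to variable names itself, is not stated in this README.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def acs5_url(year, variables, geography, state, api_key=None):
    """Build an ACS 5-Year query URL for all areas of one level in a state."""
    params = {
        "get": ",".join(f"{v}E" for v in variables),  # "E" marks the estimate column
        "for": f"{geography}:*",
        "in": f"state:{state}",
    }
    if api_key:
        params["key"] = api_key
    # Keep ":", "*" and "," unescaped so the URL stays readable.
    return f"{BASE_URL}/{year}/acs/acs5?" + urlencode(params, safe=":*,")


print(acs5_url(2020, ["B19013_001", "B01003_001"], "county", "06"))
# https://api.census.gov/data/2020/acs/acs5?get=B19013_001E,B01003_001E&for=county:*&in=state:06
```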

Hydra + Pydantic Configuration

from siege_utilities.config import HydraConfigManager

with HydraConfigManager() as manager:
    user_profile = manager.load_user_profile()
    branding = manager.load_branding_config("client_a")
    db_connections = manager.load_database_connections("client_a")

Reporting & Visualization

from siege_utilities.reporting import ReportGenerator

report_gen = ReportGenerator(client_name="Demo Company")

report_content = {
    "metadata": {"title": "Analytics Summary"},
    "sections": [{"type": "text", "title": "Overview", "content": "Report summary."}],
}
report_gen.generate_pdf_report(report_content, output_path="report.pdf")

Capabilities: 7+ map types (including choropleth, marker, 3D, heatmap, cluster, and flow), PDF reports with TOC, PowerPoint generation, and GA geographic analysis with Census demographic joins.

Function Categories

| Category | Count | Description | Dependencies |
|---|---|---|---|
| Core | 16 | Logging, strings, basic utils | None |
| Config | 54 | Database, project, client setup | None |
| Files | 21 | File ops, paths, remote downloads | None |
| Distributed | 37 | Spark utilities, HDFS operations | PySpark |
| Geo | 65+ | Census data, boundaries, spatial, GeoDjango | pandas, geopandas |
| Analytics | 45+ | Google Analytics, Workspace (Sheets/Docs/Slides), Snowflake | pandas, google-api-python-client |
| Reporting | 30+ | Charts, maps, GA reports, PDF generation | matplotlib, reportlab |
| Testing | 15 | Environment setup, test runners | None |
| Git | 9 | Branch ops, commit management | None |
| Development | 9 | Architecture analysis, code hygiene | None |
| Hygiene | 5 | Docstring generation, analysis | None |
| Data | 3 | Sample data utilities | pandas |

Installation Options

# Core only (pyyaml, requests, tqdm, pydantic)
pip install siege-utilities

# Add extras for what you need
pip install siege-utilities[geo-lite]         # shapely, pyproj, geopy (no GDAL needed)
pip install siege-utilities[geo]              # geo-lite + geopandas, fiona, rtree, tobler (needs GDAL)
pip install siege-utilities[geodjango]        # geo + Django, DRF, PostGIS
pip install siege-utilities[data]             # pandas, numpy, openpyxl, faker
pip install siege-utilities[reporting]        # matplotlib, seaborn, folium, plotly, reportlab
pip install siege-utilities[analytics]        # GA4, Facebook, Snowflake, scipy, scikit-learn
pip install siege-utilities[distributed]      # PySpark, Apache Sedona
pip install siege-utilities[config-extras]    # Hydra, hydra-zen, omegaconf
pip install siege-utilities[web]              # BeautifulSoup, lxml
pip install siege-utilities[database]         # SQLAlchemy, psycopg2
pip install siege-utilities[all]              # Everything

# Combine extras
pip install siege-utilities[data,geo,reporting]

# Development
git clone https://github.com/siege-analytics/siege_utilities.git
cd siege_utilities
pip install -e ".[all,dev]"
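
After installing, you can confirm that the packages behind an extra are importable before running code that needs them. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and the mapping lists import names for a few of the extras shown above rather than the package's full dependency metadata.

```python
from importlib.util import find_spec

# Import names for some of the extras listed above (not exhaustive).
EXTRA_MODULES = {
    "geo-lite": ["shapely", "pyproj", "geopy"],
    "geo": ["geopandas", "fiona", "rtree", "tobler"],
    "data": ["pandas", "numpy", "openpyxl", "faker"],
}


def missing_modules(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


for extra, modules in EXTRA_MODULES.items():
    missing = missing_modules(modules)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"[{extra}] {status}")
```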

Testing

1884 tests across all modules.

# Full suite
python -m pytest tests/ -v

# By marker
python -m pytest tests/ -m core
python -m pytest tests/ -m geo
python -m pytest tests/ -m "not requires_gdal"

# Quick smoke test
python -m pytest tests/ --tb=short -q

Architecture

siege_utilities/
├── config/              # Census registry, Hydra/Pydantic configs, client management
│   ├── census_registry.py   # Single source of truth for Census metadata
│   └── ...
├── geo/                 # Geospatial: Census API, GEOID utils, geocoding, spatial ops
│   ├── census_api_client.py
│   ├── census_files/    # PL 94-171, TIGER/Line downloaders
│   └── django/          # GeoDjango integration
│       ├── models/      # 37 concrete models (boundaries, demographics, crosswalks)
│       ├── services/    # 9 population services
│       ├── management/  # 7 management commands
│       ├── managers/    # Custom querysets (containing_point, nearest, for_year)
│       └── serializers/ # DRF GeoJSON serializers
├── distributed/         # Spark, HDFS, Databricks utilities
├── reporting/           # PDF, PowerPoint, choropleth, GA reports
├── analytics/           # GA4, Google Workspace (Sheets/Docs/Slides), Snowflake
├── files/               # File operations, hashing, remote downloads
├── core/                # Logging, string utilities
└── development/         # Architecture analysis, package management

Documentation

Contributing

See CONTRIBUTING.md for the full guide: fork, clone, install locally, run tests, and submit a PR.

Quick version:

git clone https://github.com/<your-user>/siege_utilities.git
cd siege_utilities
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[all,dev]"
python -m pytest tests/ -v

License

Dual license model (effective March 6, 2026):

  • AGPL-3.0-only for open-source usage
  • Commercial license for proprietary/commercial usage by separate agreement

Attribution is required in both paths. See LICENSE, LICENSES/AGPL-3.0.txt, and COMMERCIAL_LICENSE.md.


Siege Utilities: Spatial Intelligence, In Python.
