A comprehensive Python utilities package with enhanced auto-discovery
Siege Utilities
siege_utilities is the shared utilities library behind Siege Analytics workflows:
- Geospatial + GeoDjango boundary/data services (tiered: [geo-lite]/[geo]/[geodjango])
- Google Workspace write APIs (Sheets, Docs, Slides, Drive) with multi-account management
- Census API/data selection/crosswalk tooling
- Isochrone analysis with configurable CRS and domain exceptions
- Configuration and profile management
- Distributed processing helpers (Spark/HDFS/Databricks)
- Reporting and chart generation
Python Version Support
| Version | Status | CI |
|---|---|---|
| 3.11 | Fully supported (floor) | Required pass |
| 3.12 | Fully supported | Required pass |
| 3.13 | Supported | Allow-failure while stabilizing |
| 3.14 | Experimental | Not yet in CI (awaiting ecosystem wheels) |
The library requires Python 3.11+. Geospatial extras ([geo], [geodjango]) depend on C-extension packages (GDAL/GEOS/PROJ bindings) whose wheel availability varies by Python version — check PyPI for your target version before installing.
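The support table can be mirrored in a small runtime guard. The following is a minimal sketch and not part of siege_utilities: the helper name `support_status` is hypothetical, and the status strings simply restate the table.

```python
import sys

# Status per minor version, copied from the support table.
SUPPORT = {
    (3, 11): "fully supported",
    (3, 12): "fully supported",
    (3, 13): "supported",
    (3, 14): "experimental",
}

def support_status(version_info=sys.version_info):
    """Return the documented support status for a Python version (hypothetical helper)."""
    key = (version_info[0], version_info[1])
    if key < (3, 11):
        return "unsupported"  # below the 3.11 floor
    # Versions newer than the table are not documented.
    return SUPPORT.get(key, "unknown")

print(support_status())
```

A check like this only reports what the table says; it does not tell you whether GDAL/GEOS/PROJ wheels exist for your interpreter.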
Install
See Installation Options for all supported install commands (base, geo-lite, geo, geodjango, all).
Quick Usage
import siege_utilities as su
su.log_info("Ready.")
recommendations = su.select_census_datasets("demographics", "tract")
Import Philosophy
This project intentionally favors convenience access patterns, including broad function availability from the package surface. That is a design choice, not an accident.
Contributor rule: convenience imports are acceptable in explicit API-aggregation surfaces, but implementation modules should prefer explicit imports to avoid hidden collisions and reduce regression risk.
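To make the "hidden collisions" risk concrete, here is a minimal sketch of how two modules aggregated into one surface can export the same name. It uses throwaway modules built with `types.ModuleType`; the module names, the functions on them, and `find_collisions` are all hypothetical and are not taken from the package.

```python
import types

def find_collisions(modules):
    """Map each public name exported by more than one module to its owners."""
    owners = {}
    for mod in modules:
        for name in vars(mod):
            if not name.startswith("_"):
                owners.setdefault(name, []).append(mod.__name__)
    return {name: mods for name, mods in owners.items() if len(mods) > 1}

# Two hypothetical implementation modules that both define `clean`.
strings = types.ModuleType("strings")
strings.clean = lambda s: s.strip()
paths = types.ModuleType("paths")
paths.clean = lambda p: p.rstrip("/")
paths.join = lambda a, b: f"{a}/{b}"

print(find_collisions([strings, paths]))
```

With a wildcard-style aggregation, whichever `clean` is imported last silently wins, which is why explicit imports are preferred inside implementation modules.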
Contributor Requirements
Every PR must include:
- Tests for changed behavior (and regression test for bug fixes)
- Documentation updates
- Notebook updates when user-facing workflows or APIs change
- CodeRabbit feedback addressed for correctness/regression/API-risk findings
- Required CI/PR checks green (including CodeRabbit status once enabled)
Pre-PR Validation Commands
# Test naming/location hygiene
python scripts/check_test_file_hygiene.py
# API contract tooling regression check
python scripts/contracts/generate_public_api_contract.py --output /tmp/contract_candidate.json
python scripts/contracts/compare_public_api_contracts.py \
--baseline /tmp/contract_baseline.json \
--candidate /tmp/contract_candidate.json \
--release-impact <patch|minor|major> \
--allowlist scripts/contracts/contract_allowlist.json
# Contract-tool unit tests
python -m pytest -q --no-cov tests/test_api_contract_tools.py
If a PR intentionally adds public API symbols, classify it as minor and update scripts/contracts/contract_allowlist.json in the same PR.
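The idea behind a contract comparison can be sketched with plain sets. This is an illustration only and does not reproduce the logic of the scripts above, whose internals are not shown here. The rule that additions are minor comes from the text; treating removals as major is an assumption borrowed from common semantic-versioning practice, and `new_helper` is a made-up symbol.

```python
def classify_change(baseline, candidate):
    """Classify a public API change from two collections of symbol names."""
    baseline, candidate = set(baseline), set(candidate)
    removed = sorted(baseline - candidate)
    added = sorted(candidate - baseline)
    if removed:
        impact = "major"  # assumption: removing a public symbol can break callers
    elif added:
        impact = "minor"  # additive change, as described in the text
    else:
        impact = "patch"  # public surface unchanged
    return impact, added, removed

def unapproved_additions(added, allowlist):
    """Return added symbols that are missing from the allowlist."""
    allowed = set(allowlist)
    return [name for name in added if name not in allowed]

impact, added, removed = classify_change(
    ["log_info", "select_census_datasets"],
    ["log_info", "select_census_datasets", "new_helper"],  # new_helper is made up
)
print(impact, added, removed)
print(unapproved_additions(added, allowlist=[]))
```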
See:
- docs/policies/CODING_STYLE.md
- docs/policies/PR_REVIEW_RUBRIC.md
- docs/policies/CHANGE_CLASSIFICATION_AND_RELEASE_POLICY.md
- docs/policies/CONTRIBUTOR_GOVERNANCE.md
- docs/RELEASE_LINEAGE.md
- docs/EXAMPLES.md
- docs/ISOCHRONES_AND_WKLS.md
- docs/MANAGED_ENVIRONMENTS.md
External Contributor Workflow
Use this path when contributing from a fork:
- Fork this repository on GitHub, then clone your fork:
git clone https://github.com/<your-user>/siege_utilities.git
cd siege_utilities
git remote add upstream https://github.com/siege-analytics/siege_utilities.git
- Create and activate a local virtual environment, then install from the cloned repo:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
- Validate notebooks and notebook outputs:
python -m pytest -q --no-cov tests/test_notebooks_output_policy.py
If your change updates user-facing workflows or APIs, update the impacted notebooks and ensure notebooks/output/ artifacts remain reviewable.
- Open an issue in siege-analytics/siege_utilities describing the change for merge review, link your fork branch/PR, and include:
- Reproduction or motivation
- Proposed change scope
- Test evidence
- Documentation and notebook updates
GeoDjango Integration
Full spatial data platform with 37 concrete models, 9 population services, and 7 management commands.
Model Hierarchy
TemporalGeographicFeature (abstract — no geometry)
├── TemporalBoundary (abstract — MultiPolygon)
│ ├── CensusTIGERBoundary (abstract — GEOID + TIGER metadata)
│ │ ├── State, County, Tract, BlockGroup, Block, Place, ZCTA
│ │ ├── CongressionalDistrict, CBSA, UrbanArea
│ │ ├── StateLegislativeUpper, StateLegislativeLower, VTD, Precinct
│ │ └── SchoolDistrictElementary, Secondary, Unified
│ ├── GADMBoundary → GADMCountry, GADMAdmin1-5
│ ├── NLRBRegion, FederalJudicialDistrict
│ ├── NCESLocaleBoundary, TimezoneGeometry
│ └── Intersections (County×CD, VTD×CD, Tract×CD)
├── TemporalLinearFeature (abstract — MultiLineString)
└── TemporalPointFeature (abstract — Point)
└── SchoolLocation
Spatial Queries
from django.contrib.gis.geos import Point
from siege_utilities.geo.django.models import Tract, County, State
# Find tract containing a point
point = Point(-122.4194, 37.7749, srid=4326)
tract = Tract.objects.containing_point(point).for_year(2020).first()
# Nearest boundaries within distance (meters)
nearby = County.objects.nearest(point, max_distance_m=50_000)
# Temporal + spatial filtering
counties_2020 = County.objects.for_state("06").for_year(2020)
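The query point above is in SRID 4326 (longitude, latitude in degrees), while `max_distance_m` is in meters. For intuition about that scale, the sketch below computes great-circle distance with the haversine formula in plain Python. It is independent of the `nearest` manager, whose implementation is not shown here; the function name and the spherical Earth radius are assumptions of this example, and the Oakland coordinates are approximate.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# San Francisco (the point used above) to downtown Oakland, approximately.
d = haversine_m(-122.4194, 37.7749, -122.2711, 37.8044)
print(d <= 50_000)  # would this fall inside a 50 km search radius?
```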
Management Commands
# Census TIGER/Line boundaries
python manage.py populate_boundaries --year 2020 --type county --state CA
# Demographics from ACS
python manage.py populate_demographics --year 2020 --dataset acs5 --variables B19013_001
# PL 94-171 redistricting data
python manage.py populate_pl_demographics --year 2020 --state CA
# Boundary crosswalks (2010 → 2020)
python manage.py populate_crosswalks --source-year 2010 --target-year 2020
# NCES school district + locale data
python manage.py populate_nces --year 2020
# NLRB region boundaries
python manage.py populate_nlrb_regions --year 2024
# Timezone boundaries (from timezone-boundary-builder)
python manage.py populate_timezones --file timezones.geojson --year 2024
Services
| Service | Purpose |
|---|---|
| BoundaryPopulationService | Load TIGER/Line shapefiles into boundary models |
| DemographicPopulationService | Fetch ACS/Decennial data into DemographicSnapshot |
| CrosswalkPopulationService | Build boundary change crosswalks between vintages |
| TimeseriesService | Auto-populate DemographicTimeSeries from snapshots |
| DemographicRollupService | Aggregate child geographies to parents (GEOID or crosswalk) |
| UrbanicityClassificationService | Classify tracts by NCES urbanicity codes |
| NCESPopulationService | Load school districts, locales, and school locations |
| NLRBPopulationService | Populate NLRB region boundaries |
| TimezonePopulationService | Load IANA timezone geometries from GeoJSON |
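A crosswalk between vintages can be pictured as rows of (source GEOID, target GEOID, weight). The sketch below reallocates values along such rows. It is a simplified illustration, not the logic of CrosswalkPopulationService; the function name, the row layout, the GEOIDs, and the weights are all invented for the example.

```python
from collections import defaultdict

def apply_crosswalk(source_values, crosswalk):
    """Reallocate values from source geographies to target geographies.

    crosswalk: iterable of (source_geoid, target_geoid, weight) rows, where the
    weights for one source are expected to sum to 1.0.
    """
    target_values = defaultdict(float)
    for source, target, weight in crosswalk:
        if source in source_values:
            target_values[target] += source_values[source] * weight
    return dict(target_values)

# Made-up GEOIDs: one old tract split 60/40 into two new tracts, one unchanged.
values_2010 = {"06001000100": 1000, "06001000200": 500}
rows = [
    ("06001000100", "06001000101", 0.6),
    ("06001000100", "06001000102", 0.4),
    ("06001000200", "06001000200", 1.0),
]
print(apply_crosswalk(values_2010, rows))
```

Weighted reallocation of this kind suits counts such as population; rates and medians need a different treatment.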
Demographics & Rollups
from siege_utilities.geo.django.models import DemographicSnapshot, DemographicTimeSeries
from siege_utilities.geo.django.services import DemographicRollupService
# Query demographics
snapshots = DemographicSnapshot.objects.filter(
content_type__model='tract',
dataset='acs5',
year=2020,
)
# Roll up tract data to county level
svc = DemographicRollupService()
results = svc.rollup(
source_level='tract',
target_level='county',
year=2020,
variables=['B19013_001', 'B01003_001'],
state_fips='06',
min_coverage=0.8, # warn if <80% of child geographies have data
)
# Crosswalk-aware rollup (handles boundary changes)
results = svc.rollup(
source_level='tract',
target_level='county',
year=2020,
variables=['B01003_001'],
crosswalk_year=2010, # map 2010 tracts to 2020 counties via crosswalk
)
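The GEOID-based rollup works because Census GEOIDs nest: an 11-digit tract GEOID starts with the 5-digit county GEOID (2-digit state plus 3-digit county). The sketch below shows that idea together with a coverage check in the spirit of `min_coverage`. It is a plain-Python illustration under those assumptions, not the DemographicRollupService implementation; the function name, GEOIDs, and values are made up.

```python
from collections import Counter

def rollup_tracts_to_counties(tract_values, all_tract_geoids, min_coverage=0.8):
    """Sum tract values by county prefix and flag counties with sparse data.

    Returns (totals, low_coverage), where low_coverage lists counties whose
    share of tracts with data is below min_coverage.
    """
    expected = Counter(geoid[:5] for geoid in all_tract_geoids)
    observed = Counter(geoid[:5] for geoid in tract_values)
    totals = Counter()
    for geoid, value in tract_values.items():
        totals[geoid[:5]] += value
    low = sorted(c for c, n in expected.items() if observed[c] / n < min_coverage)
    return dict(totals), low

# Made-up tracts: two in county 06001, two in county 06075 (one lacks data).
all_tracts = ["06001400100", "06001400200", "06075010100", "06075010200"]
population = {"06001400100": 3000, "06001400200": 4500, "06075010100": 2500}
totals, low = rollup_tracts_to_counties(population, all_tracts)
print(totals, low)
```

Summation fits count variables such as B01003_001 (total population). A median such as B19013_001 (median household income) cannot be summed across tracts and needs a different aggregation.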
Census Data Intelligence
Consolidated Census metadata registry with intelligent dataset selection.
from siege_utilities.config.census_registry import (
SurveyType, GeographyLevel, resolve_geographic_level,
VARIABLE_GROUPS, CANONICAL_GEOGRAPHIC_LEVELS,
)
from siege_utilities.geo import quick_census_selection
# Resolve geography aliases
level = GeographyLevel("congressional_district") # resolves alias → "cd"
# Quick selection for analysis
result = quick_census_selection("business", "county")
print(f"Use {result['recommendations']['primary_recommendation']['dataset']}")
# Census API with caching
from siege_utilities.geo import CensusAPIClient
client = CensusAPIClient(cache_backend='django') # or 'sqlite', 'memory'
data = client.get_acs5(
year=2020,
variables=['B19013_001'],
geography='tract',
state='06',
)
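Alias resolution of the kind mentioned above can be pictured as a normalized dictionary lookup. The sketch below is hypothetical and does not reproduce census_registry.py: only the `congressional_district` to `cd` mapping comes from the example above, while the other aliases, the canonical set, and the name `resolve_level` are invented for illustration.

```python
# Illustrative tables only; the real registry is not shown in this document.
CANONICAL = {"state", "county", "tract", "cd"}
ALIASES = {
    "congressional_district": "cd",  # from the example above
    "census_tract": "tract",         # made up
    "counties": "county",            # made up
}

def resolve_level(name):
    """Return the canonical level for a name or alias, or raise ValueError."""
    key = name.strip().lower().replace(" ", "_").replace("-", "_")
    if key in CANONICAL:
        return key
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"Unknown geographic level: {name!r}") from None

print(resolve_level("Congressional District"))
```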
Census API Client
Direct access to Census Bureau data with built-in caching and rate limiting.
from siege_utilities.geo import CensusAPIClient
client = CensusAPIClient(api_key="your-key")
# ACS 5-Year estimates
median_income = client.get_acs5(
year=2020,
variables=['B19013_001', 'B01003_001'],
geography='county',
state='06',
)
# PL 94-171 redistricting data
from siege_utilities.geo.census_files.pl_downloader import PLFileDownloader
downloader = PLFileDownloader()
pl_data = downloader.download_state("CA", year=2020)
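For orientation, a request like the ACS call above corresponds to a URL on the public Census Data API. The sketch below builds such a URL with the standard library only. It is not how CensusAPIClient is implemented, which is not shown here, and `acs5_url` is a hypothetical helper. Note that the raw API expects estimate variables with an `E` suffix (for example `B19013_001E`); how the client maps the unsuffixed names used above is an open assumption.

```python
from urllib.parse import urlencode

BASE = "https://api.census.gov/data"

def acs5_url(year, variables, geography, state, api_key=None):
    """Build an ACS 5-year request URL for all units of `geography` in one state."""
    params = {
        "get": ",".join(variables),
        "for": f"{geography}:*",
        "in": f"state:{state}",
    }
    if api_key:
        params["key"] = api_key
    # Keep ':', '*' and ',' readable instead of percent-encoding them.
    return f"{BASE}/{year}/acs/acs5?{urlencode(params, safe=':*,')}"

print(acs5_url(2020, ["B19013_001E", "B01003_001E"], "county", "06"))
```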
Hydra + Pydantic Configuration
from siege_utilities.config import HydraConfigManager
with HydraConfigManager() as manager:
user_profile = manager.load_user_profile()
branding = manager.load_branding_config("client_a")
db_connections = manager.load_database_connections("client_a")
Reporting & Visualization
from siege_utilities.reporting import ReportGenerator
report_gen = ReportGenerator(client_name="Demo Company")
report_content = {
"metadata": {"title": "Analytics Summary"},
"sections": [{"type": "text", "title": "Overview", "content": "Report summary."}],
}
report_gen.generate_pdf_report(report_content, output_path="report.pdf")
Capabilities: 7+ map types (including choropleth, marker, 3D, heatmap, cluster, and flow), PDF reports with a table of contents, PowerPoint generation, and Google Analytics (GA) geographic analysis with Census demographic joins.
Function Categories
| Category | Count | Description | Dependencies |
|---|---|---|---|
| Core | 16 | Logging, strings, basic utils | None |
| Config | 54 | Database, project, client setup | None |
| Files | 21 | File ops, paths, remote downloads | None |
| Distributed | 37 | Spark utilities, HDFS operations | PySpark |
| Geo | 65+ | Census data, boundaries, spatial, GeoDjango | pandas, geopandas |
| Analytics | 45+ | Google Analytics, Workspace (Sheets/Docs/Slides), Snowflake | pandas, google-api-python-client |
| Reporting | 30+ | Charts, maps, GA reports, PDF generation | matplotlib, reportlab |
| Testing | 15 | Environment setup, test runners | None |
| Git | 9 | Branch ops, commit management | None |
| Development | 9 | Architecture analysis, code hygiene | None |
| Hygiene | 5 | Docstring generation, analysis | None |
| Data | 3 | Sample data utilities | pandas |
Installation Options
# Core only (pyyaml, requests, tqdm, pydantic)
pip install siege-utilities
# Add extras for what you need
pip install siege-utilities[geo-lite] # shapely, pyproj, geopy (no GDAL needed)
pip install siege-utilities[geo] # geo-lite + geopandas, fiona, rtree, tobler (needs GDAL)
pip install siege-utilities[geodjango] # geo + Django, DRF, PostGIS
pip install siege-utilities[data] # pandas, numpy, openpyxl, faker
pip install siege-utilities[reporting] # matplotlib, seaborn, folium, plotly, reportlab
pip install siege-utilities[analytics] # GA4, Facebook, Snowflake, scipy, scikit-learn
pip install siege-utilities[distributed] # PySpark, Apache Sedona
pip install siege-utilities[config-extras] # Hydra, hydra-zen, omegaconf
pip install siege-utilities[web] # BeautifulSoup, lxml
pip install siege-utilities[database] # SQLAlchemy, psycopg2
pip install siege-utilities[all] # Everything
# Combine extras
pip install siege-utilities[data,geo,reporting]
# Development
git clone https://github.com/siege-analytics/siege_utilities.git
cd siege_utilities
pip install -e ".[all,dev]"
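Because most functionality sits behind extras, it can help to check which optional modules are importable before calling into them. The following is a minimal sketch using `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, not a siege_utilities function, and the names passed in are the import names of packages listed for the [geo-lite] extra.

```python
from importlib.util import find_spec

def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]

# Import names for the packages listed under the [geo-lite] extra.
print(missing_modules(["shapely", "pyproj", "geopy"]))
```

An empty list means every listed module can be located; any names returned point to an extra that still needs installing.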
Testing
1884 tests across all modules.
# Full suite
python -m pytest tests/ -v
# By marker
python -m pytest tests/ -m core
python -m pytest tests/ -m geo
python -m pytest tests/ -m "not requires_gdal"
# Quick smoke test
python -m pytest tests/ --tb=short -q
Architecture
siege_utilities/
├── config/ # Census registry, Hydra/Pydantic configs, client management
│ ├── census_registry.py # Single source of truth for Census metadata
│ └── ...
├── geo/ # Geospatial: Census API, GEOID utils, geocoding, spatial ops
│ ├── census_api_client.py
│ ├── census_files/ # PL 94-171, TIGER/Line downloaders
│ └── django/ # GeoDjango integration
│ ├── models/ # 37 concrete models (boundaries, demographics, crosswalks)
│ ├── services/ # 9 population services
│ ├── management/ # 7 management commands
│ ├── managers/ # Custom querysets (containing_point, nearest, for_year)
│ └── serializers/ # DRF GeoJSON serializers
├── distributed/ # Spark, HDFS, Databricks utilities
├── reporting/ # PDF, PowerPoint, choropleth, GA reports
├── analytics/ # GA4, Google Workspace (Sheets/Docs/Slides), Snowflake
├── files/ # File operations, hashing, remote downloads
├── core/ # Logging, string utilities
└── development/ # Architecture analysis, package management
Documentation
- Sphinx Docs: siege-analytics.github.io/siege_utilities
- Notebooks: 18 Jupyter notebooks covering all major features (in notebooks/)
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Run tests: python -m pytest tests/ --tb=short -q
- Commit and push
- Submit a Pull Request
License
Dual license model (effective March 6, 2026):
- AGPL-3.0-only for open-source usage
- Commercial license for proprietary/commercial usage by separate agreement
Attribution is required in both paths. See LICENSE, LICENSES/AGPL-3.0.txt, and COMMERCIAL_LICENSE.md.
Siege Utilities: Spatial Intelligence, In Python.