Bolster

Bolster's Brain, you've been warned 🧠

A comprehensive Python utility library for data science, web scraping, cloud services, and general development workflows. Originally designed as a personal toolkit, Bolster has evolved into a robust collection of utilities that enhance productivity across data analysis, system administration, and software development tasks.

🚀 Quick Start

Installation

pip install bolster

Basic Usage

import bolster

# Efficient data processing with built-in progress tracking
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
results = bolster.poolmap(lambda x: x**2, data)
print(results)  # {1: 1, 2: 4, 3: 9, 4: 16, ...}


# Smart retry logic with exponential backoff
@bolster.backoff(Exception, tries=3, delay=1, backoff=2)
def unreliable_api_call():
    # Your potentially failing code here
    return "Success!"


# Efficient tree/dict navigation
nested_data = {
    "users": {
        "active": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}],
        "inactive": [{"name": "Charlie", "age": 35}],
    }
}

# Find all ages recursively
ages = bolster.get_recursively(nested_data, "age")
print(ages)  # [25, 30, 35]

# Flatten nested structures
flat = bolster.flatten_dict(nested_data)
print(flat["users:active:0:name"])  # 'Alice'

🎯 Core Features

Concurrency & Performance

  • poolmap(): ThreadPoolExecutor wrapper with progress monitoring and robust error handling
  • exceptional_executor(): Graceful handling of failed futures in concurrent operations
  • backoff(): Exponential backoff retry decorator for unreliable operations
  • memoize(): Instance method caching with hit/miss tracking for performance optimization
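
As a rough illustration of the thread-pool pattern that poolmap() is described as wrapping, the following standard-library sketch maps each input to its result and collects failures instead of aborting. The name `pool_map_sketch` and its return shape are assumptions made for this example; they are not Bolster's actual API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pool_map_sketch(func, items, max_workers=4):
    """Apply func to each item on a thread pool.

    Hypothetical helper: returns ({item: result}, {item: exception}).
    """
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(func, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Record the failure and keep processing the rest.
                errors[item] = exc
    return results, errors


def reciprocal(x):
    return 1 / x


results, errors = pool_map_sketch(reciprocal, [1, 2, 0, 4])
print(sorted(results.items()))  # [(1, 1.0), (2, 0.5), (4, 0.25)]
print({k: type(v).__name__ for k, v in errors.items()})  # {0: 'ZeroDivisionError'}
```

Keying results by input rather than by position means the completion order of the worker threads does not matter.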

Data Processing & Transformation

  • aggregate(): Pandas-like groupby operations for dictionaries and lists
  • transform_(): Flexible data transformation with key mapping and function application
  • batch() / chunks(): Efficient sequence partitioning for processing large datasets
  • Compression utilities: compress_for_relay() / decompress_from_relay() for data serialization
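
Sequence partitioning of the kind batch() / chunks() provide can be sketched with `itertools.islice`. The function name `chunks_sketch` is hypothetical, and Bolster's own signatures may differ.

```python
from itertools import islice


def chunks_sketch(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk


batches = list(chunks_sketch(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because it consumes an iterator lazily, this approach also works for generators and other inputs that do not support slicing.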

Tree & Dictionary Navigation

  • get_recursively(): Extract values from deeply nested structures by key
  • flatten_dict(): Convert nested dictionaries to flat key-value pairs
  • Tree analysis: breadth(), depth(), leaves(), leaf_paths() for structure inspection
  • Path navigation: keys_at(), items_at() for level-specific data access
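
To make the navigation helpers concrete, here is a minimal standard-library sketch of recursive key lookup and flattening. The names `find_key_sketch` and `flatten_sketch` and the ":" separator are assumptions for illustration, not Bolster's implementation.

```python
def find_key_sketch(node, key):
    """Collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                found.extend(find_key_sketch(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key_sketch(item, key))
    return found


def flatten_sketch(node, sep=":", prefix=""):
    """Flatten nested dicts and lists into {joined_path: leaf_value}."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)  # list positions become path segments
    else:
        return {prefix: node}
    flat = {}
    for k, v in children:
        path = f"{prefix}{sep}{k}" if prefix else str(k)
        flat.update(flatten_sketch(v, sep, path))
    return flat


nested = {
    "users": {
        "active": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}],
        "inactive": [{"name": "Charlie", "age": 35}],
    }
}
ages = find_key_sketch(nested, "age")
flat = flatten_sketch(nested)
print(ages)  # [25, 30, 35]
print(flat["users:active:0:name"])  # Alice
```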

Development & Debugging

  • arg_exception_logger(): Decorator for debugging function calls with automatic argument logging
  • MultipleErrors: Accumulate and handle multiple exceptions in complex workflows
  • working_directory(): Context manager for safe directory operations
  • pretty_print_request(): HTTP request debugging with automatic auth redaction
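
A context manager for safe directory operations usually restores the previous directory even when the body raises. The sketch below shows that pattern with the standard library only; `working_directory_sketch` is a hypothetical stand-in, not the Bolster function.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory_sketch(path):
    """Temporarily change the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to where we started, even after an exception.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        inside = os.path.realpath(os.getcwd())
    expected = os.path.realpath(tmp)
after = os.getcwd()
print(inside == expected, after == start)  # True True
```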

📊 Data Sources

Bolster includes specialized modules for working with Northern Ireland and UK data sources:

Northern Ireland Water Quality

from bolster.data_sources.ni_water import get_water_quality, get_water_quality_by_zone

# Get comprehensive water quality data for all NI supply zones
df = get_water_quality()
print(df.shape)  # Shows number of zones and parameters

# Get specific zone data
zone_data = get_water_quality_by_zone("BALM")  # Belfast Malone area
print(f"Hardness: {zone_data['NI Hardness Classification']}")

Electoral Office for Northern Ireland (EONI)

import bolster
from bolster.data_sources.eoni import get_election_results

# Get Assembly election results
results_2016 = get_election_results(2016)
results_2022 = get_election_results(2022)

# Compare party performance across elections
comparison = bolster.diff(results_2022, results_2016)

Companies House Data

from bolster.data_sources.companies_house import search_companies, get_company_details

# Search for companies
results = search_companies("Technology")

# Get detailed company information
company = get_company_details("12345678")  # Company number
print(f"{company['name']} - Status: {company['status']}")

UK Met Office

from bolster.data_sources.metoffice import get_precipitation_data

# Get weather data for a specific location
weather = get_precipitation_data("Belfast", start_date="2024-01-01", end_date="2024-01-31")

Northern Ireland House Price Index

from bolster.data_sources.ni_house_price_index import (
    get_hpi_trends,
    get_sales_volumes,
    get_average_prices,
)

# Get HPI index trends over time (Q1 2005 - present)
hpi = get_hpi_trends()
print(hpi[["Period", "NI House Price Index", "Annual Change"]].tail())

# Get property sales volumes by type
sales = get_sales_volumes()
print(f"Total sales in latest quarter: {sales.iloc[-1]['Total']:,}")

# Get average sale prices
prices = get_average_prices()
print(f"Current median price: £{prices.iloc[-1]['Simple Median']:,.0f}")

NISRA Statistics

Comprehensive access to Northern Ireland Statistics and Research Agency (NISRA) data:

from bolster.data_sources.nisra import population, births, deaths, migration

# Mid-year population estimates by geography and demographics
pop_df = population.get_latest_population()
print(f"NI Population: {pop_df['population'].sum():,}")

# Monthly birth registrations
births_df = births.get_latest_births()

# Weekly death registrations with excess deaths analysis
deaths_df = deaths.get_latest_deaths()

# Migration estimates derived from demographic components
migration_df = migration.get_latest_migration()

Additional NISRA modules: labour_market, index_of_production, index_of_services, construction_output, composite_index, marriages, ashe (earnings survey), quarterly_employment_survey, emergency_care_waiting_times, stillbirths.

See NISRA module documentation for full API reference.

NISRA RSS Feed Coverage

The GOV.UK NISRA statistics RSS feed tracks new NISRA publications. Current implementation status:

Publication | Module | Status
Labour Market Statistics | nisra.labour_market |
Weekly/Monthly Deaths | nisra.deaths |
Monthly Births/Stillbirths | nisra.births |
Monthly Marriages & Civil Partnerships | nisra.marriages |
NI Composite Economic Index | nisra.composite_index |
Construction Bulletin | nisra.construction_output |
Index of Production | nisra.index_of_production |
Index of Services | nisra.index_of_services |
Quarterly Employment Survey | nisra.quarterly_employment_survey |
Emergency Care Waiting Times | nisra.emergency_care_waiting_times |
Monthly Stillbirths | nisra.stillbirths |
Population Estimates | nisra.population |
Migration Estimates (Derived + Official LTI) | nisra.migration |
Population Projections (NI-level, biennial vintage) | nisra.population_projections |
Population Projections - LGD sub-areas (2022-based, 2022–2047) | nisra.population_projections |
Annual Survey of Hours & Earnings | nisra.ashe |
DVA Monthly Tests Statistics | dva |
UK Gender Pay Gap Reporting | gender_pay_gap |
Individual Wellbeing | nisra.wellbeing |
Cancer Waiting Times | nisra.cancer_waiting_times |
NI Planning Activity Statistics (DfI) | nisra.planning_statistics |
Registrar General Quarterly Tables | nisra.registrar_general |
Tourism - Hotel Occupancy | nisra.tourism.occupancy |
Tourism - SSA Occupancy | nisra.tourism.occupancy |
Tourism - Visitor Statistics | nisra.tourism.visitor_statistics |
Security Situation Statistics | - | Planned
Road Traffic Collisions | psni.road_traffic_collisions |
PSNI Crime Statistics | psni.crime_statistics | ⚠️ stale (to Dec 2021)

Infrastructure NI Publication Discovery

The Infrastructure NI publications portal supports filtering beyond basic publication types. Its sidebar filters expose additional organizational dimensions that could improve data source discovery.

Directions for further analysis:

  • Topic categorization: Publications span transport, environment, planning, and infrastructure domains
  • Geographic filtering: Regional breakdown capabilities for localized analysis
  • Date range analysis: Historical publication patterns and frequency tracking
  • Document format analysis: Structured data availability vs. narrative reports
  • Cross-departmental integration: Links with other NI government department publications

This systematic analysis could identify gaps in current DVA coverage and reveal additional structured datasets suitable for bolster integration.

☁️ Cloud Services

AWS Integration

from bolster.aws import get_session, S3Handler, DynamoHandler

# Get configured AWS session
session = get_session(profile="production")

# S3 operations with best practices
s3 = S3Handler(session)
s3.upload_file("local_file.txt", "bucket-name", "remote/path/file.txt")

# DynamoDB operations
dynamo = DynamoHandler(session)
items = dynamo.scan_table("user-data", filters={"status": "active"})

Azure Integration

from bolster.azure import AzureHandler

# Azure Blob Storage operations
azure = AzureHandler(connection_string="DefaultEndpointsProtocol=https;...")
azure.upload_blob("container", "blob_name", data)

🌐 Web Scraping & HTTP

from bolster.web import safe_request, parse_html_table

# Robust HTTP requests with automatic retries
response = safe_request("https://api.example.com/data", max_retries=3, timeout=30)

# Parse HTML tables into pandas DataFrames
tables = parse_html_table("https://example.com/tables")
print(tables[0].head())  # First table as DataFrame

🖥️ Command Line Interface

Bolster includes a CLI for common operations:

# Get precipitation data
bolster get-precipitation --location "Belfast" --start-date "2024-01-01"

# Get help on available commands
bolster --help

🔧 Advanced Examples

Concurrent Data Processing

import bolster
from datetime import datetime


# Process large datasets with progress tracking
def process_user_data(user_id):
    # Simulate data processing
    return {"user_id": user_id, "processed_at": datetime.now()}


user_ids = range(1000)  # 1000 users to process

# Process with automatic progress bar and error handling
results = bolster.poolmap(
    process_user_data,
    user_ids,
    max_workers=10,
    progress=True,  # Shows progress bar
)

print(f"Processed {len(results)} users successfully")

Smart Caching and Memoization

import time

import bolster


class DataProcessor:
    @bolster.memoize
    def expensive_calculation(self, data_hash):
        # Expensive operation that we want to cache
        time.sleep(2)  # Simulate expensive operation
        return f"Processed: {data_hash}"


processor = DataProcessor()

# First call - takes 2 seconds
result1 = processor.expensive_calculation("abc123")

# Second call with same input - returns immediately from cache
result2 = processor.expensive_calculation("abc123")

# Check cache performance
print(f"Cache hits: {len(processor._memoize__hits)}")
print(f"Cache misses: {len(processor._memoize__misses)}")
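
For readers curious how per-instance caching with hit/miss tracking can work, here is a minimal standard-library sketch. The decorator name and the `_sketch_cache` / `_sketch_stats` attributes are hypothetical and do not mirror Bolster's internal attribute names.

```python
import functools


def memoize_method_sketch(method):
    """Cache results per instance and count cache hits and misses."""

    @functools.wraps(method)
    def wrapper(self, *args):
        # Hypothetical attribute names, stored on the instance itself so
        # that each object keeps its own cache and counters.
        cache = self.__dict__.setdefault("_sketch_cache", {})
        stats = self.__dict__.setdefault("_sketch_stats", {"hits": 0, "misses": 0})
        key = (method.__name__, args)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = method(self, *args)
        return cache[key]

    return wrapper


class Squarer:
    @memoize_method_sketch
    def square(self, x):
        return x * x


s = Squarer()
values = [s.square(3), s.square(3), s.square(4)]
print(values)  # [9, 9, 16]
print(s._sketch_stats)  # {'hits': 1, 'misses': 2}
```

Storing the cache on the instance rather than on the function means cached values are released together with the object.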

Robust API Integration with Backoff

import requests
import bolster


@bolster.backoff((requests.RequestException, ConnectionError), tries=5, delay=1, backoff=2)
def fetch_api_data(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# This will automatically retry with exponential backoff on failure
data = fetch_api_data("https://api.unreliable-service.com/data")
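
The retry schedule implied by tries, delay and backoff can be seen in a small stand-alone sketch. It assumes the common pattern of sleeping only between attempts and multiplying the wait after each failure; `retry_sketch` and its injectable `sleep` parameter are hypothetical and are not Bolster's implementation.

```python
import functools
import time


def retry_sketch(exceptions, tries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a function, multiplying the wait by `backoff` after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator


waits = []  # record the requested sleeps instead of actually sleeping
calls = {"count": 0}


@retry_sketch(ConnectionError, tries=5, delay=1, backoff=2, sleep=waits.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 4:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
print(result, waits)  # ok [1, 2, 4]
```

Under these assumptions, five tries with delay=1 and backoff=2 wait at most 1 + 2 + 4 + 8 = 15 seconds in total before the final error is raised.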

Complex Data Transformation

import bolster

# Transform API response to database format
api_response = {
    "user_name": "john_doe",
    "user_email": "john@example.com",
    "account_type": "premium",
    "signup_timestamp": "2024-01-01T12:00:00Z",
}

# Define transformation rules
rules = {
    "user_name": ("username", str.upper),  # Rename and transform
    "user_email": ("email", None),  # Keep as-is but rename
    "account_type": ("tier", lambda x: x.title()),  # Transform value
    "signup_timestamp": ("created_at", bolster.parse_iso_datetime),
}

# Apply transformation
db_record = bolster.transform_(api_response, rules)
print(db_record)
# {'username': 'JOHN_DOE', 'email': 'john@example.com',
#  'tier': 'Premium', 'created_at': datetime(2024, 1, 1, 12, 0, 0)}
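
The rule format above, which maps each source key to a (new key, function) pair, can be applied with a few lines of plain Python. This sketch is an assumption about how such rules might be interpreted, with hypothetical helper names. Unlike the output shown above, its timestamp parser returns a timezone-aware datetime.

```python
from datetime import datetime, timezone


def parse_iso_sketch(text):
    # Older Python versions reject a trailing "Z", so normalise it first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def apply_rules_sketch(record, rules):
    """Rename keys and optionally transform values according to `rules`."""
    out = {}
    for old_key, (new_key, func) in rules.items():
        if old_key not in record:
            continue  # skip rules whose source key is absent
        value = record[old_key]
        out[new_key] = func(value) if func is not None else value
    return out


record = {
    "user_name": "john_doe",
    "user_email": "john@example.com",
    "account_type": "premium",
    "signup_timestamp": "2024-01-01T12:00:00Z",
}
sketch_rules = {
    "user_name": ("username", str.upper),
    "user_email": ("email", None),
    "account_type": ("tier", lambda x: x.title()),
    "signup_timestamp": ("created_at", parse_iso_sketch),
}
row = apply_rules_sketch(record, sketch_rules)
print(row["username"], row["tier"], row["created_at"].isoformat())
# JOHN_DOE Premium 2024-01-01T12:00:00+00:00
```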

🏗️ Development Setup

Prerequisites

  • Python 3.9+ (3.10, 3.11, 3.12, 3.13 supported)
  • uv (fast Python package manager)

Installation for Development

# Clone the repository
git clone https://github.com/andrewbolster/bolster.git
cd bolster

# Install with development dependencies
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=bolster --cov-report=html

# Build documentation
cd docs && uv run make html

Running Tests

# Run all tests
uv run pytest

# Run with verbose output and coverage
uv run pytest -v --cov=bolster --cov-report=term-missing

# Run specific test file
uv run pytest tests/test_core_utilities.py

# Skip network-dependent tests (useful if you hit SSL issues)
uv run pytest -m "not network"

📚 Documentation

  • Full Documentation: https://bolster.readthedocs.io
  • API Reference: Auto-generated from docstrings
  • Examples: See /notebooks directory for Jupyter notebook examples
  • Data Sources: Detailed documentation for each data source module

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

  1. Testing: Ensure all new features have comprehensive tests
  2. Documentation: Add docstrings and update README for new features
  3. Code Style: Follow the existing code style (enforced by ruff)
  4. Type Hints: Include type annotations for all public functions
  5. Performance: Consider performance implications for data processing functions

📄 License

This project is licensed under the GNU General Public License v3 (GPLv3) - see the LICENSE file for details.

🐛 Bug Reports

If you encounter any bugs or issues, please file a bug report at: https://github.com/andrewbolster/bolster/issues

Built with ❤️ for data science, automation, and general productivity enhancement.

Download files

Source Distribution

bolster-0.6.0.tar.gz (241.0 kB view details)

Uploaded Source

Built Distribution

bolster-0.6.0-py3-none-any.whl (287.3 kB view details)

Uploaded Python 3

File details

Details for the file bolster-0.6.0.tar.gz.

File metadata

  • Download URL: bolster-0.6.0.tar.gz
  • Upload date:
  • Size: 241.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bolster-0.6.0.tar.gz
Algorithm Hash digest
SHA256 e319f3b9f89c8cb47baf7da7e0e861bdcc3461887ec7037f94728849d1039bb1
MD5 4601571bb2964e23933274986496e7cd
BLAKE2b-256 3bd3085d009914cbcf9050547ebfd33d1f36ca0db0c681cadc7879629bffecfb

Provenance

The following attestation bundles were made for bolster-0.6.0.tar.gz:

Publisher: publish.yml on andrewbolster/bolster

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bolster-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: bolster-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 287.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bolster-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 861de15b6d85ab3332fb5c8f3952ff348e4ae3d9cc893c22baf15bba4ce33d6a
MD5 1806982a2759f198802a6c5b35734bdb
BLAKE2b-256 039b0d8cb367bf6859f746cd917ad78cd1d1c2930568e2484e6173c27ab80a56

Provenance

The following attestation bundles were made for bolster-0.6.0-py3-none-any.whl:

Publisher: publish.yml on andrewbolster/bolster

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
