Automated registration package for urban sensor endpoints into metadata catalogs

Wrench 🔧

A powerful framework for building automated sensor registration pipelines.

Python 3.12+ | Code style: ruff | License: Apache 2.0

Overview

Wrench is a modular, extensible workflow framework for harvesting, enriching, and registering sensor metadata from diverse IoT sources into urban data catalogs. It provides a standardized pipeline architecture with interchangeable components, with the aim of making sensor data easier to discover and more valuable.

Features

  • 🔄 Automated Metadata Harvesting: Extract metadata from various IoT data sources with minimal configuration
  • 📊 Standardized Data Models: Type-safe data structures using Pydantic for consistent handling of metadata
  • 🔍 Advanced Classification: Group similar sensors using machine learning and taxonomy-based approaches
  • Metadata Enrichment: Enhance sensor descriptions with contextual information using LLM technologies
  • 🏗️ Modular Architecture: Compose workflows from interchangeable components for maximum flexibility
  • 🔌 Extensible Interfaces: Easily add support for new data sources and catalog systems
  • 🤖 LLM Integration: Leverage AI capabilities for automatic content generation and classification

Installation

pip install auto-wrench

To install with specific component dependencies:

# For SensorThings support
pip install 'auto-wrench[sensorthings]'

# For KINETIC grouper
pip install 'auto-wrench[kinetic]'

# Multiple components
pip install 'auto-wrench[sensorthings,kinetic]'

Core Components

Wrench consists of four main component types that can be combined in a pipeline:

  1. Harvesters: Extract metadata from IoT data sources (e.g., SensorThings API)
  2. Groupers: Classify and organize sensors into meaningful groups using various ML approaches
  3. MetadataEnrichers: Build spatial and temporal metadata for services and sensor groups
  4. Catalogers: Register the processed metadata into data catalogs (e.g., SDDI/CKAN)

Each component type follows a standardized interface, making it easy to extend with custom implementations.
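
To illustrate what a standardized interface buys, here is a minimal sketch of interchangeable pipeline stages using typing.Protocol. It is not Wrench's actual base-class API: Item, Group, the three protocols, and the toy implementations are all hypothetical names invented for this example, and the enricher stage is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Item:
    """Hypothetical stand-in for one harvested sensor record."""
    id: str
    description: str


@dataclass
class Group:
    """Hypothetical stand-in for a named group of sensors."""
    name: str
    items: list[Item] = field(default_factory=list)


# Hypothetical stage interfaces; any object with a matching method fits.
class Harvester(Protocol):
    def harvest(self) -> list[Item]: ...


class Grouper(Protocol):
    def group(self, items: list[Item]) -> list[Group]: ...


class Cataloger(Protocol):
    def register(self, groups: list[Group]) -> int: ...


class StaticHarvester:
    """Returns a fixed list instead of calling a real API."""

    def __init__(self, items: list[Item]) -> None:
        self._items = items

    def harvest(self) -> list[Item]:
        return list(self._items)


class FirstWordGrouper:
    """Toy grouper: buckets items by the first word of their description."""

    def group(self, items: list[Item]) -> list[Group]:
        groups: dict[str, Group] = {}
        for item in items:
            words = item.description.split() or ["unknown"]
            key = words[0].lower()
            groups.setdefault(key, Group(name=key)).items.append(item)
        return list(groups.values())


class PrintCataloger:
    """Prints each group instead of registering it in a catalog."""

    def register(self, groups: list[Group]) -> int:
        for group in groups:
            print(f"{group.name}: {[item.id for item in group.items]}")
        return len(groups)


def run_pipeline(harvester: Harvester, grouper: Grouper, cataloger: Cataloger) -> int:
    """Chain the stages; returns the number of registered groups."""
    return cataloger.register(grouper.group(harvester.harvest()))
```

Because run_pipeline depends only on the method signatures, swapping FirstWordGrouper for another grouper requires no change to the other stages.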

Quick start

The following example sets up a complete pipeline with a SensorThings API harvester, a KINETIC grouper for classification, and an SDDI cataloger for registration:

import asyncio

from wrench.cataloger.sddi import SDDICataloger
from wrench.grouper.kinetic import KINETIC
from wrench.harvester.sensorthings import SensorThingsHarvester
from wrench.metadataenricher.sensorthings import SensorThingsMetadataEnricher
from wrench.pipeline.sensor_pipeline import SensorRegistrationPipeline
from wrench.utils.config import LLMConfig

llm_config = LLMConfig(base_url="https://my-llm.com", model="llama3.3:70b-instruct-q4_K_M")

# Initialize components
harvester = SensorThingsHarvester(
    base_url="https://example.org/v1.1",
    pagination_config={"page_delay": 0.2, "timeout": 60, "batch_size": 100},
)
grouper = KINETIC(
    llm_config=llm_config,
    embedder="intfloat/multilingual-e5-large-instruct",
    resolution=1,
)
metadata_enricher = SensorThingsMetadataEnricher(
    base_url="https://example.org/v1.1",
    title="City Sensor Network",
    description="Environmental sensors across the city",
    llm_config=llm_config,
)
cataloger = SDDICataloger(
    base_url="https://catalog.example.org",
    api_key="your-api-key",
    owner_org="your-organization",
)

# Assemble and run the pipeline
pipeline = SensorRegistrationPipeline(
    harvester=harvester,
    grouper=grouper,
    metadataenricher=metadata_enricher,
    cataloger=cataloger,
)

result = asyncio.run(pipeline.run_async())

Configuration

Wrench supports YAML-based pipeline configuration through PipelineRunner.from_config_file(). The top-level template_ key selects the pipeline template. Environment variables are resolved using ${VAR_NAME} syntax.

# pipeline_config.yaml
template_: SensorPipeline

harvester:
  sensorthings:
    base_url: "https://example.org/v1.1"
    pagination_config:
      page_delay: 0.2
      timeout: 60
      batch_size: 100

grouper:
  kinetic:
    llm_config:
      model: ${OLLAMA_MODEL}
      base_url: ${OLLAMA_URL}
      api_key: ${OLLAMA_API_KEY}

metadataenricher:
  sensorthings:
    base_url: "https://example.org/v1.1"
    title: "City Sensor Network"
    description: "Environmental sensors across the city"
    llm_config:
      model: ${OLLAMA_MODEL}
      base_url: ${OLLAMA_URL}
      api_key: ${OLLAMA_API_KEY}

cataloger:
  noop: {}

Run a YAML-configured pipeline with:

import asyncio
from wrench.pipeline.config import PipelineRunner

runner = PipelineRunner.from_config_file("pipeline_config.yaml")
result = asyncio.run(runner.run({}))
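
The ${VAR_NAME} resolution mentioned above can be pictured with the following sketch, which walks an already-parsed configuration and substitutes placeholders from the environment. This is an illustration only, not Wrench's implementation: resolve_env is a hypothetical helper, and raising an error for an unset variable is an assumption (the library may behave differently).

```python
import os
import re

# Matches ${VAR_NAME} where VAR_NAME is a typical environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${VAR_NAME} placeholders in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                # Assumption: fail loudly rather than leave the placeholder in place.
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]

        return _VAR_PATTERN.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


config = {"llm_config": {"model": "${OLLAMA_MODEL}", "base_url": "${OLLAMA_URL}"}}
resolved = resolve_env(
    config,
    env={"OLLAMA_MODEL": "llama3.3:70b-instruct-q4_K_M", "OLLAMA_URL": "https://my-llm.com"},
)
print(resolved["llm_config"]["model"])
```

Passing env explicitly, as in the usage lines, keeps the helper testable without touching the real process environment.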

Component Overview

Harvesters

Harvesters connect to data sources and extract metadata. Wrench includes:

  • SensorThingsHarvester: Connects to OGC SensorThings API endpoints
  • Extensible base class for creating custom harvesters
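
As a rough picture of what harvesting from a SensorThings endpoint involves, the sketch below pages through the Things collection by following the @iot.nextLink field that the OGC SensorThings API uses for server-driven paging. It is not SensorThingsHarvester's code: iter_things and fetch_json are hypothetical helpers, and mapping batch_size to $top and page_delay to a pause between requests is an assumption about what those settings mean.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str, timeout: float = 60) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def iter_things(
    base_url: str,
    batch_size: int = 100,
    page_delay: float = 0.2,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every Thing entity, following @iot.nextLink until it is absent."""
    url = f"{base_url.rstrip('/')}/Things?$top={batch_size}"
    while url:
        page = fetch(url)
        # Each page carries its entities in the "value" array.
        yield from page.get("value", [])
        url = page.get("@iot.nextLink")
        if url:
            # Pause between pages so the server is not hammered.
            time.sleep(page_delay)
```

The fetch parameter is injectable so the paging logic can be exercised against canned pages, for example `list(iter_things("https://example.org/v1.1", fetch=my_fake_fetch))`.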

Groupers

Groupers organize sensors into logical groups using various machine learning approaches:

  • KINETIC: Keyword-Informed, Network-Enhanced Topical Intelligence Classifier with hierarchical clustering
  • LDAGrouper: Latent Dirichlet Allocation for topic modeling and device grouping
  • BERTopicGrouper: BERTopic-based clustering with HDBSCAN and UMAP for topic discovery
  • Can be extended with custom grouping algorithms

MetadataEnrichers

MetadataEnrichers build spatial and temporal metadata for items and groups:

  • SensorThingsMetadataEnricher: Builds metadata for SensorThings API data sources
  • Extensible base class for different data source types
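
A minimal sketch of what "spatial and temporal metadata" can mean in practice: a bounding box over sensor locations and the overall time span covered by their observations. The helper names are hypothetical and this is not the enricher's actual logic. It assumes GeoJSON-style (longitude, latitude) pairs and ISO 8601 "start/end" intervals, the form SensorThings uses for a Datastream's phenomenonTime.

```python
from datetime import datetime


def bounding_box(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs.

    Raises ValueError if points is empty.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def temporal_extent(intervals: list[str]) -> tuple[datetime, datetime]:
    """Return the earliest start and latest end across 'start/end' intervals.

    Raises ValueError if intervals is empty.
    """
    starts, ends = [], []
    for interval in intervals:
        start, end = interval.split("/")
        starts.append(datetime.fromisoformat(start))
        ends.append(datetime.fromisoformat(end))
    return min(starts), max(ends)


bbox = bounding_box([(11.57, 48.14), (11.60, 48.10), (11.50, 48.20)])
span = temporal_extent([
    "2024-01-01T00:00:00+00:00/2024-02-01T00:00:00+00:00",
    "2023-12-15T00:00:00+00:00/2024-01-20T00:00:00+00:00",
])
print(bbox, span[0].isoformat(), span[1].isoformat())
```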

Catalogers

Catalogers register metadata into data catalogs:

  • SDDICataloger: Registers metadata into SDDI/CKAN-based catalogs
  • Extensible interface for supporting other catalog systems
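
For orientation, registering a dataset in a CKAN-based catalog comes down to a POST against CKAN's package_create action. The sketch below only builds that request with the standard library. It is not SDDICataloger's implementation: build_package_create_request is a hypothetical helper, the payload is limited to a few core CKAN fields, and any SDDI-specific fields are omitted because they are not described here.

```python
import json
import urllib.request


def build_package_create_request(
    base_url: str,
    api_key: str,
    owner_org: str,
    name: str,
    title: str,
    notes: str,
) -> urllib.request.Request:
    """Build (but do not send) a CKAN package_create request."""
    payload = {
        "name": name,            # URL-safe dataset identifier
        "title": title,          # human-readable title
        "notes": notes,          # dataset description
        "owner_org": owner_org,  # organization that owns the dataset
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_package_create_request(
    base_url="https://catalog.example.org",
    api_key="your-api-key",
    owner_org="your-organization",
    name="city-sensor-network",
    title="City Sensor Network",
    notes="Environmental sensors across the city",
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen; it is kept separate here so the request can be inspected without network access.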

Advanced Features

Advanced grouping with ML

Different groupers offer various approaches for sensor classification:

from wrench.utils.config import LLMConfig

# KINETIC for hierarchical topic clustering
from wrench.grouper.kinetic import KINETIC
grouper = KINETIC(
    llm_config=LLMConfig(base_url="https://my-llm.com", model="llama3.3:70b-instruct-q4_K_M"),
    embedder="intfloat/multilingual-e5-large-instruct",
    lang="en",
    resolution=1,
)

# LDA for topic modeling (no extra dependencies required)
from wrench.grouper.lda import LDAGrouper
from wrench.grouper.lda.models import LDAConfig
grouper = LDAGrouper(config=LDAConfig(n_topics=10, alpha=0.1, beta=0.01))

# BERTopic for density-based clustering (requires sentence-transformers, hdbscan, umap-learn, bertopic)
from wrench.grouper.bertopic import BERTopicGrouper
from wrench.grouper.bertopic.models import BERTopicConfig
grouper = BERTopicGrouper(config=BERTopicConfig(min_topic_size=10))

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/urbansense/wrench.git
cd wrench

# Run the make target for full setup
make setup

# Install component dependencies for development
uv pip install -e ".[sensorthings,kinetic]"

Code Style and Testing

This project uses Ruff for formatting and linting, and provides make targets for running the test suite and type checks:

# Format and lint code
make format
make lint

# Run tests with coverage
make test

# Run specific test types
make test_unit
make test_e2e

# Type checking
make lint_types

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please ensure your code follows our coding standards and includes appropriate tests.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support and Documentation

For support, please:
