Automated registration package for urban sensor endpoints into metadata catalogs

Wrench 🔧

A powerful framework for building automated sensor registration pipelines.

Python 3.12+ | Code style: ruff | License: MIT

Overview

Wrench is a modular, extensible workflow framework for harvesting, enriching, and registering sensor metadata from diverse IoT sources into urban data catalogs. It provides a standardized pipeline architecture with interchangeable components, so that sensor data becomes easier to discover and reuse.

Features

  • 🔄 Automated Metadata Harvesting: Extract metadata from various IoT data sources with minimal configuration
  • 📊 Standardized Data Models: Type-safe data structures using Pydantic for consistent handling of metadata
  • 🔍 Advanced Classification: Group similar sensors using machine learning and taxonomy-based approaches
  • Metadata Enrichment: Enhance sensor descriptions with contextual information using LLM technologies
  • 🏗️ Modular Architecture: Compose workflows from interchangeable components for maximum flexibility
  • 🔌 Extensible Interfaces: Easily add support for new data sources and catalog systems
  • 🤖 LLM Integration: Leverage AI capabilities for automatic content generation and classification

Installation

pip install auto-wrench

To install with specific component dependencies:

# For TELEClass grouper
pip install 'auto-wrench[teleclass]'

# For SensorThings support
pip install 'auto-wrench[sensorthings]'

# For KINETIC grouper
pip install 'auto-wrench[kinetic]'

# Multiple components
pip install 'auto-wrench[teleclass,sensorthings,kinetic]'

Core Components

Wrench consists of four main component types that can be combined in a pipeline:

  1. Harvesters: Extract metadata from IoT data sources (e.g., SensorThings API)
  2. Groupers: Classify and organize sensors into meaningful groups using various ML approaches
  3. MetadataEnrichers: Build spatial and temporal metadata for services and sensor groups
  4. Catalogers: Register the processed metadata into data catalogs (e.g., SDDI/CKAN)

Each component type follows a standardized interface, making it easy to extend with custom implementations.
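
The four-stage flow above can be sketched with structural interfaces. This is only an illustration of the idea, not Wrench's actual API: the `Harvester`, `Grouper`, `MetadataEnricher`, and `Cataloger` protocols, their method names, the `Item` and `Group` shapes, and the `run_pipeline` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Item:
    """A harvested sensor record (hypothetical shape)."""
    id: str
    description: str


@dataclass
class Group:
    """A named collection of related items (hypothetical shape)."""
    name: str
    items: list[Item] = field(default_factory=list)


# Hypothetical stage interfaces: any object with the right method fits.
class Harvester(Protocol):
    def harvest(self) -> list[Item]: ...


class Grouper(Protocol):
    def group(self, items: list[Item]) -> list[Group]: ...


class MetadataEnricher(Protocol):
    def enrich(self, groups: list[Group]) -> list[dict]: ...


class Cataloger(Protocol):
    def register(self, records: list[dict]) -> int: ...


def run_pipeline(h: Harvester, g: Grouper, e: MetadataEnricher, c: Cataloger) -> int:
    """Chain the four stages and return the number of registered records."""
    return c.register(e.enrich(g.group(h.harvest())))


# Toy stand-ins showing that each stage can be swapped independently.
class StaticHarvester:
    def harvest(self) -> list[Item]:
        return [Item("s1", "air temperature"), Item("s2", "road traffic count")]


class FirstWordGrouper:
    def group(self, items: list[Item]) -> list[Group]:
        groups: dict[str, Group] = {}
        for item in items:
            key = item.description.split()[0]
            groups.setdefault(key, Group(key)).items.append(item)
        return list(groups.values())


class CountEnricher:
    def enrich(self, groups: list[Group]) -> list[dict]:
        return [{"title": g.name, "sensor_count": len(g.items)} for g in groups]


class InMemoryCataloger:
    def __init__(self) -> None:
        self.records: list[dict] = []

    def register(self, records: list[dict]) -> int:
        self.records.extend(records)
        return len(records)


catalog = InMemoryCataloger()
registered = run_pipeline(StaticHarvester(), FirstWordGrouper(), CountEnricher(), catalog)
```

Because the stages only meet at their inputs and outputs, replacing `FirstWordGrouper` with another grouping strategy leaves the other three stages untouched.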

Quick Start

The following example sets up a complete pipeline with a SensorThings API harvester, a TELEClass grouper for classification, a SensorThings metadata enricher, and an SDDI cataloger for registration:

from wrench.cataloger.sddi import SDDICataloger
from wrench.grouper.teleclass import TELEClassGrouper
from wrench.harvester.sensorthings import SensorThingsHarvester
from wrench.metadataenricher.sensorthings import SensorThingsMetadataEnricher
from wrench.pipeline.sensor_pipeline import SensorRegistrationPipeline
from wrench.utils.config import LLMConfig

# Initialize components with their configurations
harvester = SensorThingsHarvester(
    base_url="https://example.org/v1.1",
    pagination_config={"page_delay": 0.2, "timeout": 60, "batch_size": 100}
)
grouper = TELEClassGrouper(config="config/teleclass_config.yaml")
metadata_enricher = SensorThingsMetadataEnricher(
    base_url="https://example.org/v1.1",
    title="City Sensor Network",
    description="Environmental sensors across the city",
    llm_config=LLMConfig(provider="openai", model="gpt-4")
)
cataloger = SDDICataloger(
    base_url="https://catalog.example.org",
    api_key="your-api-key",
    owner_org="your-organization"
)

# Assemble and run the pipeline
pipeline = SensorRegistrationPipeline(
    harvester=harvester,
    grouper=grouper,
    metadataenricher=metadata_enricher,
    cataloger=cataloger
)

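# `await` outside a coroutine only works in a notebook or the asyncio REPL;
# in a plain script, hand the coroutine to asyncio.run instead
# (assumes run_async is a coroutine function, as the `await` form implies).
import asyncio

result = asyncio.run(pipeline.run_async())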

Configuration

Each component can be configured via YAML files. Here's a basic example for the SensorThings harvester:

# sta_config.yaml
base_url: "https://example.org/v1.1"
identifier: "city_sensors"
title: "City Sensor Network"
description: "Environmental sensors across the city"

pagination:
  page_delay: 0.2
  timeout: 60
  batch_size: 100
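
The `pagination` block above carries the same keys as the `pagination_config` dictionary in the Quick Start. As a rough sketch of how such a file could be checked before it reaches a harvester, the code below validates the values and reshapes them into keyword arguments. It assumes the YAML has already been parsed into a dictionary by a YAML library; `PaginationConfig` and `harvester_kwargs` are hypothetical names, and the units in the comments are assumptions rather than documented behaviour.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PaginationConfig:
    """Hypothetical container for the `pagination` block shown above."""
    page_delay: float = 0.2  # assumed: seconds to wait between page requests
    timeout: int = 60        # assumed: per-request timeout in seconds
    batch_size: int = 100    # assumed: entities requested per page

    def __post_init__(self) -> None:
        if self.page_delay < 0:
            raise ValueError("page_delay must be non-negative")
        if self.timeout <= 0 or self.batch_size <= 0:
            raise ValueError("timeout and batch_size must be positive")


def harvester_kwargs(config: dict) -> dict:
    """Turn a parsed config mapping into keyword arguments.

    The output keys mirror the Quick Start call; whether Wrench performs
    this mapping itself is not something this sketch assumes.
    """
    pagination = PaginationConfig(**config.get("pagination", {}))
    return {"base_url": config["base_url"], "pagination_config": asdict(pagination)}


# Stand-in for the result of parsing sta_config.yaml.
parsed = {
    "base_url": "https://example.org/v1.1",
    "identifier": "city_sensors",
    "pagination": {"page_delay": 0.2, "timeout": 60, "batch_size": 100},
}
kwargs = harvester_kwargs(parsed)
```

Validating at load time means a typo such as `batch_size: 0` fails immediately instead of surfacing midway through a harvest.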

Component Overview

Harvesters

Harvesters connect to data sources and extract metadata. Wrench includes:

  • SensorThingsHarvester: Connects to OGC SensorThings API endpoints
  • Extensible base class for creating custom harvesters
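
For orientation, here is a small sketch of the paging a SensorThings harvester has to handle. It relies only on the OGC SensorThings response format, where a collection response holds its entities under `value` and, when more pages exist, a link under `@iot.nextLink`. The `iter_entities` function and the injected `fetch_json` callable are hypothetical and do not reflect how `SensorThingsHarvester` is implemented; the sample pages are made up.

```python
import time
from collections.abc import Callable, Iterator


def iter_entities(
    first_url: str,
    fetch_json: Callable[[str], dict],
    page_delay: float = 0.0,
) -> Iterator[dict]:
    """Yield every entity in a SensorThings collection, following paging links."""
    url: str | None = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        url = page.get("@iot.nextLink")  # absent on the last page
        if url and page_delay:
            time.sleep(page_delay)  # pause between pages to spare the server


# Made-up responses standing in for an HTTP client, keyed by request URL.
pages = {
    "https://example.org/v1.1/Things?$top=2": {
        "value": [{"@iot.id": 1}, {"@iot.id": 2}],
        "@iot.nextLink": "https://example.org/v1.1/Things?$top=2&$skip=2",
    },
    "https://example.org/v1.1/Things?$top=2&$skip=2": {
        "value": [{"@iot.id": 3}],
    },
}
things = list(iter_entities("https://example.org/v1.1/Things?$top=2", pages.__getitem__))
```

In real use, `fetch_json` would wrap an HTTP GET with the configured timeout; injecting it keeps the paging logic testable without network access.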

Groupers

Groupers organize sensors into logical groups using various machine learning approaches:

  • TELEClassGrouper: Taxonomy-enhanced classification using LLMs and corpus-based methods
  • KINETIC: Keyword-Informed, Network-Enhanced Topical Intelligence Classifier with hierarchical clustering
  • LDAGrouper: Latent Dirichlet Allocation for topic modeling and device grouping
  • BERTopicGrouper: BERTopic-based clustering with HDBSCAN and UMAP for topic discovery
  • Can be extended with custom grouping algorithms
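
To make the idea of a custom grouping algorithm concrete, the sketch below groups sensors by plain keyword matching against a small taxonomy. It does not reproduce any of the algorithms listed above and does not use Wrench's base classes; `KeywordGrouper`, its methods, and the sample taxonomy and sensors are hypothetical.

```python
from collections import defaultdict


class KeywordGrouper:
    """Assign each description to the first taxonomy class whose keyword it contains."""

    def __init__(self, taxonomy: dict[str, list[str]], fallback: str = "other") -> None:
        self.taxonomy = taxonomy
        self.fallback = fallback

    def classify(self, description: str) -> str:
        text = description.lower()
        for label, keywords in self.taxonomy.items():
            if any(keyword in text for keyword in keywords):
                return label
        return self.fallback  # nothing matched

    def group(self, sensors: dict[str, str]) -> dict[str, list[str]]:
        """Map {sensor id: description} to {label: sorted sensor ids}."""
        groups: dict[str, list[str]] = defaultdict(list)
        for sensor_id, description in sensors.items():
            groups[self.classify(description)].append(sensor_id)
        return {label: sorted(ids) for label, ids in groups.items()}


# Made-up taxonomy and sensor descriptions for illustration.
grouper = KeywordGrouper({
    "air_quality": ["pm10", "no2", "ozone"],
    "mobility": ["traffic", "parking", "bicycle"],
})
groups = grouper.group({
    "s1": "PM10 concentration at station A",
    "s2": "Bicycle counter on Main Street",
    "s3": "Water level gauge",
})
```

A keyword matcher like this is brittle compared with the ML-based groupers, but it shows the contract a grouper fulfils: sensor descriptions in, labelled groups out.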

MetadataEnrichers

MetadataEnrichers build spatial and temporal metadata for items and groups:

  • SensorThingsMetadataEnricher: Builds metadata for SensorThings API data sources
  • Extensible base class for different data source types
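
As a rough illustration of what spatial and temporal metadata for a group can look like, the sketch below derives a bounding box and a time range from sensor locations and observation timestamps. The function names and sample values are hypothetical; the actual fields produced by `SensorThingsMetadataEnricher` may differ.

```python
from datetime import datetime


def spatial_extent(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def temporal_extent(timestamps: list[str]) -> tuple[datetime, datetime]:
    """Return the earliest and latest of a list of ISO 8601 timestamps."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    return min(parsed), max(parsed)


# Made-up sensor positions and observation times for illustration.
bbox = spatial_extent([(11.57, 48.14), (11.60, 48.17), (11.52, 48.11)])
start, end = temporal_extent([
    "2024-03-01T08:00:00+00:00",
    "2024-01-15T12:30:00+00:00",
    "2024-06-30T23:59:59+00:00",
])
```

Here `bbox` covers all three points and `start`/`end` bracket the observations, which is the kind of summary a catalog entry needs to support map and date filters.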

Catalogers

Catalogers register metadata into data catalogs:

  • SDDICataloger: Registers metadata into SDDI/CKAN-based catalogs
  • Extensible interface for supporting other catalog systems
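
Since SDDI catalogs are CKAN-based, registration ultimately means calling the CKAN Action API. The sketch below builds, without sending, a `package_create` request using only the standard library. The endpoint path, the `Authorization` header, and the `name`, `title`, `notes`, and `owner_org` fields are standard CKAN; the helper name and dataset values are hypothetical, and `SDDICataloger` may send additional or different fields.

```python
import json
import urllib.request


def build_package_create_request(
    base_url: str, api_key: str, dataset: dict
) -> urllib.request.Request:
    """Build (but do not send) a CKAN Action API `package_create` request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_package_create_request(
    base_url="https://catalog.example.org",
    api_key="your-api-key",
    dataset={
        "name": "city-sensor-network",  # URL-safe dataset slug (made up)
        "title": "City Sensor Network",
        "notes": "Environmental sensors across the city",
        "owner_org": "your-organization",
    },
)
# urllib.request.urlopen(request) would submit it; omitted to avoid side effects.
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before anything is written to the catalog.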

Advanced Features

Advanced Grouping with ML

Each grouper takes a different approach to sensor classification. The snippets below are alternatives; a pipeline uses one grouper at a time:

from wrench.utils.config import LLMConfig

# TELEClass with taxonomy-enhanced learning
from wrench.grouper.teleclass import TELEClassGrouper
grouper = TELEClassGrouper(config="config/teleclass_config.yaml")

# KINETIC for hierarchical topic clustering
from wrench.grouper.kinetic import KINETIC
grouper = KINETIC(
    llm_config=LLMConfig(provider="openai", model="gpt-4"),
    embedder="intfloat/multilingual-e5-large-instruct",
    lang="en",
    resolution=1
)

# LDA for topic modeling
from wrench.grouper.lda import LDAGrouper
grouper = LDAGrouper(config="config/lda_config.yaml")

# BERTopic for advanced clustering
from wrench.grouper.bertopic import BERTopicGrouper
grouper = BERTopicGrouper(config="config/bertopic_config.yaml")

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/yourusername/wrench.git
cd wrench

# Run the make target for full setup
make setup

# Install component dependencies for development
uv pip install -e ".[teleclass,sensorthings,kinetic]"

Code Style and Testing

This project uses Ruff for formatting and linting, and runs unit tests, end-to-end tests, and type checks through make targets:

# Format and lint code
make format
make lint

# Run tests with coverage
make test

# Run specific test types
make test_unit
make test_e2e

# Type checking
make lint_types

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please ensure your code follows our coding standards and includes appropriate tests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support and Documentation

For support, please:
