DigitalTwin - Dataspace is a Python package that provides a simple and efficient way to create, manage, and query data spaces.
DigitalTwin - Data Space
DigitalTwin Data Space is a Python package for creating, managing, and querying data spaces, with a focus on modular data pipelines and digital twin applications. It provides a flexible framework to define, schedule, and run data collectors, harvesters, and handlers, including workflows in which components depend on one another.
Features
- Modular Components: Define custom Collectors, Harvesters, and Handlers for your data workflows.
- Configuration-Driven: Easily configure your data pipeline using TOML files.
- Dependency Management: Automatically resolves and schedules component dependencies.
- CLI Interface: Run, schedule, and manage your data pipeline from the command line.
- Extensible: Add new data sources, processing steps, or outputs by implementing new components.
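To illustrate what dependency resolution means here, the sketch below orders a set of components so that each one runs after the components it depends on. It uses the standard library's graphlib module (Python 3.9+) and is not the package's own implementation; the component names and the shape of the dependencies mapping are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical component names, each mapped to the components it depends on.
dependencies = {
    "raw_feed": [],
    "cleaned_feed": ["raw_feed"],
    "daily_stats": ["cleaned_feed"],
    "map_layer": ["cleaned_feed", "raw_feed"],
}


def resolve_order(deps):
    """Return component names so that each appears after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


order = resolve_order(dependencies)
print(order)
```

Any order that places raw_feed before cleaned_feed, and cleaned_feed before both daily_stats and map_layer, is a valid schedule.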
Installation
pip install digitaltwin_dataspace
Or, for development:
git clone https://github.com/GaspardMerten/digitaltwin_dataspace.git
cd digitaltwin_dataspace
pip install -e .
Dependencies:
- Python 3.8+
- requests
- SQLAlchemy
- azure-storage-blob
- schedule
- dotenv
(See pyproject.toml for the full list.)
Usage
Command Line Interface
The main entry point is the dt-dataspace CLI:
dt-dataspace --config-folder path/to/config [options]
Key options:
- --config-folder: Path to the configuration folder (default: config)
- --init-dependencies: Run all harvesters in dependency order
- --handlers: List of handler names to run
- --collectors: List of collector names to run
- --harvesters: List of harvester names to run
- --now: Run harvesters or collectors once and exit
- --port: Port for the handlers server (default: 8888)
- --host: Host for the handlers server (default: localhost)
- --allowed-hosts: Allowed hosts for the handlers server
- --log-level: Set logging level (DEBUG, INFO, etc.)
- --parquetize: List of harvester names to run for parquet output
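As a small illustration of how these options combine, the sketch below assembles a command line in Python, for example to pass to subprocess.run. The build_command helper and the harvester name my_harvester are hypothetical; only the flags themselves come from the list above, and passing a single name to --harvesters is an assumption about how the CLI parses its list options.

```python
import shlex


def build_command(config_folder, harvester=None, run_once=False):
    """Assemble an argv list for the dt-dataspace CLI (hypothetical helper)."""
    argv = ["dt-dataspace", "--config-folder", config_folder]
    if harvester is not None:
        # Assumption: a single harvester name is accepted after --harvesters.
        argv += ["--harvesters", harvester]
    if run_once:
        # --now runs the selected components once and exits.
        argv.append("--now")
    return argv


argv = build_command("config", harvester="my_harvester", run_once=True)
print(shlex.join(argv))
# prints: dt-dataspace --config-folder config --harvesters my_harvester --now
```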
Project Structure
digitaltwin_dataspace/
│
├── components/
│ ├── collector.py # Base Collector class
│ ├── handler.py # Base Handler class
│ └── harvester.py # Base Harvester class
│
├── configuration/
│ ├── load.py # Loads and parses component configuration
│ └── model.py # Configuration data models
│
├── data/
│ ├── sync_db.py # Database sync logic
│ ├── retrieve.py # Data retrieval utilities
│ └── ... # Other data management modules
│
├── cli.py # Command-line interface
└── ...
Components
- Collector: Gathers data from external sources. Implement the Collector abstract class and its run() method.
- Harvester: Processes or transforms collected data. Implement the Harvester abstract class and its run() method.
- Handler: Serves or exposes processed data, e.g., via an API. Implement the Handler abstract class and its run() method.
You can add your own components by subclassing these base classes and registering them in your configuration.
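The subclassing pattern described above might look like the following sketch. To stay self-contained it defines a stand-in Collector base class instead of importing the package's own; the real class in digitaltwin_dataspace/components/collector.py may have a different run() signature, return type, or additional methods. StaticCollector and its payload are hypothetical.

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Stand-in for the package's base class (assumption: run() is abstract)."""

    @abstractmethod
    def run(self):
        """Fetch data from an external source and return it."""


class StaticCollector(Collector):
    """Hypothetical collector that returns a fixed payload instead of calling an API."""

    def run(self):
        # Assumption: a collector hands back the raw bytes it gathered.
        return b'{"status": "ok"}'


data = StaticCollector().run()
print(data)
```

Because run() is abstract, instantiating Collector directly raises TypeError, so a subclass that forgets to implement it fails early.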
Configuration
Configuration is done via TOML files in your config folder (default: config/).
Each file can define multiple collectors, harvesters, and handlers, specifying:
- DATA_TYPE, DATA_FORMAT
- PATH (Python import path to your component)
- SCHEDULE (optional, for scheduling)
- SOURCE, DEPENDENCIES (for workflow chaining)
- Other custom parameters
See digitaltwin_dataspace/configuration/load.py for all supported options.
Author
Gaspard Merten
gaspard@norse.be
License
Attribution-NonCommercial-ShareAlike 4.0 International