Intake plugin for local artifacts in IDS connectors.

Project description

intake-ids

This is an intake plugin that provides drivers and a catalog for local artifacts in an International Data Spaces (IDS) connector. It automates the contract negotiation, expiration handling, and re-negotiation processes needed for data access from connectors.

A catalog provides a list of processable Resources in an IDS Connector. A Resource is only included (processable) in the catalog if it has at least one Representation with a supported mimetype. These are currently:

  • text/csv

Future formats could include Parquet and JSON.
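
To make the inclusion rule concrete, here is a minimal sketch of how a Resource could be filtered by the mimetypes of its Representations. The representations attribute and mediatype field are hypothetical names used only for illustration, not the plugin's actual API:

SUPPORTED_MIMETYPES = {"text/csv"}

def is_processable(resource) -> bool:
    # A Resource is processable if at least one of its Representations
    # has a supported mimetype. `representations` and `mediatype` are
    # hypothetical attribute names, used only for this sketch.
    return any(rep.mediatype in SUPPORTED_MIMETYPES for rep in resource.representations)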

Installation

intake-ids is published on PyPI. You can install it by running the following in your terminal:

pip install intake-ids

You can test the functionality by opening the example notebook in the examples/ directory.

Usage

The package can be imported using:

from intake_ids import ConnectorCatalog

Loading a catalog

You can load a catalog from a remote IDS Connector by providing the URLs of both the local consumer connector and the remote provider connector, together with the authentication tuple for the local connector:

provider_url = "https://provider:8080"
consumer_url = "https://consumer:8080"

catalog = ConnectorCatalog(
    provider_url=provider_url,
    consumer_url=consumer_url,
    name="testcat",
    auth=("admin", "password"),
)
len(list(catalog))

By default, ConnectorCatalog combines all IDS Catalogs in the connector into one catalog. You can select a single IDS Catalog by passing catalog_id:

catalog_id = "https://provider:8080/api/catalogs/eda0cda2-10f2-4b39-b462-5d4f2b1bb758"

catalog = ConnectorCatalog(
    provider_url=provider_url,
    consumer_url=consumer_url,
    catalog_id=catalog_id,
    name="testcat",
    auth=("admin", "password"),
)

You can display the resources (items) in the catalog:

for entry_id, entry in catalog.items():
    display(entry)

If the catalog has too many entries to comfortably print all at once, you can narrow it by searching for a term (e.g. 'motion'):

for entry_id, entry in catalog.search('motion').items():
    display(entry)

Loading an artifact

Once you have identified a resource/representation you want to use, you can load it into a dataframe using read() or read_chunked():

import pandas as pd

df = pd.concat(entry.read_chunked())

or

df = entry.read()

This automatically loads the dataset into the container specified by the entry's driver.

Command line tools

The plugin provides the intake-ids-periodic-cleanup script for periodic validation and cleanup of the cache. You can use the following crontab entry to run the script every 5 minutes:

*/5 * * * * $HOME/.local/bin/intake-ids-periodic-cleanup

Details

Processable (Resource, Representation) pairs are included in the catalog as entries and matched to an available driver specialized for their type. Each entry also carries metadata from the Representation and cursory information about the Resource's usage policy (Rules) and access rights (ContractOffer).
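
You can inspect this information through intake's standard entry interface. The entry name and the metadata layout below are assumptions for illustration, not guaranteed field names:

entry = catalog["some_resource"]  # hypothetical entry name
info = entry.describe()           # intake's standard describe()
print(info.get("metadata", {}))   # Representation, Rules and ContractOffer details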

Drivers and Agreement caching

Drivers for entries allow reading Artifacts by sorting through all ContractOffers available for the Resource and negotiating an Agreement from one of them. If no valid ContractOffer exists for the Resource, an error is raised. If multiple valid ContractOffers exist, a preferable (artifact-cacheable) offer is selected. ContractAgreements are cached on the system and, if still valid, reused the next time the Resource is read, without negotiating a new Agreement. If a cached Agreement becomes invalid (e.g. expired), it is removed from the cache together with all associated items, including its artifacts (see the next section), and agreement negotiation takes place again.
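
The order of operations can be summarized in the following sketch. The cache mapping and all callables are hypothetical stand-ins for the plugin's internals, shown only to illustrate the flow:

def get_agreement(resource, cache, negotiate, is_valid, is_artifact_cacheable):
    # Hypothetical sketch, not plugin API: reuse a cached Agreement while valid.
    cached = cache.get(resource.id)
    if cached is not None and is_valid(cached):
        return cached                      # reuse a still-valid Agreement
    cache.pop(resource.id, None)           # evict an invalid Agreement (and its artifacts)

    offers = [o for o in resource.contract_offers if is_valid(o)]
    if not offers:
        raise ValueError("no valid ContractOffer for this Resource")

    # Prefer an offer whose usage pattern allows artifact caching.
    offer = next((o for o in offers if is_artifact_cacheable(o)), offers[0])
    agreement = negotiate(offer)
    cache[resource.id] = agreement
    return agreement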

Currently, the following drivers exist:

driver        mimetypes  container
ConnectorCSV  text/csv   pandas.DataFrame

Artifact caching

Depending on the usage control pattern of the Resource they belong to (determined by inspecting the Rules in the Agreement), some Artifacts can be cached by the driver in the local filesystem and used directly the next time they are requested. Before each read from the cache, the driver checks the continued validity of the Agreement and evaluates the usage control restrictions, clearing the agreement and artifact from the cache if either check fails.
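
A minimal sketch of this read path, again with hypothetical stand-ins rather than the plugin's actual internals:

def read_artifact(entry, agreement, artifact_cache, is_valid, restrictions_hold, fetch):
    # Hypothetical sketch, not plugin API: serve from the local artifact cache
    # only while the Agreement and its usage control restrictions still hold.
    cached = artifact_cache.get(entry.id)
    if cached is not None:
        if is_valid(agreement) and restrictions_hold(agreement):
            return cached                   # serve directly from the local cache
        artifact_cache.pop(entry.id, None)  # clear the artifact on a failed check
    return fetch(entry, agreement)          # fall back to requesting the Artifact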

Cache support for usage patterns

Following the specifications of the IDSA Position Paper on Usage Control, the IDS defines 21 policy classes. The IDS Dataspace Connector currently implements 9 of these. Of these 9 usage patterns, intake-ids offers artifact- and agreement-caching (full caching) support for 4; the remaining 5 are agreement-cached only.

No.  Title                                           Artifact caching  Agreement caching  Support by IDS Dataspace Connector  Description
1    Allow the Usage of the Data                     x                 x                  x                                   provides data usage without any restrictions
2    Connector-restricted Data Usage                                   x                  x                                   allows data usage for a specific connector
3    Application-restricted Data Usage
4    Interval-restricted Data Usage                  x                 x                  x                                   provides data usage within a specified time interval
5    Duration-restricted Data Usage                  x                 x                  x                                   allows data usage for a specified time period
6    Location Restricted Policy
7    Perpetual Data Sale (Payment once)
8    Data Rental (Payment frequently)
9    Role-restricted Data Usage
10   Purpose-restricted Data Usage Policy
11   Event-restricted Usage Policy
12   Restricted Number of Usages                                       x                  x                                   allows data usage for n times
13   Security Level Restricted Policy                                  x                  x                                   allows data access only for connectors with a specified security level
14   Use Data and Delete it After                    x                 x                  x                                   allows data usage within a specified time interval with the restriction to delete it at a specified time stamp
15   Modify Data (in Transit)
16   Modify Data (in Rest)
17   Local Logging                                                     x                  x                                   allows data usage and sends logs to a specified Clearing House
18   Remote Notifications                                              x                  x                                   allows data usage and sends notification messages
19   Attach Policy when Distribute to a Third-party
20   Distribute only if Encrypted
21   State Restricted Policy
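
For reference in code, the caching support from the table above could be expressed as two sets of pattern numbers. This is an illustrative convenience only, not part of the plugin's API:

# Policy class numbers from the table above (illustrative only).
FULLY_CACHED = {1, 4, 5, 14}                 # artifact- and agreement-cached
AGREEMENT_CACHED_ONLY = {2, 12, 13, 17, 18}  # agreement-cached only

def caching_support(pattern_no: int) -> str:
    if pattern_no in FULLY_CACHED:
        return "artifact + agreement"
    if pattern_no in AGREEMENT_CACHED_ONLY:
        return "agreement only"
    return "none"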

Periodic policy validation

The plugin includes a console script, intake-ids-periodic-cleanup, that evaluates all stored Agreements and usage control restrictions and removes invalid ones from the cache. It is installed to your $PATH when the plugin is installed via pip or setuptools. You can run this tool from a cronjob to make sure that usage policies are upheld even when the plugin is not in use.

Requirements

install_requires =
    intake>=0.6.5
    pandas>=1.4.3
    requests>=2.28.1
    pydantic>=1.10.1
    isodate>=0.6.1
    appdirs>=1.4.4
python_requires = >=3.10

Download files

Download the file for your platform.

Source Distribution

intake-ids-0.0.5.tar.gz (18.5 kB)


Built Distribution

intake_ids-0.0.5-py2.py3-none-any.whl (22.5 kB)


File details

Details for the file intake-ids-0.0.5.tar.gz.

File metadata

  • Download URL: intake-ids-0.0.5.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for intake-ids-0.0.5.tar.gz
Algorithm    Hash digest
SHA256       8c7c830af4fcae09a54d4e704581033212ba66849130d68b2da5f930ab4e798e
MD5          3e18759600f94f84cf48d8f09dcf8205
BLAKE2b-256  1e7639a79ed6f1a1b198cb3d2ab8e54d5426faa97983232b9e81a14dd877427e

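You can check a downloaded file against these hashes with Python's standard hashlib, for example:

import hashlib

# SHA256 of the sdist, copied from the table above.
EXPECTED = "8c7c830af4fcae09a54d4e704581033212ba66849130d68b2da5f930ab4e798e"

with open("intake-ids-0.0.5.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED, "hash mismatch - do not install this file"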

File details

Details for the file intake_ids-0.0.5-py2.py3-none-any.whl.

File metadata

  • Download URL: intake_ids-0.0.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 22.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for intake_ids-0.0.5-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       03557fee2382dab721b406c71885a11de3a1d4802e6e7178d7edbaacd4c73a58
MD5          3e3af32c8bad1bbd038c15691e5fee6d
BLAKE2b-256  56cb91edc71542ca0b60937817038ad545db3e636f4cbff0ba46a36e978bc02a

