Intake plugin for local artifacts in IDS connectors.
Project description
intake-ids
This is an intake plugin that provides drivers and a catalog for local artifacts in an International Data Spaces connector. It automates contract negotiation, deprecation and re-negotiation processes needed for data access from connectors.
A catalog provides a list of processable Resources in an IDS Connector. A Resource is only included (processable) in the catalog if it has at least one Representation with a supported mimetype. These are currently:
- text/csv
Future formats could include Parquet and JSON.
Installation
intake-ids
is published on PyPI.
You can install it by running the following in your terminal:
pip install intake-ids
You can test the functionality by opening the example notebook in the examples/
directory.
Usage
The package can be imported using
from intake_ids import ConnectorCatalog
Loading a catalog
You can load from a remote IDS Connector provider
by providing the URLs for both the local connector and the remote connector provider
and the authentication tuple for the local connector:
provider_url = "https://provider:8080"
consumer_url = "https://consumer:8080"
catalog = ConnectorCatalog(provider_url=provider_url, consumer_url=consumer_url, name="testcat", auth=("admin", "password"))
len(list(catalog))
By default, ConnectorCatalog will combine all "IDS Catalog"s in the connector into one catalog. You can select the "IDS Catalog" using catalog_id
.
catalog_id = "https://provider:8080/api/catalogs/eda0cda2-10f2-4b39-b462-5d4f2b1bb758"
catalog = ConnectorCatalog(provider_url=provider_url, consumer_url=consumer_url, catalog_id=catalog_id, name="testcat", auth=("admin", "password"))
You can display the resources (items) in the catalog
for entry_id, entry in catalog.items():
display(entry)
If the catalog has too many entries to comfortably print all at once, you can narrow it by searching for a term (e.g. 'motion'):
for entry_id, entry in catalog.search('motion').items():
display(entry)
Loading an artifact
Once you have identified a resource/representation you want to use, you can load it into a dataframe using read()
or read_chunked()
:
df = pd.concat(entry.read_chunked())
or
df = entry.read()
This will automatically load that dataset into the specified container for the driver for the entry.
Command line tools
The plugin provides the intake-ids-periodic-cleanup
script for periodic validation and cleanup of the cache. You can use the following crontab to run this script every 5 minutes.
*/5 * * * * $HOME/.local/bin/intake-ids-periodic-cleanup
Details
Processable (Resource, Representation)-pair entries are then included in the catalog and matched to an available driver specialized for it's type. Alongside the entries themselves are also metadata from the Representation and cursory information about the usage policy (Rules) and access rights (ContractOffer) of the Resource.
Drivers and Agreement caching
Drivers for entries allow for reading Artifacts by sorting through all the ContractOffers available for the Resource and negotiating an Agreement using one of them. If no valid ContractOffer exists for the Resource, an error is thrown. If otherwise multiple valid ContractOffers exist for the Resource, a preferable (artifact-cacheable) offer is selected. ContractAgreements are cached on the system and reused the next time the Resource is read/requested by the user if they are still valid without negotiating a new Agreement. If the cached Agreement is not valid (eg. expired) at any point, it (and all other associated items, including it's artifacts, see the next section) is removed from the cache and the process for agreement negotiation takes place again.
Currently following drivers exist:
driver | mimetypes | container |
---|---|---|
ConnectorCSV | text/csv | pandas.dataframe |
Artifact caching
Depending on the determined usage control pattern (found out by inspecting Rules in the Agreement) for the Resource they are part of, some Artifacts can be cached by the driver in the local filesystem and used directly the next time they are requested. Before each read from the cache, the driver checks the continued validity of the Agreement and evalutes the usage control restrictions, clearing the agreement/artifact from the cache if the results come negative.
Cache support for usage patterns
Following the specifications of the IDSA Position Paper about Usage Control, the IDS defines 21 policy classes. The IDS Dataspace Connector currently implements 9 of these. Of the remaining 9 usage patterns, intake-ids offers artifact- and agreement-caching (full caching) support for 4 patterns, remaining 5 are agreement-cached only.
No. | Title | artifact caching | agreement caching | support by IDS Dataspace Connector | description |
---|---|---|---|---|---|
1 | Allow the Usage of the Data | x | x | x | provides data usage without any restrictions |
2 | Connector-restricted Data Usage | x | x | allows data usage for a specific connector | |
3 | Application-restricted Data Usage | ||||
4 | Interval-restricted Data Usage | x | x | x | provides data usage within a specified time interval |
5 | Duration-restricted Data Usage | x | x | x | allows data usage for a specified time period |
6 | Location Restricted Policy | ||||
7 | Perpetual Data Sale (Payment once) | ||||
8 | Data Rental (Payment frequently) | ||||
9 | Role-restricted Data Usage | ||||
10 | Purpose-restricted Data Usage Policy | ||||
11 | Event-restricted Usage Policy | ||||
12 | Restricted Number of Usages | x | x | allows data usage for n times | |
13 | Security Level Restricted Policy | x | x | allows data access only for connectors with a specified security level | |
14 | Use Data and Delete it After | x | x | x | allows data usage within a specified time interval with the restriction to delete it at a specified time stamp |
15 | Modify Data (in Transit) | ||||
16 | Modify Data (in Rest) | ||||
17 | Local Logging | x | x | allows data usage and sends logs to a specified Clearing House | |
18 | Remote Notifications | x | x | allows data usage and sends notification message | |
19 | Attach Policy when Distribute to a Third-party | ||||
20 | Distribute only if Encrypted | ||||
21 | State Restricted Policy |
Periodic policy validation
The plugin includes a console script intake-ids-periodic-cleanup
that evaluates all stored Agreements and usage control restrictions and removes invalid ones from the cache. It is provided as a command line tool and installed to your $PATH
if the plugin was downloaded via pip/installed with setuptools. You can use this tool in a cronjob to make sure that usage policies are upheld even when the plugin is not in use.
Requirements
install_requires =
intake>=0.6.5
pandas>=1.4.3
requests>=2.28.1
pydantic>=1.10.1
isodate>=0.6.1
appdirs>=1.4.4
python_requires = >=3.10
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file intake-ids-0.0.5.tar.gz
.
File metadata
- Download URL: intake-ids-0.0.5.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8c7c830af4fcae09a54d4e704581033212ba66849130d68b2da5f930ab4e798e |
|
MD5 | 3e18759600f94f84cf48d8f09dcf8205 |
|
BLAKE2b-256 | 1e7639a79ed6f1a1b198cb3d2ab8e54d5426faa97983232b9e81a14dd877427e |
File details
Details for the file intake_ids-0.0.5-py2.py3-none-any.whl
.
File metadata
- Download URL: intake_ids-0.0.5-py2.py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 03557fee2382dab721b406c71885a11de3a1d4802e6e7178d7edbaacd4c73a58 |
|
MD5 | 3e3af32c8bad1bbd038c15691e5fee6d |
|
BLAKE2b-256 | 56cb91edc71542ca0b60937817038ad545db3e636f4cbff0ba46a36e978bc02a |