DjangoLDP Indexing

DjangoLDP extension for model indexing and pattern-based search.

Features

  • Instance-level indexing (WebID profile, public type index)
  • Model-based indexing with support for indexed fields
  • Pattern-based search for indexed fields
  • Static index file generation

Installation

pip install djangoldp-indexing

Configuration

  1. Add to INSTALLED_APPS:
INSTALLED_APPS = [
    ...
    'djangoldp_indexing',
]

Note: The djangoldp_indexing app must be added after all apps that will contain indexed models in the INSTALLED_APPS list.

  2. Add to DJANGOLDP_PACKAGES:
DJANGOLDP_PACKAGES = [
    ...
    'djangoldp_indexing',
]

Note: The djangoldp_indexing package must be added after all packages that will contain indexed models in the DJANGOLDP_PACKAGES list.

  3. Create an indexing_config.yml file in the root of your server folder (the same directory as manage.py). The file should contain the configuration for indexed fields. Here's an example structure:
djangoldp_tems_trial8:
    Trial8Object:
        indexed_fields:
            - title
    AnotherModel:
        indexed_fields:
            - name
            - created_at
another_package:
    MyOtherModel:
        indexed_fields:
            - field1
            - field2
  4. For models that sit directly in your application, you can add indexed fields directly to the model's Meta class:
class MyModel(Model):
    class Meta:
        indexed_fields = ['title', 'description']
  5. Add the following to your settings.yml file (required to check the dataspace policy):
server:
    EDC_URL: 'http://localhost' # URL to the EDC connector (default: http://localhost)

Architecture

Overview

DjangoLDP Indexing implements a three-level hierarchical indexing system following the Solid Type Index specification. The package provides both dynamic views and static file generation for indexes, with integrated dataspace policy enforcement.

System Architecture

graph TD
    subgraph "Configuration Layer"
        YAML[indexing_config.yml]
        META[Model Meta.indexed_fields]
        YAML -->|apps.py ready| MODELS[Model._meta.indexed_fields]
        META --> MODELS
    end

    subgraph "View Layer - Instance Level"
        ROOT[InstanceRootContainerView<br/>/]
        WEBID[InstanceWebIDView<br/>/profile]
        PTI[PublicTypeIndexView<br/>/profile/publicTypeIndex]
        IDXROOT[InstanceIndexesRootView<br/>/indexes/]

        ROOT --> WEBID
        WEBID --> PTI
        ROOT --> IDXROOT
    end

    subgraph "View Layer - Model Level"
        MRI[ModelRootIndexView<br/>/indexes/model/index]
        MPI[ModelPropertyIndexView<br/>/indexes/model/field/index]
        MPP[ModelPropertyPatternIndexView<br/>/indexes/model/field/pattern]

        MRI -->|lists fields| MPI
        MPI -->|lists patterns| MPP
        MPP -->|returns resources| DB[(Database)]
    end

    subgraph "Static Generation"
        CMD_LOCAL[generate_local_indexes]
        CMD_FEDEX[crawl_indexes]

        CMD_LOCAL -->|simulates requests| MRI
        CMD_LOCAL -->|simulates requests| MPI
        CMD_LOCAL -->|simulates requests| MPP
        CMD_LOCAL -->|writes| STATIC_IDX[STATIC_ROOT/indexes/*.jsonld]

        CMD_FEDEX -->|crawls LDP sources| REMOTE[Remote LDP Sources]
        CMD_FEDEX -->|writes| STATIC_FDX[STATIC_ROOT/fedex/*.jsonld]
    end

    subgraph "Static Serving"
        SERVE_IDX[serve_static_index]
        SERVE_FDX[serve_static_fedex]
        SERVE_PROF[serve_static_profile]

        STATIC_IDX --> SERVE_IDX
        STATIC_FDX --> SERVE_FDX
        STATIC_FDX --> SERVE_PROF
    end

    subgraph "Policy Enforcement"
        POLICY[check_dataspace_policy]
        EDC[EDC Catalog API]
        USER[User.dataSpaceProfile]

        SERVE_IDX --> POLICY
        POLICY -->|uses API key from| USER
        POLICY -->|queries| EDC
        EDC -->|catalog contains idx:IndexEntry| POLICY
    end

    MODELS -->|provides indexed fields| PTI
    MODELS -->|provides indexed fields| MRI
    MODELS -->|queries data| MPI

    PTI -.->|references| MRI

    classDef configClass fill:#e1f5ff,stroke:#0066cc
    classDef viewClass fill:#fff4e1,stroke:#cc8800
    classDef staticClass fill:#e1ffe1,stroke:#00cc00
    classDef policyClass fill:#ffe1e1,stroke:#cc0000

    class YAML,META,MODELS configClass
    class ROOT,WEBID,PTI,IDXROOT,MRI,MPI,MPP viewClass
    class CMD_LOCAL,CMD_FEDEX,STATIC_IDX,STATIC_FDX,SERVE_IDX,SERVE_FDX,SERVE_PROF staticClass
    class POLICY,EDC,USER policyClass

Three-Level Index Hierarchy

The indexing system organizes data in three levels for efficient pattern-based search:

graph LR
    subgraph "Level 1: Model Index"
        MI["/indexes/users/index<br/>Lists: title, description"]
    end

    subgraph "Level 2: Property Index"
        PI_T["/indexes/users/title/index<br/>Lists patterns: 'ali', 'bob', 'cha'"]
        PI_D["/indexes/users/description/index<br/>Lists patterns: 'dev', 'eng'"]
    end

    subgraph "Level 3: Pattern Index"
        PP_ALI["/indexes/users/title/ali<br/>Returns: alice, alison"]
        PP_BOB["/indexes/users/title/bob<br/>Returns: bob, bobby"]
        PP_DEV["/indexes/users/description/dev<br/>Returns: developer, devops"]
    end

    MI -->|field: title| PI_T
    MI -->|field: description| PI_D
    PI_T -->|pattern: ali| PP_ALI
    PI_T -->|pattern: bob| PP_BOB
    PI_D -->|pattern: dev| PP_DEV

    classDef level1 fill:#ffcccc
    classDef level2 fill:#ccffcc
    classDef level3 fill:#ccccff

    class MI level1
    class PI_T,PI_D level2
    class PP_ALI,PP_BOB,PP_DEV level3

How it works:

  1. Level 1 - Model Index: Lists all indexed fields for a model (e.g., /indexes/users/index shows that title and description are indexed)
  2. Level 2 - Property Index: For each field, analyzes actual database data to find all unique 3-character prefixes (e.g., /indexes/users/title/index lists patterns like 'ali', 'bob', 'cha')
  3. Level 3 - Pattern Index: Returns all resources where the field value starts with the pattern (e.g., /indexes/users/title/ali returns users whose title starts with "ali", such as "alice" and "alison")
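
The prefix logic behind Levels 2 and 3 can be sketched in a few lines of plain Python. This is only an illustration, not the package's code: the function names are hypothetical, and lowercasing values and skipping values shorter than three characters are assumptions.

```python
def property_patterns(values, length=3):
    """Level 2: return the sorted unique prefixes of the given length."""
    # Assumption: prefixes are lowercased and shorter values are skipped.
    return sorted({v[:length].lower() for v in values if len(v) >= length})


def pattern_matches(values, pattern):
    """Level 3: return the values that start with the pattern."""
    return [v for v in values if v.lower().startswith(pattern.lower())]


titles = ["alice", "alison", "bob", "bobby", "charlie"]
print(property_patterns(titles))       # ['ali', 'bob', 'cha']
print(pattern_matches(titles, "ali"))  # ['alice', 'alison']
```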

Key Components

Configuration: Indexed fields can be defined via YAML config (for external packages) or Model Meta class (for your own models). At startup, DjangoLDPIndexingConfig.ready() consolidates these into model._meta.indexed_fields.
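
As a rough illustration of that consolidation step, the sketch below merges the two sources into one mapping. The function name and the data shapes are hypothetical, and the merge rule (union of both lists, Meta fields first) is an assumption; the package may resolve conflicts differently.

```python
def consolidate_indexed_fields(yaml_config, meta_fields):
    """Merge YAML-declared and Meta-declared indexed fields per model.

    yaml_config: {package: {ModelName: {"indexed_fields": [...]}}}
    meta_fields: {(package, ModelName): [...]}
    """
    merged = {key: list(fields) for key, fields in meta_fields.items()}
    for package, models in yaml_config.items():
        for model, options in models.items():
            target = merged.setdefault((package, model), [])
            for field in options.get("indexed_fields", []):
                # Assumption: duplicates are dropped, first occurrence wins.
                if field not in target:
                    target.append(field)
    return merged


# The dict mirrors what a YAML parser would return for indexing_config.yml.
yaml_config = {"djangoldp_tems_trial8": {"Trial8Object": {"indexed_fields": ["title"]}}}
meta_fields = {("my_app", "MyModel"): ["title", "description"]}
print(consolidate_indexed_fields(yaml_config, meta_fields))
```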

Views: All views inherit from IndexBaseView and return JSON-LD formatted responses with CORS headers. Instance-level views provide WebID profiles and type indexes, while model-level views handle the three-tier index hierarchy.

Static Generation: Management commands simulate requests to the view classes and save rendered JSON-LD files to disk. This avoids database queries during production serving.
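
The "render once, serve from disk" idea can be sketched as follows. The helper names, the one-file-per-URL layout, and the sample document are assumptions made for illustration; only the .jsonld suffix and the indexes folder come from the description above.

```python
import json
import tempfile
from pathlib import Path


def static_file_for(static_root, url_path):
    """Map an index URL path to a .jsonld file under static_root."""
    # Assumption: one file per URL path, named after the path.
    return Path(static_root) / (url_path.strip("/") + ".jsonld")


def write_index(static_root, url_path, document):
    """Serialize a rendered JSON-LD document to its static file."""
    target = static_file_for(static_root, url_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as static_root:
    written = write_index(
        static_root,
        "/indexes/users/title/ali",
        {"@id": "/indexes/users/title/ali"},  # placeholder document
    )
    print(written.relative_to(static_root))
```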

Policy Enforcement: The serve_static_index view enforces dataspace policy by verifying that the requested index URL exists in the user's EDC catalog (obtained via their dataSpace profile API key). The check can be bypassed with the X-Bypass-Policy: true header.

Federation: The crawl_indexes command discovers remote LDP sources whose federation property is set to indexes and aggregates their type indexes into a federated index structure.

Authorization

Policy Enforcement

Access to index resources is protected using a two-tier authorization approach:

1. Contract-Based Authorization (Primary)

Clients can access indexes by providing a valid EDC contract agreement ID and participant ID via headers.

Required headers:

  • DSP-AGREEMENT-ID: The contract agreement identifier
  • DSP-PARTICIPANT-ID: The participant identifier

Example request:

curl -H "DSP-AGREEMENT-ID: contract-123" \
     -H "DSP-PARTICIPANT-ID: participant-456" \
     http://localhost:8000/indexes/users/index
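
The same request can be built from Python with the standard library. This is a minimal sketch: the helper name is hypothetical, and the request is only constructed, not sent, since sending requires a running server.

```python
import urllib.request


def build_index_request(url, agreement_id, participant_id):
    """Build (but do not send) a GET request carrying the contract headers."""
    return urllib.request.Request(
        url,
        headers={
            "DSP-AGREEMENT-ID": agreement_id,
            "DSP-PARTICIPANT-ID": participant_id,
        },
        method="GET",
    )


request = build_index_request(
    "http://localhost:8000/indexes/users/index", "contract-123", "participant-456"
)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```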

How it works:

  1. System extracts contract ID and participant ID from request headers
  2. Verifies contract with EDC Management API v3 at {EDC_URL}/management/v3/contractagreements/{contract_id}
  3. Checks contract state is FINALIZED or VERIFIED
  4. Validates that requested resource is covered by the contract:
    • If assetId is a URL: Direct matching against requested URL
    • If assetId is an ID: Fetches asset from {EDC_URL}/management/v3/assets/{assetId} and checks dataAddress.baseUrl
    • Fallback to policy.target if assetId is empty
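
The steps above can be sketched as a pure function. This is not the package's implementation: the function names, the flat "state", "assetId", and "policy" keys, and the trailing-slash-insensitive comparison are simplifying assumptions, and the asset lookup is injected as a callable so that no EDC instance is needed.

```python
ACCEPTED_STATES = {"FINALIZED", "VERIFIED"}


def agreement_url(edc_url, contract_id):
    return f"{edc_url.rstrip('/')}/management/v3/contractagreements/{contract_id}"


def asset_url(edc_url, asset_id):
    return f"{edc_url.rstrip('/')}/management/v3/assets/{asset_id}"


def same_resource(a, b):
    # Assumption: URLs match when equal after dropping a trailing slash.
    return a.rstrip("/") == b.rstrip("/")


def contract_covers(agreement, requested_url, fetch_asset):
    """Return True if the agreement is in an accepted state and covers the URL.

    fetch_asset takes an asset ID and returns the asset document (a dict).
    """
    if agreement.get("state") not in ACCEPTED_STATES:
        return False
    # Fall back to policy.target when assetId is empty.
    asset_ref = agreement.get("assetId") or agreement.get("policy", {}).get("target", "")
    if not asset_ref:
        return False
    if asset_ref.startswith(("http://", "https://")):
        return same_resource(asset_ref, requested_url)
    base_url = fetch_asset(asset_ref).get("dataAddress", {}).get("baseUrl", "")
    return bool(base_url) and same_resource(base_url, requested_url)
```

In a real deployment, fetch_asset would issue a request to asset_url(EDC_URL, asset_id); a stub returning a dict is enough to exercise the logic.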

Benefits:

  • No user authentication required
  • Faster authorization (single API call)
  • Ideal for data sharing between organizations
  • Supports external clients with valid contracts

2. Profile-Based Authorization (Fallback)

If no contract headers are provided, the system falls back to checking the authenticated user's dataspace profile. When contract headers are provided but verification fails, access is denied without falling back (see Scenario 2 below).

How it works:

  1. Verifies user is authenticated
  2. Fetches user's dataSpaceProfile from their profile URL
  3. Uses the profile's edc_api_key to query EDC catalog
  4. Checks if requested index URL exists in the catalog's idx:IndexEntry fields
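
Step 4 can be illustrated with a small recursive search. The exact shape of the EDC catalog response is not described here, so this sketch assumes only that idx:IndexEntry keys may appear at any depth and hold either a URL string, an object with an "@id", or a list of those; the function names are hypothetical.

```python
def iter_index_entries(node):
    """Yield every value stored under an 'idx:IndexEntry' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "idx:IndexEntry":
                # A JSON-LD value may be a single item or a list of items.
                yield from (value if isinstance(value, list) else [value])
            else:
                yield from iter_index_entries(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_index_entries(item)


def index_in_catalog(catalog, index_url):
    """Return True if index_url appears among the catalog's index entries."""
    for entry in iter_index_entries(catalog):
        # An entry may be a plain URL string or an object with an "@id".
        url = entry.get("@id") if isinstance(entry, dict) else entry
        if url == index_url:
            return True
    return False
```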

Benefits:

  • Backward compatible with existing implementations
  • User-specific access control
  • Works with standard authentication flows

Authorization Flow Scenarios

The following scenarios show what happens when a contract is valid, is invalid, or isn't provided:

Scenario 1: Valid contract provided

curl -H "DSP-AGREEMENT-ID: 56d52ce8-5ae0-4f0b-bfce-3e6dd6124bfc" \
     -H "DSP-PARTICIPANT-ID: stbx-consumer" \
     http://localhost:8000/indexes/objects/trial6/index
  • ✅ System verifies contract with EDC
  • ✅ Validates contract state (FINALIZED/VERIFIED)
  • ✅ Resolves asset and checks resource coverage
  • Access granted (no user authentication required)

Scenario 2: Invalid/non-existent contract provided

curl -H "DSP-AGREEMENT-ID: nonexistent-contract-123" \
     -H "DSP-PARTICIPANT-ID: stbx-consumer" \
     http://localhost:8000/indexes/objects/trial6/index
  • ❌ System attempts contract verification
  • ❌ EDC returns 404 or contract is invalid
  • Access denied immediately (no fallback to profile-based auth)
  • Important: When contract headers are provided, the system assumes you want contract-based authorization exclusively

Scenario 3: No contract headers, authenticated user with access

curl -H "Cookie: sessionid=..." \
     http://localhost:8000/indexes/objects/trial6/index
  • ✅ No contract header → Falls back to profile-based authorization
  • ✅ User is authenticated
  • ✅ User has dataSpaceProfile with edc_api_key
  • ✅ System queries EDC catalog with user's API key
  • ✅ Requested index URL exists in catalog
  • Access granted

Scenario 4: No contract headers, user without access

curl http://localhost:8000/indexes/objects/trial6/index
  • ❌ No contract header → Falls back to profile-based authorization
  • ❌ User not authenticated OR no dataSpaceProfile OR index not in catalog
  • Access denied

Authorization Decision Table

| Scenario | Contract Header? | User Auth? | dataSpaceProfile? | In Catalog? | Result |
|---|---|---|---|---|---|
| Valid contract | ✅ Valid | N/A | N/A | N/A | Granted |
| Invalid contract | ✅ Invalid | N/A | N/A | N/A | Denied (no fallback) |
| No contract | ❌ | ✅ | ✅ | ✅ | Granted (via profile) |
| No contract | ❌ | ✅ | ✅ | ❌ | Denied |
| No contract | ❌ | ✅ | ❌ | N/A | Denied (no profile) |
| No contract | ❌ | ❌ | N/A | N/A | Denied (not authenticated) |
| Bypass header | N/A | N/A | N/A | N/A | Granted (dev/test only) |
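
The decision table can be read as a short chain of checks. The sketch below is a hypothetical helper, not the package's API: it takes the outcome of each check as a boolean and returns the decision together with a reason.

```python
def authorize(bypass=False, contract_headers=False, contract_valid=False,
              authenticated=False, has_profile=False, in_catalog=False):
    """Return (granted, reason) following the decision table row by row."""
    if bypass:
        return True, "bypass (dev/test only)"
    if contract_headers:
        # Contract headers make the decision final: no profile fallback.
        if contract_valid:
            return True, "contract"
        return False, "invalid contract (no fallback)"
    if not authenticated:
        return False, "not authenticated"
    if not has_profile:
        return False, "no profile"
    if not in_catalog:
        return False, "index not in catalog"
    return True, "profile"


print(authorize(contract_headers=True, contract_valid=False, authenticated=True,
                has_profile=True, in_catalog=True))
# (False, 'invalid contract (no fallback)')
```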

Bypass Option

For development or testing, policy checks can be bypassed using:

curl -H "X-Bypass-Policy: true" \
     http://localhost:8000/indexes/users/index

Protected Resources

Protected (require authorization):

  • Static local indexes: /indexes/**
  • Model-level dynamic views (if enabled)

Public (no authorization required):

  • Federated indexes: /fedex/**
  • Instance-level views: /profile, /profile/publicTypeIndex

Generating the static local index files

python manage.py generate_local_indexes

Optional parameters:

  • --root_url: the base URL of the DjangoLDP server (default: http://localhost:8000)
  • --root_location: the location in which to save the static index files, relative to the <settings.STATIC_ROOT> folder of the Django project (default: indexes).
    • Note: At this time, files are always served from the <settings.STATIC_ROOT>/indexes folder, so files written to another location will not be served.

At this stage, the static index files are not generated automatically. Run the command above manually when initializing the server and whenever the data changes.

Generating the federated index files

python manage.py crawl_indexes

Optional parameters:

  • --root_url: the base URL of the DjangoLDP server (default: http://localhost:8000/fedex)
  • --root_location: the location in which to save the static index files, relative to the <settings.STATIC_ROOT> folder of the Django project (default: fedex).
    • Note: At this time, files are always served from the <settings.STATIC_ROOT>/fedex folder, so files written to another location will not be served.

This command browses the LDP sources on this server whose federation property is set to indexes and uses their <host>/profile/publicTypeIndex response to build the federated index.

At this stage:

  • The federated index files are not generated automatically. Run the command above manually when initializing the server and whenever the data changes.
  • The crawler isn't recursive: only the model level of indexing is federated (property and pattern-based indexes aren't).
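
The source selection described above can be sketched as follows. The dictionaries stand in for LDP source records; the "urlid" and "federation" keys and the function name are assumptions made for illustration, and the actual fetching and aggregation are left out.

```python
from urllib.parse import urlsplit


def public_type_index_urls(sources):
    """Return the publicTypeIndex URL of each host federated for indexes."""
    urls = []
    for source in sources:
        if source.get("federation") != "indexes":
            continue
        parts = urlsplit(source["urlid"])
        url = f"{parts.scheme}://{parts.netloc}/profile/publicTypeIndex"
        # Several sources may share a host; crawl each host only once.
        if url not in urls:
            urls.append(url)
    return urls


sources = [
    {"urlid": "https://a.example/users/", "federation": "indexes"},
    {"urlid": "https://a.example/projects/", "federation": "indexes"},
    {"urlid": "https://b.example/users/", "federation": "circles"},
]
print(public_type_index_urls(sources))
# ['https://a.example/profile/publicTypeIndex']
```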

Testing the package

To run the tests after checking out the repository:

# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# 2. Install djangoldp with the version you want to test the package against
pip install djangoldp~=4.0.0

# 3. Install the package in editable mode
pip install -e .

# 4. Run the tests
python djangoldp_indexing/tests/runner.py

License

This project is licensed under the MIT License - see the LICENSE file for details.
