
DDI to Knowledge Graph toolkit - transform DDI metadata into graph databases (Neo4j, RDF, Gremlin, NetworkX)


ddigraph


A modern Python toolkit that transforms DDI (Data Documentation Initiative) XML metadata into knowledge graphs. Supports DDI Codebook and DDI-L FragmentInstance formats with streaming parsing, batched writes, and full async I/O across multiple graph backends.



Features

  • Multi-backend support -- Neo4j, RDF/SPARQL, Gremlin, NetworkX, and pandas
  • Streaming XML processing -- Memory-bounded iterparse for files of any size
  • Batched writes -- UNWIND-based Cypher for 10-100x fewer database round trips
  • Async I/O -- Concurrent parsing and writing with back-pressure control
  • Format auto-detection -- Automatically identifies DDI Codebook vs Lifecycle format
  • Unified schema -- Single source of truth for all node and relationship definitions
  • Adapter pattern -- Plug in custom graph backends via GraphWriteAdapter protocol
  • Production-ready -- Retry logic, observability hooks, pydantic-based configuration
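
The streaming and batching ideas in the list above can be sketched with the standard library alone. This is a minimal illustration, not ddigraph's implementation: the `iter_variables` and `batched` helpers, the un-namespaced `<var>` sample, and the Cypher text in `UNWIND_QUERY` are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical Cypher: label and property names are illustrative only.
UNWIND_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (v:Variable {id: row.id}) "
    "SET v.label = row.label"
)


def iter_variables(source) -> Iterator[dict]:
    """Yield one dict per <var> element as soon as it has been parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "var":
            yield {"id": elem.get("ID"), "label": elem.findtext("labl", default="")}
            # Drop the element's text, attributes, and children. The emptied
            # element stays attached to its parent, so a production parser
            # would also detach it to keep memory strictly bounded.
            elem.clear()


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


# Simplified sample; real DDI Codebook files use XML namespaces.
SAMPLE = b"""<codeBook><dataDscr>
<var ID="V1"><labl>Age</labl></var>
<var ID="V2"><labl>Sex</labl></var>
<var ID="V3"><labl>Region</labl></var>
</dataDscr></codeBook>"""

batches = list(batched(iter_variables(io.BytesIO(SAMPLE)), size=2))
for rows in batches:
    # With a real Neo4j session, each batch would be a single round trip,
    # for example: session.run(UNWIND_QUERY, rows=rows)
    print(len(rows), [r["id"] for r in rows])
```

Sending one parameterized `UNWIND` statement per batch instead of one statement per element is what reduces the number of round trips; the batch size trades memory against network overhead.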

Quick Start

Install

pip install ddigraph

Load DDI metadata (CLI)

# Set Neo4j connection
export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=secret

# Bootstrap schema and load data (format is auto-detected)
ddigraph bootstrap
ddigraph load survey.xml --dataset-id my-survey

Load DDI metadata (Python)

import asyncio

from neo4j import AsyncGraphDatabase

from ddigraph import DDILoader, DDIFragmentLoader, detect_ddi_format
from ddigraph.config import Settings


async def main():
    # Connection details come from Settings (neo4j_uri, neo4j_user, neo4j_password).
    settings = Settings()
    driver = AsyncGraphDatabase.driver(
        settings.neo4j_uri,
        auth=(settings.neo4j_user, settings.neo4j_password.get_secret_value()),
    )
    path = "survey.xml"
    try:
        # Pick the loader that matches the detected format.
        if detect_ddi_format(path) == "lifecycle":
            loader = DDIFragmentLoader(driver, settings=settings)
            result = await loader.load(path)
        else:
            # Codebook files are loaded under an explicit dataset ID.
            loader = DDILoader(driver, settings=settings)
            result = await loader.load(path, dataset_id="my-survey")
        print(result)  # {'Instrument': 1, 'Sequence': 388, 'QuestionItem': 373, ...}
    finally:
        # Close the driver even if loading fails.
        await driver.close()


asyncio.run(main())
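
The back-pressure control listed under Features can be illustrated with a bounded `asyncio.Queue`. The sketch below is an assumption about the general technique, not a description of how ddigraph wires its parser and writer together; `produce`, `consume`, and `pipeline` are hypothetical names.

```python
import asyncio


async def produce(queue: asyncio.Queue, items) -> None:
    """Stand-in for the parser: push items, then a sentinel."""
    for item in items:
        await queue.put(item)  # suspends while the queue is full
    await queue.put(None)      # sentinel: no more items


async def consume(queue: asyncio.Queue, written: list) -> None:
    """Stand-in for the writer: drain the queue until the sentinel arrives."""
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0)  # placeholder for an awaited database write
        written.append(item)


async def pipeline(items, max_pending: int = 2) -> list:
    # maxsize bounds how far the producer can run ahead of the consumer.
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
    written: list = []
    await asyncio.gather(produce(queue, items), consume(queue, written))
    return written


result = asyncio.run(pipeline(range(5)))
print(result)
```

Because `queue.put` suspends when the queue is full, a fast parser cannot accumulate an unbounded backlog in front of a slow database.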

Supported Formats

Format | Description | Use Case
DDI Codebook | Traditional flat format with central Dataset node | Survey archives, data catalogs
DDI-L FragmentInstance | Lifecycle 3.x format with reusable fragments | Questionnaire design, CAPI/CAWI instruments
DDI-CDI 1.0 | Cross-Domain Integration metadata | Data integration, statistical production
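
One plausible way to tell Codebook from Lifecycle documents is to inspect only the root element. The sketch below is not the logic behind `detect_ddi_format`; the function name `sniff_ddi_format`, the `"codebook"` and `"unknown"` return values, and the namespace-prefix checks are assumptions, and DDI-CDI is deliberately left unhandled.

```python
import io
import xml.etree.ElementTree as ET


def sniff_ddi_format(source) -> str:
    """Classify a DDI document from its root element without parsing the rest."""
    for _event, elem in ET.iterparse(source, events=("start",)):
        # ElementTree renders qualified names as "{namespace}local".
        namespace, _, local = elem.tag.rpartition("}")
        namespace = namespace.lstrip("{")
        if local == "codeBook" or namespace.startswith("ddi:codebook"):
            return "codebook"
        if namespace.startswith("ddi:instance"):
            return "lifecycle"
        return "unknown"  # the first start event is the root; stop here
    return "unknown"


codebook = b'<codeBook xmlns="ddi:codebook:2_5"><stdyDscr/></codeBook>'
fragment = b'<ddi:FragmentInstance xmlns:ddi="ddi:instance:3_3"/>'
print(sniff_ddi_format(io.BytesIO(codebook)))
print(sniff_ddi_format(io.BytesIO(fragment)))
```

Returning on the first `start` event keeps detection cheap even for very large files, since nothing beyond the opening tag is consumed.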

XSD Coverage

ddigraph ships with 100 % coverage of every concrete identifiable element declared in the bundled XSD schemas (schemas/). Coverage is enforced by the audit script and a pytest guardrail so new schema releases surface any gaps:

Flavor | Scope | Target | Covered
DDI-L 3.x | Concrete Maintainable + Versionable + Identifiable elements | 189 | 100 %
DDI-C 2.x | Codebook elements with the GLOBALS attribute group (no layout tags) | 73 | 100 %
DDI-CDI 1.0 | Concrete top-level entity elements (associations excluded) | 210 | 100 %

Run python scripts/xsd_coverage.py to regenerate the audit or python scripts/xsd_coverage.py --json for machine-readable output.
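
To give a feel for what such an audit measures, here is a rough sketch that collects the concrete top-level elements of an XSD and compares them against a set of handled names. It does not reproduce scripts/xsd_coverage.py; the `concrete_elements` and `coverage` helpers and the three-element sample schema are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def concrete_elements(xsd_source) -> set[str]:
    """Names of top-level xs:element declarations that are not abstract."""
    root = ET.parse(xsd_source).getroot()
    return {
        el.get("name")
        for el in root.findall(f"{XS}element")
        if el.get("name") and el.get("abstract", "false") not in ("true", "1")
    }


def coverage(declared: set[str], handled: set[str]) -> tuple[float, set[str]]:
    """Return the covered percentage and the set of uncovered element names."""
    missing = declared - handled
    if not declared:
        return 100.0, missing
    return 100.0 * (len(declared) - len(missing)) / len(declared), missing


# Illustrative schema, not one of the bundled XSD files.
SAMPLE_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="QuestionItem"/>
  <xs:element name="Sequence"/>
  <xs:element name="ControlConstruct" abstract="true"/>
</xs:schema>"""

declared = concrete_elements(io.BytesIO(SAMPLE_XSD))
pct, missing = coverage(declared, handled={"QuestionItem"})
print(pct, sorted(missing))
```

A test guardrail in this style would simply assert that `missing` is empty, so a new schema release that introduces an unhandled element fails the build.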

Supported Backends

Backend | Description | Use Case
Neo4j | Native graph database (Bolt) | Production deployments, complex queries
RDF/SPARQL | Semantic web triplestores | Linked data, ontology integration
Gremlin | Graph traversal language | JanusGraph, Neptune, Cosmos DB
NetworkX | Python graph library | Local analysis, prototyping
pandas | DataFrame-based | Tabular analysis, Excel export
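
Supporting several backends behind one interface is commonly done with a structural `Protocol`. The sketch below shows that pattern in general terms only: `WriteAdapterSketch` and its two methods are hypothetical, and the real `GraphWriteAdapter` protocol may define different methods and signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class WriteAdapterSketch(Protocol):
    """Hypothetical interface; ddigraph's GraphWriteAdapter may differ."""

    def write_nodes(self, label: str, rows: list[dict[str, Any]]) -> int: ...

    def write_edges(self, rel_type: str, pairs: list[tuple[str, str]]) -> int: ...


class InMemoryAdapter:
    """Toy backend that satisfies the protocol without inheriting from it."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        self.edges: list[tuple[str, str, str]] = []

    def write_nodes(self, label, rows):
        for row in rows:
            self.nodes[row["id"]] = {"label": label, **row}
        return len(rows)

    def write_edges(self, rel_type, pairs):
        self.edges.extend((src, rel_type, dst) for src, dst in pairs)
        return len(pairs)


def load(adapter: WriteAdapterSketch) -> int:
    """Backend-agnostic loading code: it only sees the protocol."""
    written = adapter.write_nodes("Variable", [{"id": "V1"}, {"id": "V2"}])
    written += adapter.write_edges("FOLLOWS", [("V1", "V2")])
    return written


adapter = InMemoryAdapter()
total = load(adapter)
print(total, isinstance(adapter, WriteAdapterSketch))
```

Because the check is structural, a custom backend only needs to provide methods with matching names; no base class or registration step is required in this sketch.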

Docker Quick Start

docker run --rm --name neo4j-demo \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5

export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=password

ddigraph bootstrap
ddigraph load your-file.xml --dataset-id demo

Documentation

Full documentation is available at pbisson44.github.io/ddigraph in English and French.

Development

git clone https://github.com/pbisson44/ddigraph.git
cd ddigraph
pip install -e ".[dev,docs]"

ruff check . && ruff format .
# Docstring linting is currently enforced for src/ddigraph only.
pydocstyle src/ddigraph
mypy .
pytest
mkdocs serve

License

MIT -- see LICENSE for details.

