DDI to Knowledge Graph toolkit - transform DDI metadata into graph databases (Neo4j, RDF, Gremlin, NetworkX)

ddigraph

A modern Python toolkit that transforms DDI (Data Documentation Initiative) XML metadata into knowledge graphs. Supports DDI Codebook and DDI-L FragmentInstance formats with streaming parsing, batched writes, and full async I/O across multiple graph backends.

Features

  • Multi-backend support -- Neo4j, RDF/SPARQL, Gremlin, NetworkX, and pandas
  • Streaming XML processing -- Memory-bounded iterparse for files of any size
  • Batched writes -- UNWIND-based Cypher for 10-100x fewer database round trips
  • Async I/O -- Concurrent parsing and writing with back-pressure control
  • Format auto-detection -- Automatically identifies DDI Codebook vs Lifecycle format
  • Unified schema -- Single source of truth for all node and relationship definitions
  • Adapter pattern -- Plug in custom graph backends via GraphWriteAdapter protocol
  • Production-ready -- Retry logic, observability hooks, pydantic-based configuration
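
The streaming and batching ideas above can be sketched with the standard library alone. This is an illustration, not ddigraph's implementation: the helper names (stream_variables, batched), the Variable label, and the id/name properties are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterator


def stream_variables(source, tag: str = "var") -> Iterator[dict]:
    """Yield one dict per matching element while parsing incrementally."""
    # "end" events fire once an element and its children are fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any "{namespace}" prefix so only the local name is compared.
        if elem.tag.rsplit("}", 1)[-1] == tag:
            yield {"id": elem.get("ID"), "name": elem.get("name")}
            # Release the element's children, text, and attributes.
            # The emptied element itself stays attached to its parent.
            elem.clear()


def batched(rows, size: int) -> Iterator[list]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


# Hypothetical Cypher: one UNWIND statement writes a whole batch of rows.
UNWIND_QUERY = """
UNWIND $rows AS row
MERGE (v:Variable {id: row.id})
SET v.name = row.name
"""

SAMPLE = b"""<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var ID="V1" name="age"/>
    <var ID="V2" name="sex"/>
    <var ID="V3" name="income"/>
  </dataDscr>
</codeBook>"""

batches = list(batched(stream_variables(io.BytesIO(SAMPLE)), size=2))
# Each batch would be sent in one round trip, e.g.
# session.run(UNWIND_QUERY, rows=batch) with the Neo4j driver.
print(batches)
```

With a batch size of 2, the three sample variables need two round trips instead of three; larger batches widen that gap.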

Quick Start

Install

pip install ddigraph

Load DDI metadata (CLI)

# Set Neo4j connection
export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=secret

# Bootstrap schema and load data (format is auto-detected)
ddigraph bootstrap
ddigraph load survey.xml --dataset-id my-survey

Load DDI metadata (Python)

import asyncio
from neo4j import AsyncGraphDatabase
from ddigraph import DDILoader, DDIFragmentLoader, detect_ddi_format
from ddigraph.config import Settings

async def main():
    # Connection values come from Settings (for example the DDIGRAPH_* variables set above).
    settings = Settings()
    driver = AsyncGraphDatabase.driver(
        settings.neo4j_uri,
        auth=(settings.neo4j_user, settings.neo4j_password.get_secret_value()),
    )
    path = "survey.xml"
    try:
        # DDI-L files go through the fragment loader; Codebook files take a dataset ID.
        if detect_ddi_format(path) == "lifecycle":
            loader = DDIFragmentLoader(driver, settings=settings)
            result = await loader.load(path)
        else:
            loader = DDILoader(driver, settings=settings)
            result = await loader.load(path, dataset_id="my-survey")
        print(result)  # {'Instrument': 1, 'Sequence': 388, 'QuestionItem': 373, ...}
    finally:
        # Close the driver even if loading fails.
        await driver.close()

asyncio.run(main())
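
The Features list mentions back-pressure control between parsing and writing. One common way to get that behaviour in asyncio is a bounded queue: the producer suspends whenever the consumer falls behind. The sketch below shows the general technique only; the function names and the max_pending and batch_size parameters are hypothetical and are not taken from ddigraph.

```python
import asyncio


async def produce(queue: asyncio.Queue, items) -> None:
    """Stand-in for the parser: push items, then a sentinel."""
    for item in items:
        # put() suspends while the queue is full, which throttles the producer.
        await queue.put(item)
    await queue.put(None)  # sentinel: no more items


async def flush(batch: list, written: list) -> None:
    """Stand-in for a database write."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    written.append(list(batch))


async def consume(queue: asyncio.Queue, written: list, batch_size: int) -> None:
    """Stand-in for the writer: collect items into batches and flush them."""
    batch = []
    while (item := await queue.get()) is not None:
        batch.append(item)
        if len(batch) >= batch_size:
            await flush(batch, written)
            batch = []
    if batch:  # flush the final partial batch
        await flush(batch, written)


async def run_pipeline(items, max_pending: int = 4, batch_size: int = 3) -> list:
    # maxsize bounds how many parsed items may wait for the writer.
    queue = asyncio.Queue(maxsize=max_pending)
    written = []
    await asyncio.gather(
        produce(queue, items),
        consume(queue, written, batch_size),
    )
    return written


result = asyncio.run(run_pipeline(range(7)))
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The queue size caps memory use regardless of file size, at the cost of stalling the parser when writes are slow.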

Supported Formats

Format | Description | Use Case
DDI Codebook | Traditional flat format with central Dataset node | Survey archives, data catalogs
DDI-L FragmentInstance | Lifecycle 3.x format with reusable fragments | Questionnaire design, CAPI/CAWI instruments
DDI-CDI 1.0 | Cross-Domain Integration metadata | Data integration, statistical production
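
As a rough illustration of how format detection can work, the sketch below inspects only the root element of a file. It is not the logic behind detect_ddi_format: the function name sniff_ddi_format and the "codebook" and "unknown" return values are assumptions, and DDI-CDI is not handled.

```python
import io
import xml.etree.ElementTree as ET


def sniff_ddi_format(source) -> str:
    """Classify a DDI document from its root element.

    `source` is a binary file object; only the start of it is parsed.
    """
    # The first "start" event is always the root element.
    for _event, elem in ET.iterparse(source, events=("start",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
        if local_name == "codeBook":
            return "codebook"
        if local_name in {"FragmentInstance", "DDIInstance"}:
            return "lifecycle"
        return "unknown"
    return "unknown"  # empty input produced no events


codebook = io.BytesIO(b'<codeBook xmlns="ddi:codebook:2_5"/>')
fragment = io.BytesIO(b'<ddi:FragmentInstance xmlns:ddi="ddi:instance:3_3"/>')

print(sniff_ddi_format(codebook))  # codebook
print(sniff_ddi_format(fragment))  # lifecycle
```

Stopping at the first event keeps detection cheap even for very large files.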

XSD Coverage

ddigraph ships with 100 % coverage of every concrete identifiable element declared in the bundled XSD schemas (schemas/). Coverage is enforced by the audit script and a pytest guardrail so new schema releases surface any gaps:

Flavor | Scope | Target | Covered
DDI-L 3.x | Concrete Maintainable + Versionable + Identifiable elements | 189 | 100 %
DDI-C 2.x | Codebook elements with the GLOBALS attribute group (no layout tags) | 73 | 100 %
DDI-CDI 1.0 | Concrete top-level entity elements (associations excluded) | 210 | 100 %

Run python scripts/xsd_coverage.py to regenerate the audit or python scripts/xsd_coverage.py --json for machine-readable output.
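
To give a feel for what such an audit involves, here is a much-simplified sketch. It is not scripts/xsd_coverage.py: it treats every non-abstract top-level xs:element as in scope, whereas the real scopes are narrower (see the table above), and the helper names and sample schema are invented for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def concrete_elements(xsd_text: str) -> set[str]:
    """Return the names of non-abstract top-level elements in a schema."""
    root = ET.fromstring(xsd_text)
    return {
        el.get("name")
        for el in root.findall(f"{XS}element")  # direct children only
        if el.get("abstract", "false") not in {"true", "1"}
    }


def coverage_gaps(xsd_text: str, handled: set[str]) -> set[str]:
    """Names declared in the schema that the toolkit does not handle."""
    return concrete_elements(xsd_text) - handled


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Variable"/>
  <xs:element name="Category"/>
  <xs:element name="BaseThing" abstract="true"/>
</xs:schema>"""

gaps = coverage_gaps(SAMPLE_XSD, handled={"Variable"})
print(gaps)  # {'Category'}
```

A pytest guardrail in this style would assert that the gap set is empty, so a new schema release with unhandled elements fails the test run.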

Supported Backends

Backend | Description | Use Case
Neo4j | Native graph database (Bolt) | Production deployments, complex queries
RDF/SPARQL | Semantic web triplestores | Linked data, ontology integration
Gremlin | Graph traversal language | JanusGraph, Neptune, Cosmos DB
NetworkX | Python graph library | Local analysis, prototyping
pandas | DataFrame-based | Tabular analysis, Excel export
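
The adapter pattern behind these backends can be illustrated with typing.Protocol. The protocol below is a hypothetical stand-in: the methods of the actual GraphWriteAdapter protocol are not documented here, so SketchWriteAdapter, its two synchronous methods, and the row shapes are assumptions made purely for illustration.

```python
from typing import Any, Protocol, runtime_checkable

Row = dict[str, Any]


@runtime_checkable
class SketchWriteAdapter(Protocol):
    """Hypothetical write interface; not ddigraph's GraphWriteAdapter."""

    def write_nodes(self, label: str, rows: list[Row]) -> int: ...

    def write_relationships(self, rel_type: str, rows: list[Row]) -> int: ...


class InMemoryAdapter:
    """Toy backend that stores the graph in plain Python containers."""

    def __init__(self) -> None:
        self.nodes: dict[tuple[str, str], Row] = {}
        self.edges: list[tuple[str, str, str]] = []

    def write_nodes(self, label: str, rows: list[Row]) -> int:
        for row in rows:
            # Keyed by (label, id), so re-writing a node overwrites it.
            self.nodes[(label, row["id"])] = row
        return len(rows)

    def write_relationships(self, rel_type: str, rows: list[Row]) -> int:
        for row in rows:
            self.edges.append((row["start"], rel_type, row["end"]))
        return len(rows)


def load(adapter: SketchWriteAdapter) -> int:
    """Loader code depends only on the protocol, never on a concrete backend."""
    count = adapter.write_nodes(
        "Variable",
        [{"id": "V1", "name": "age"}, {"id": "V2", "name": "sex"}],
    )
    count += adapter.write_relationships("FOLLOWS", [{"start": "V1", "end": "V2"}])
    return count


adapter = InMemoryAdapter()
written = load(adapter)
print(written)  # 3
```

Because the protocol is structural, InMemoryAdapter does not need to inherit from anything; any class with matching methods can be passed to the loader.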

Docker Quick Start

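# Start a throwaway Neo4j 5 instance (runs in the foreground; add -d to detach)
docker run --rm --name neo4j-demo \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5

# In another terminal, point ddigraph at the container
export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=password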

ddigraph bootstrap
ddigraph load your-file.xml --dataset-id demo

Documentation

Full documentation is available at pbisson44.github.io/ddigraph in English and French.

Development

git clone https://github.com/pbisson44/ddigraph.git
cd ddigraph
pip install -e ".[dev,docs]"

ruff check . && ruff format .
# Docstring linting is currently enforced for src/ddigraph only.
pydocstyle src/ddigraph
mypy .
pytest
mkdocs serve

License

MIT -- see LICENSE for details.
