
EndoReg Db Django App


EndoregDB - Professional Data Infrastructure for Clinical Research

EndoregDB is a comprehensive database framework designed to manage medical and research-related data for clinical trials. This repository focuses on efficient data processing, automated deployment, security, and reproducibility, offering a flexible setup for local development environments as well as distributed systems. It supports the integration of AI/ML tools and advanced image and report processing.

This infrastructure was originally designed for clinical research studies and is optimized for handling large data volumes, including:

  • Medical reports,
  • Patient imaging and video data,
  • Clinical product and treatment data, and more.

Ingress contract

The package supports two first-class ingest boundaries:

  • watcher: trusted local filesystem intake
  • api: authenticated remote upload intake

Both boundaries create UploadJob records and converge on the same shared ingest services. The downstream processing model is shared; only the trust boundary differs.

For shared multi-center deployments, set ENDOREG_DEPLOYMENT_ROLE=central_hub. In that role the package requires authenticated API uploads with declared center_key and refuses default-center fallback on the API path.

AI and automation consumers should use the API read surfaces for reports, videos, frames, and patient timelines rather than reading STORAGE_DIR directly. Those media endpoints are the package-level contract for center-scoped access.

The node-to-node transfer API under /api/media/hub/transfers/ is supported for central_hub deployments. In standalone and site_node deployments those endpoints return 404. /api/upload/ remains the primary hub boundary.
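The role gating described above can be sketched in plain Python (an illustrative model only; the function name and routing logic are hypothetical, not the package's actual URL configuration):

```python
# Illustrative sketch: hub transfer endpoints exist only when the
# deployment role is "central_hub"; other roles return 404 for them.
import os

def transfer_endpoint_status(path: str) -> int:
    """Return the HTTP status the transfer API would yield for `path`."""
    role = os.environ.get("ENDOREG_DEPLOYMENT_ROLE", "standalone")
    if path.startswith("/api/media/hub/transfers/") and role != "central_hub":
        return 404  # standalone and site_node deployments hide these routes
    return 200

os.environ["ENDOREG_DEPLOYMENT_ROLE"] = "site_node"
print(transfer_endpoint_status("/api/media/hub/transfers/"))  # 404

os.environ["ENDOREG_DEPLOYMENT_ROLE"] = "central_hub"
print(transfer_endpoint_status("/api/media/hub/transfers/"))  # 200
```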

For the current transport-security phase, transfer deployments must:

  • use HTTPS or equivalent secure transport
  • require proxy-verified mTLS for node-authenticated transfer requests
  • keep NetworkNode.shared_secret limited to request authentication rather than payload encryption
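The last point, using the shared secret for request authentication rather than payload encryption, can be illustrated with a request-signing sketch. The signing scheme and names below are assumptions for illustration; the package only specifies what the secret is used for:

```python
# Illustrative HMAC request authentication: the shared secret signs the
# request body so the receiver can verify the sender. The payload itself
# is NOT encrypted by the secret; confidentiality comes from the
# HTTPS/mTLS transport layer required above.
import hashlib
import hmac

def sign_request(shared_secret: bytes, body: bytes) -> str:
    return hmac.new(shared_secret, body, hashlib.sha256).hexdigest()

def verify_request(shared_secret: bytes, body: bytes, signature: str) -> bool:
    expected = sign_request(shared_secret, body)
    return hmac.compare_digest(expected, signature)

secret = b"node-shared-secret"       # stands in for NetworkNode.shared_secret
body = b'{"transfer_id": 42}'
sig = sign_request(secret, body)
print(verify_request(secret, body, sig))         # True
print(verify_request(secret, b"tampered", sig))  # False
```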

For downstream upgrade and deployment impact, see docs/deployment_note_hub_contract.md. For the full current-state hub behavior, see docs/wiki/hub_ingest_current_state.md.

Ingest workflow

The package is designed around one shared ingest core with multiple boundary adapters:

  1. watcher, api, or optional transfer ingress accepts a file or transfer payload.
  2. The boundary resolves center_key scope and creates an UploadJob or TransferJob.
  3. Provenance is normalized at creation time so audit and cleanup logic do not depend on caller-specific payload shapes.
  4. Shared processing services import, anonymize, and link the resulting media objects.
  5. Retention policy decides cleanup eligibility.
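Steps 2 and 3 can be sketched as follows. This is a minimal illustration of provenance normalization across boundaries; the field names and `Provenance` type are hypothetical, not the package's actual schema:

```python
# Hypothetical sketch of step 3: boundary-specific payloads are normalized
# into one provenance record at job creation time, so downstream audit and
# cleanup logic never have to inspect caller-specific payload shapes.
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    boundary: str      # "watcher" or "api"
    center_key: str
    source_ref: str    # file path or upload identifier

def normalize_provenance(boundary: str, payload: dict) -> Provenance:
    if boundary == "watcher":
        # Trusted local intake may fall back to a default center.
        return Provenance("watcher", payload.get("center_key", "default"),
                          payload["path"])
    if boundary == "api":
        # API uploads must declare a center_key; no default fallback
        # (a missing key raises instead of being silently defaulted).
        return Provenance("api", payload["center_key"], payload["upload_id"])
    raise ValueError(f"unknown ingress boundary: {boundary}")

print(normalize_provenance("watcher", {"path": "/intake/report.pdf"}))
print(normalize_provenance("api",
      {"center_key": "center_a", "upload_id": "u-123"}))
```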

The cleanup contract is strict:

  • UploadJob.retention_policy=preserve_source: successful completion keeps the source artifact and marks cleanup as skipped
  • UploadJob.retention_policy=delete_after_success: successful completion marks the source artifact as cleanup-eligible
  • TransferJob.cleanup_policy=retain_all: no cleanup is requested
  • transfer cleanup policies other than retain_all are recorded as deferred operator intent

This keeps ingest behavior idempotent, auditable, and safe for production cleanup automation.

🚀 Key Features

System Architecture

  • Modular Design: Built on scalable and reusable components to simplify integration into various environments.
  • Multi-System Support: Manages configurations for local workstations and production servers.
  • Role-Specific Configuration: Predefined roles for common use cases:
    • Medical data processing systems
    • AI/ML model deployment
    • Research workstation configuration

Security & Data Management

  • Data Encryption: All sensitive data is encrypted, and privacy policies are enforced.
  • Impermanence: Stateless system configuration with persistence for critical data.
  • Access Control: Role-based access and identity management integration.

Data and Processing Environment

  • Data Processing: Optimized for processing medical datasets with preprocessing tools.
  • AI/ML Support:
    • Integration of machine learning tools for predictive analysis.
    • TensorFlow, PyTorch, and other frameworks supported for model training.
  • Image/Video Processing: Support for analyzing patient images and clinical videos.

Development Tools & Infrastructure

  • Data Science Toolchains: Pre-configured environments for data processing, analysis, and visualization.
  • Monitoring & Logging: Setup for continuous monitoring and logging to ensure system stability and performance.

🛠 Getting Started

Prerequisites

  • A Linux-based system (Ubuntu/Debian recommended) or NixOS
  • Hardware with sufficient storage for data processing (at least 1 TB recommended)

Quick Start

  1. Clone the repository:

    git clone https://github.com/wg-lux/endoreg-db.git
    cd endoreg-db
    
  2. Set up your Python environment. The project relies on a devenv.nix file:
    this Nix configuration sets up a Python development environment for the Django-based project, using uv for dependency management. It defines project directories, environment variables, runtime packages, and several development tasks and scripts.

    Available test shortcuts:

    • runtests: Runs all tests — uv run python runtests.py
    • runtests-dataloader: Runs dataloader tests — uv run python runtests.py 'dataloader'
    • runtests-other: Runs other miscellaneous tests — uv run python runtests.py 'other'
    • runtests-helpers: Runs helper module tests — uv run python runtests.py 'helpers'
    • runtests-administration: Runs admin module tests — uv run python runtests.py 'administration'
    • runtests-medical: Runs medical module tests — uv run python runtests.py 'medical'
  3. Then enable the environment with direnv:

    direnv allow
    
  4. Run the tests via the devenv script:

    runtests
    

    Tests Overview

    • These tests ensure the functionality of the different models and scenarios.
    • After running them, you can view the results in the terminal output.

    (Screenshot: Test Results)

  5. Apply the database migrations:

    python manage.py migrate
    
    • This applies the migrations and creates the database tables.
    • It updates your database schema to match the current state of your Django models.
  6. To load the base database data, run

    python manage.py load_base_db_data

  7. Accessing the Django Shell

    • To fetch or interact with data from the terminal, start the Django shell:
       python manage.py shell
    
    • Using the Django shell, you can:
      • Import database models
      • Fetch data from the database
      • Access related data through model relationships (e.g., foreign keys, one-to-many, many-to-many)
      • Examples are shown below

    EXAMPLE # 1

    (Screenshot: Django shell session)

    • Explanation: This script fetches a patient by ID and prints their related examination(s) using Django ORM. It retrieves the examination name linked to the patient from the PatientExamination table.

    EXAMPLE # 2

    (Screenshot: Django shell session)

    • Explanation: In the Django shell, a specific ExaminationIndication named "colonoscopy_screening" was fetched, and its related FindingIntervention records were accessed using the reverse relation expected_interventions. The first intervention (colon_lesion_polypectomy_cold_snare) was then queried to confirm it is also linked to multiple indications, demonstrating a many-to-many relationship between indications and interventions.

    EXAMPLE # 3

    (Screenshot: Django shell session)

    • Explanation: All required labels (polyp, instrument, digital_chromo_endoscopy, etc.) are confirmed to exist. The first available video (VideoFile) was loaded, with a valid frame_dir. Using the label "polyp", 8 labeled polyp segments were found in that video, with specific start and end frame numbers.

    EXAMPLE # 4

    (Screenshot a: Django shell session)

    (Screenshot b: all classifications with their choices together)

    • Explanation: Using the Django shell to fetch all morphology classifications (e.g., NICE, Paris) and their related choices from the database.

📦 Database Backup and Restore

This project includes two shell scripts to export and import database data in JSON format using Django's management commands.

Setup

First, make the scripts executable:

chmod +x import_db.sh
chmod +x export_db.sh

Export (Backup) the Database

To export the current database into a JSON file:

./export_db.sh

This will create a backup file such as endoreg_db_backup.json.

List of the commands in 'export_db.sh':

  1. python manage.py dumpdata --indent 4 --output=endoreg_db_backup.json (if the migrate command itself generates and stores data in database tables, those tables need to be excluded from the dump)

  2. python manage.py shell < fix_endoreg_db_backup_json.py

Import (Restore) the Database

To load the data back into the database:

./import_db.sh

List of the commands in 'import_db.sh':

  1. rm dev_db.sqlite3
  2. python manage.py migrate
  3. python manage.py shell < fix_endoreg_db_backup_json.py
  4. python manage.py loaddata endoreg_db_backup_fixed.json
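The exclusion note above (tables populated by migrate must not be re-imported) can be sketched as a small fixture post-processing step. This is a hypothetical illustration of the kind of filtering such a fix script could perform, not the actual contents of fix_endoreg_db_backup_json.py, and the excluded model labels are common Django examples rather than the project's real exclusion list:

```python
# Hypothetical sketch: drop fixture entries for models whose rows are
# (re)created by `migrate` itself, so `loaddata` does not collide with them.
import json

MIGRATE_GENERATED = {"contenttypes.contenttype", "auth.permission"}

def filter_fixture(entries: list[dict]) -> list[dict]:
    """Remove dumpdata entries for migrate-generated tables."""
    return [e for e in entries if e["model"] not in MIGRATE_GENERATED]

raw = json.dumps([
    {"model": "contenttypes.contenttype", "pk": 1, "fields": {}},
    {"model": "endoreg_db.patient", "pk": 1, "fields": {"name": "anon"}},
])
fixed = filter_fixture(json.loads(raw))
print([e["model"] for e in fixed])  # ['endoreg_db.patient']
```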

📁 Repository Structure

endoreg-db/
├── endoreg_db/                # Main Django app for medical data
│   ├── data/                  # Medical knowledge base
│   ├── management/            # Data wrangling operations
│   ├── models/                # Data models
│   ├── migrations/            # Database migrations
│   └── serializers/           # Serializers for data
├── .gitignore                 # Git ignore file for unnecessary files
└── README.md                  # Project description and setup instructions

🔒 Security Features

  • Data Encryption: All sensitive patient data is encrypted.
  • Role-Based Access Control: Configurable roles for managing access to various parts of the system.
  • Logging & Auditing: Comprehensive logging system that tracks user activities and data changes.

🖥️ Supported Systems

  • Workstations: Local development or research workstations with low data processing demands.
  • Servers: Scalable server infrastructure for processing large data volumes, integrated with cloud services for scalability.

🛟 Support

For issues and questions:

  • Create an issue in the repository
  • Review the Deployment Guide for common issues

📜 License

MIT - see LICENSE


📖 Further Documentation

All extended documentation lives in the project wiki: Browse the Wiki »

Standalone Modules In This Checkout

This repository now vendors two standalone LX modules that should be used directly for report rendering and terminology bundle authoring:

lx-report-generator with Nix

From the repo root:

cd lx-report-generator
direnv allow   # optional
devenv shell
./target/release/report_pdf_renderer \
  --input examples/report_payload.json \
  --output /tmp/report_example.pdf

To wire it into endoreg_db:

export ENDOREG_REPORT_PDF_RENDERER_BIN="$PWD/target/release/report_pdf_renderer"

lx-terminology-editor with Nix

From the repo root:

cd lx-terminology-editor
direnv allow   # optional
devenv shell
python server.py

Then open:

http://localhost:4173

The editor can publish a terminology bundle locally under:

lx-terminology-editor/.published/<publish-name>/<version>/

and writes a registry file at:

lx-terminology-editor/.published/kb_registry.json

That registry can then be used as an LX_DTYPES_KB_REGISTRY source.

Wiki sections:

  • Optimization Documentation
  • Models and Migration Documentation
  • API Documentation
  • Frame Anonymization (Frame-Anonymisierung)
  • Tutorials Documentation
  • Keycloak
  • Coding Principles & Practices
  • Figures
  • Miscellaneous
