Skip to main content

i2b2 user management with PostgreSQL Row Level Security

Project description

i2b2rls

pipeline status coverage report License: MIT

This package provides a cli and python api to manage users and projects in the dockerized i2b2 clinical data warehousing platform using PostgreSQL Row Level Security (RLS) templates.

The key functionality is registering a single database schema as multiple Data Repositories (CRC) in i2b2. Each CRC connects to the same database as a different user with its own RLS policies defined. This enables the usage of a single datasource for multiple projects to circumvent the costly and complex replication of data into project specific databases, by utilising PostgreSQL's Row Level Security feature.

In the examples each project connects with a role that has RLS policies enabled, which allows access only to patients with the provider_id set in the project configuration. The required data is defined in and derived from i2b2.yaml, the central config file.

The corresponding docker-compose based project is the i2b2-stack

Getting Started

Prerequisites

Dependencies

# On Ubuntu / Debian
sudo apt install -y python3 python3-pip gcc postgresql postgresql-server-dev-all
# On Manjaro / Arch
sudo pacman -S python3 python-pip gcc postgresql postgresql-libs libffi

Install repository

# pip install from pypi
python3 -m pip install i2b2rls
# pip install from gitlab repository
python -m pip install git+https://gitlab.com/mds-imbi-freiburg/i2b2/i2b2rls.git

Usage

Create a folder and initiate the i2b2 project dir:

mkdir i2b2-demo && cd i2b2-demo/
i2b2rls --conf=. --init

Create a new project in i2b2.yaml:

subdepartment1:  # Name of project
  crc: # Register a new CRC
    ds-user: dep1  # Create role in database & datasource
    ds-pass: demouser # Password for databse role
    provider-filter: LCS-I2B2:D000109075%  # Filter for observation_facts provider
    members:
      user1:

Register the project:

i2b2rls project register subdepartment1

This creates all required resources for an i2b2 project:

  • i2b2 Project subdepartment1
  • i2b2 Frontend User user1
  • i2b2 Project Roles for user1 (based on default-project-roles in i2b2.yaml)
  • Postgres Database User dep1 with password demouser (based on example sql files)
  • Wildfly Datasource using dep1 user to connect to the Database (based on template datasource)

NOTE: You may need to restart the stack for all settings to apply.

Manual steps

To manually run the steps for a project one can use the API from the command line:

# Run SQL scripts in user folder
i2b2rls project run_rls subdepartment1 --script=main.sql

# Generate and deploy datasource to Wildfly container
i2b2rls project generate_ds subdepartment1 deploy=True

# Register project in PM cell
i2b2rls project add_pm_project_data subdepartment1

# Register project members and their roles
i2b2rls project register_members subdepartment1

# Add Project roles for Service user (AGG_SERVICE_ACCOUNT)
i2b2rls project add_service_roles subdepartment1

# Register Cells (crc, ont, im, work) for new Project
i2b2rls project create_db_lookups subdepartment1

Help

CLI and API are built with the awesome Fire library. It provides a help page for every possible command:

i2b2rls - --help

For example:

$ i2b2rls project register --help
> NAME
>     i2b2rls project register - Create all resources for a RLS project
>
> SYNOPSIS
>     i2b2rls project register PROJECT_ID <flags>
>
> DESCRIPTION
>     Steps:
>     * (Optional) Create SQL script from template
>     * Run SQL files in role folder
>     * Create & deploy Datasource file for CRC cell
>     * Create the project in i2b2pm.pm_project_data
>     * Create mandatory i2b2-roles for AGG_SERVICE_ACCOUNT
>     * Register project db_lookups in i2b2hive.{im,ont,work,crc}_db_lookup
>
> POSITIONAL ARGUMENTS
>     PROJECT_ID
>         Type: str
>         ID of project as in i2b2.yaml
>
> FLAGS
>     --cell_id=CELL_ID
>         Type: str
>         Default: 'crc'
>         The cell data to deploy (Default: 'crc')
>     --force=FORCE
>         Type: bool
>         Default: False
>         Delete Existing resources and recreate them
>     --ignore_errors=IGNORE_ERRORS
>         Type: bool
>         Default: False
>         Ignore errors and continue creating resources
>     --add_i2b2=ADD_I2B2
>         Type: bool
>         Default: True
>         Add 'i2b2' user to project with default roles
>     --deploy_ds=DEPLOY_DS
>         Type: bool
>         Default: True
>         Copy Datasource file to wildfly server and reload
>     --extract_ds=EXTRACT_DS
>         Type: bool
>         Default: True
>         Pull Datasource file from wildfly server to local folder

Configuration

Folder layout

The init command creates the template files for an i2b2rls instance structure:

i2b2-demo/
├── i2b2.yaml            # Central project definition and configuration
├── template-ds.xml      # Template for XML Datasource Objects
├── datasources/         # Temporary folder for wildfly datasources
├── roles/               # Each RLS user has a folder here   ├── template         # Template folder other roles inherit from      └── main.sql     # Main file, write RLS struff or reference other files   ├── example          # demouser example RLS files      └── main.sql     # Main file, includes other sql files with \ir      ├── drop.sql     # Script for dropping all sql resources      ├── test.sql     # Contains PgTAP tests
└── └── └── rls.sql      # Creates roles and RLS policies

Start by editing the file i2b2.yaml and change the basic settings for your i2b2 instance in the default section. These settings will be inherited by any further project you define and can be overwritten by setting the value explicitly.

General

log-level: DEBUG                                                  # Application logging level: INFO, WARN, ERROR, DEBUG
log-file: /tmp/i2b2rls.log                                        # Log to file - Off by default
changeby-user: i2b2rls                                            # Will appear in database entries at changeby_user
wildfly-container: i2b2-wildfly                                   # Name of wildfly docker container
wildfly-datasources: /opt/jboss/wildfly/standalone/deployments/   # Path to datasources in container

Configuration Levels

Any configuration option from i2b2.yaml has sane defaults configured in conf.py to ensure every possible configuration option has a value. This is the code-level config. The values there are recursively overwritten by settings you make in i2b2.yaml.

In i2b2.yaml is a default configuration tree for projects, the default project. Every option there is inherited by other projects, and can be overwritten by them. The hierarchy of overriding values for projects looks like this: code-level < default-project-level < project-level

For the deeper nested LDAP-settings of a user: code-level < default-project-level < project-level < user-level

Default Project

As mentioned above the default project holds configurations that apply to all projects, and can be overridden by settings them explicitly on a project.

domain: i2b2demo                            # i2b2 domain (c_domain_id in db_lookups)
owner: "@"                                  # Owner (c_owner_id in db_lookups)
default-project-roles: [USER, EDITOR, ...]  # Default roles each user in each project gets, if not overridden
crc:
  host: other-pg-host

Cell Configurations

Each project needs access to the core modules (cells) of i2b2. For that, each project's cells are registered in the corresponding db_lookup table in the i2b2hive schema.

Each Cell type (except Project Management) has its own db_lookup table:

  • Ontology: ont_db_lookup
  • Workplace: work_db_lookup
  • Identity Management (IM): im_db_lookup
  • Data Repository (CRC): crc_db_lookup
  • Project Management (PM): Tables in schema i2b2pm

As these can be hosted on various postgres servers, the connection details from the wildfly server need to be configured in the datasources.

If the Datasource already exists in wildfly, ds-user and ds-pass can be omitted. If the Datasource should be created, admin-user and admin-pass must be set, so the application can connect to the postgres instance and create the db_lookup entry.

ont:
  host: i2b2-pg  # Postgres server
  port: 5432
  database: i2b2
  schema: i2b2metadata
  pool-name: OntologyDemoDS
  jndi-name: java:/OntologyDemoDS
  ds-user: i2b2metadata
  ds-pass: demouser

Users, Roles and Members

There are several types of roles, users and groups involved, so here is an overview:

Postgres Roles

Postgres does not differentiate between users and groups, both are summarized in the concept role. This way a user can inherit privileges from any other user and group, or vice versa. The only difference is that some roles have the LOGIN flag and a password set, which technically makes them users.

In context of i2b2rls there are the config objects ds-user and admin-user, each referring to a role in the Postgres database:

  • ds-user: Role that is created in the target Data Repository (Postgres server) by the application. It enables the overlying project to connect to the CRC through a Wildfly Datasource, limited by your RLS rules.
  • admin-user: Used to connect to the Postgres server to create the Datasource user above. Must be highly privileged to create resources like roles, policies and grants.
i2b2 Users

Users of the i2b2 frontend are registered in the table pm_user_data of the Project Management cell. They are identified by their user_id (username) and may have a password, full name, email address and status (active/disabled). In i2b2.yaml users of a project are defined in the members section:

your_project:
  members:
    rls:  # Username
      password: demouser  # Login password
      fullname: RLS Demo User  # Users full name
      email: demouser@example.com  # User email address
      project-roles: [USER, DATA_OBFSC, MANAGER]

If a user already exists in the database it can simply be referred by the user_id as a key without values:

other_project:
  members:
    rls:
    demo:
    i2b2:
Project Roles

To login to a project a user needs Project Roles, defined by the key project-roles. Roles are inherited by a project's default-project-roles if a user has no explicit project-roles set. The command ì2b2rls project register_members yourproject user_name collects these settings and creates the database resources in pm_user_data and pm_project_user_roles.

User LDAP

To enable LDAP authentication for a project, make sure you have the correct ldap settings in your Postgres pg_hba.conf and the ldap object in your i2b2.yaml populated with corresponding values.

Just like any setting LDAP can be configured for all projects in the default project and overridden by your specific project. The project-level settings can then be overridden by user-level settings:

projects:
  your_project:
    ldap:  # LDAP Settings used in i2b2pm.pm_user_params
      authentication-method: LDAP  # LDAP / LDAPS
      connection-url: ldap.mycompany.com  # LDAP Server
      distinguished-name: uid=  # Field to user for login
      search-base: ou=people,dc=mycompany,dc=com  # Where to find users
      security-authentication: simple  # Auth mechanism (none/simple/sasl)
    members:
      rls:
        ldap:
          # Override project-level search-base
          search-base: ou=department,ou=people,dc=example,dc=org

To enable the LDAP login for a user on project registration set at least an empty ldap: key.

To enable LDAP login manually use:

i2b2rls project set_user_ldap rlsdemo rls

To manually register LDAP for a user while setting a certain value explicitly:

i2b2rls project set_user_ldap rlsdemo rls --data='{"search-base":""}'

SQL Templates

The example folder features a set of scripts that split the creation of Postgres Roles and its RLS policies into multiple steps:

  • main.sql: Incorporates the .sql files into a single transaction. A SAVEPOINT is created before the tests and rolled back to after the tests. If any of the tests fail, the whole transaction is rolled back.
  • drop.sql: Used to remove any (preexisting) objects in the database that should be created by the application. This should be called before the Role and the Policies are created, and can be called if the Role is to be removed.
  • init.sql: Initialisation script that ensures the extension pgtap exists and the view v_observation_facts is present and contains the required fields.
  • rls.sql: Contains the statements for creating the Role and its privileges: CREATE ROLE, GRANT, CREATE POLICY.
  • test.sql: PgTAP tests that validate the role is created with necessary privileges and the RLS policies filter correctly.

The SQL scripts are called with the psql Commandline client for postgres. This enables the usage of variables in the scripts that are passed by the application (e.g. :variable_name). By default the following variables are passed to the client and can be used in scripts:

  • project_id: Project id, the object name of a project
  • role_name: Name of postgres role defined in ds-user
  • role_pass: Password of postgres role, defined in ds-pass
  • db: Database name of datasource defined in crc > database (default i2b2)
  • cell_schema: Schema of i2b2 data defined in crc > schema (default i2b2demodata)
  • Any key: value pair in pg-vars. The keys must only contain lower dashes as seperator! Also, any of the above values can be overwritten by setting it in pg-vars!

A full call from the application to psql may look like this:

psql postgresql://i2b2:demouser@i2b2-pg:5432/i2b2 \
    -v project_id=rlsdemo \
    -v role_name=rlsuser \
    -v role_pass=rlsuserpass \
    -v db=i2b2 \
    -v cell_schema=i2b2demodata \
    -v provider_filter=LCS-I2B2:D000109061% \
    -f "/path/to/config/roles/example/main.sql"

NOTE: The view v_observation_fact is required for the RLS policies. It joins the clinic's departments table provider_dimension to the patients observations observation_fact by the provider_id.

Ressources

To sum up, following resources are managed by the tool:

  • i2b2 Project and User Management (i2b2pm)
    • Frontend users (pm_user_data)
    • Frontent user LDAP settings (pm_user_params)
    • Project roles (pm_project_user_roles)
    • Project data (pm_project_data)
  • i2b2 Cells
    • Datasource's manipulation and deployment to Wildfly
    • Cell db_lookups in i2b2hive
  • PostgreSQL resources
    • Executor for SQL templates via psql client
    • Instantiation of custom SQL templates
    • Definition of postgres database roles

Simple CLI

The cli features a set of simplified commands for management of users and groups. These commands are accessed through the subcommand i2b2rls cmd [user|group] and will return 1 on error or 0 on success. These commands are designed for use in CI processes.

Docker.

As i2b2rls is built to work with a dockerized setup it needs access to the docker daemon. For this work, the image is built on top of docker-in-docker and requires the docker socket as a volume:

docker run \
  -v /tmp/data:/data \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network i2b2-stack_i2b2-net \
  -ti i2b2rls \
  i2b2rls --init --conf=/data

Development

pip install -r requirements.dev.txt
pre-commit install
pre-commit install --hook-type commit-msg

Test

Start the i2b2 postgres backend:

docker run -p 5432:5432 i2b2/i2b2-pg:p1

Run tests:

python -m pytest --cov=.

To simulate the hostname of the postgres server on your machine for testing it in /etc/hosts:

sudo echo "127.0.0.1    i2b2-pg" >> /etc/hosts

Debugging

Enable logging in Wildfly
docker exec -it -u root i2b2-wildfly bash -c "sed -i 's/INFO/DEBUG/g' /opt/jboss/wildfly/standalone/configuration/standalone.xml"
Logging in Postgres backend
client_min_messages = debug1
log_line_prefix = '%m [%p] %q%u@%d '
log_statement = 'all'

How it works

First the application parses the i2b2.yaml file and creates a full configuration tree for every project.

Glossary

Short Long Explanation
RLS Row Level Security PostgresSQL access policies based on row values. Allows granular filtering of user access to patient/observation information
CRC Data Repository Database schema and REST API (cell) for i2b2 medical data, e.g. the tables observation_facts, patient_dimension etc.
PM Project Management Database schema (cell) for i2b2 project data: users, projects, project_roles etc.
ONT Ontology Database schema and REST API (cell) for i2b2 ontologies. Used to query patient data by diagnosis
DS Datasources XML object (e.g. crc-ds.xml) that defines a database connection for the i2b2 wildfly server

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

i2b2rls-0.0.5.tar.gz (35.0 kB view hashes)

Uploaded Source

Built Distribution

i2b2rls-0.0.5-py3-none-any.whl (33.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page