i2b2 user management with PostgreSQL Row Level Security
Project description
i2b2rls
This package provides a cli and python api to manage users and projects in the dockerized i2b2 clinical data warehousing platform using PostgreSQL Row Level Security (RLS) templates.
The key functionality is registering a single database schema as multiple Data Repositories (CRC) in i2b2. Each CRC connects to the same database as a different user with its own RLS policies defined. This enables the usage of a single datasource for multiple projects to circumvent the costly and complex replication of data into project specific databases, by utilising PostgreSQL's Row Level Security feature.
In the examples each project connects with a role that has RLS policies enabled,
which allows access only to patients with the provider_id
set in the project configuration.
The required data is defined in and derived from i2b2.yaml
, the central config file.
The corresponding docker-compose based project is the i2b2-stack
Getting Started
Prerequisites
Dependencies
# On Ubuntu / Debian
sudo apt install -y python3 python3-pip gcc postgresql postgresql-server-dev-all
# On Manjaro / Arch
sudo pacman -S python3 python-pip gcc postgresql postgresql-libs libffi
Install repository
# pip install from pypi
python3 -m pip install i2b2rls
# pip install from gitlab repository
python -m pip install git+https://gitlab.com/mds-imbi-freiburg/i2b2/i2b2rls.git
Usage
Create a folder and initiate the i2b2 project dir:
mkdir i2b2-demo && cd i2b2-demo/
i2b2rls --conf=. --init
Create a new project in i2b2.yaml
:
subdepartment1: # Name of project
crc: # Register a new CRC
ds-user: dep1 # Create role in database & datasource
ds-pass: demouser # Password for databse role
provider-filter: LCS-I2B2:D000109075% # Filter for observation_facts provider
members:
user1:
Register the project:
i2b2rls project register subdepartment1
This creates all required resources for an i2b2 project:
- i2b2 Project
subdepartment1
- i2b2 Frontend User
user1
- i2b2 Project Roles for
user1
(based on default-project-roles in i2b2.yaml) - Postgres Database User
dep1
with passworddemouser
(based on example sql files) - Wildfly Datasource using
dep1
user to connect to the Database (based on template datasource)
NOTE: You may need to restart the stack for all settings to apply.
Manual steps
To manually run the steps for a project one can use the API from the command line:
# Run SQL scripts in user folder
i2b2rls project run_rls subdepartment1 --script=main.sql
# Generate and deploy datasource to Wildfly container
i2b2rls project generate_ds subdepartment1 deploy=True
# Register project in PM cell
i2b2rls project add_pm_project_data subdepartment1
# Register project members and their roles
i2b2rls project register_members subdepartment1
# Add Project roles for Service user (AGG_SERVICE_ACCOUNT)
i2b2rls project add_service_roles subdepartment1
# Register Cells (crc, ont, im, work) for new Project
i2b2rls project create_db_lookups subdepartment1
Help
CLI and API are built with the awesome Fire library. It provides a help page for every possible command:
i2b2rls - --help
For example:
$ i2b2rls project register --help
> NAME
> i2b2rls project register - Create all resources for a RLS project
>
> SYNOPSIS
> i2b2rls project register PROJECT_ID <flags>
>
> DESCRIPTION
> Steps:
> * (Optional) Create SQL script from template
> * Run SQL files in role folder
> * Create & deploy Datasource file for CRC cell
> * Create the project in i2b2pm.pm_project_data
> * Create mandatory i2b2-roles for AGG_SERVICE_ACCOUNT
> * Register project db_lookups in i2b2hive.{im,ont,work,crc}_db_lookup
>
> POSITIONAL ARGUMENTS
> PROJECT_ID
> Type: str
> ID of project as in i2b2.yaml
>
> FLAGS
> --cell_id=CELL_ID
> Type: str
> Default: 'crc'
> The cell data to deploy (Default: 'crc')
> --force=FORCE
> Type: bool
> Default: False
> Delete Existing resources and recreate them
> --ignore_errors=IGNORE_ERRORS
> Type: bool
> Default: False
> Ignore errors and continue creating resources
> --add_i2b2=ADD_I2B2
> Type: bool
> Default: True
> Add 'i2b2' user to project with default roles
> --deploy_ds=DEPLOY_DS
> Type: bool
> Default: True
> Copy Datasource file to wildfly server and reload
> --extract_ds=EXTRACT_DS
> Type: bool
> Default: True
> Pull Datasource file from wildfly server to local folder
Configuration
Folder layout
The init command creates the template files for an i2b2rls instance structure:
i2b2-demo/
├── i2b2.yaml # Central project definition and configuration
├── template-ds.xml # Template for XML Datasource Objects
├── datasources/ # Temporary folder for wildfly datasources
├── roles/ # Each RLS user has a folder here
│ ├── template # Template folder other roles inherit from
│ │ └── main.sql # Main file, write RLS struff or reference other files
│ ├── example # demouser example RLS files
│ │ └── main.sql # Main file, includes other sql files with \ir
│ │ ├── drop.sql # Script for dropping all sql resources
│ │ ├── test.sql # Contains PgTAP tests
└── └── └── rls.sql # Creates roles and RLS policies
Start by editing the file i2b2.yaml
and change the basic settings for your i2b2 instance in the default
section.
These settings will be inherited by any further project you define and can be overwritten by setting the value explicitly.
General
log-level: DEBUG # Application logging level: INFO, WARN, ERROR, DEBUG
log-file: /tmp/i2b2rls.log # Log to file - Off by default
changeby-user: i2b2rls # Will appear in database entries at changeby_user
wildfly-container: i2b2-wildfly # Name of wildfly docker container
wildfly-datasources: /opt/jboss/wildfly/standalone/deployments/ # Path to datasources in container
Configuration Levels
Any configuration option from i2b2.yaml
has sane defaults configured in conf.py
to ensure every possible configuration option has a value. This is the code-level
config.
The values there are recursively overwritten by settings you make in i2b2.yaml
.
In i2b2.yaml
is a default configuration tree for projects, the default
project.
Every option there is inherited by other projects, and can be overwritten by them.
The hierarchy of overriding values for projects looks like this:
code-level < default-project-level < project-level
For the deeper nested LDAP-settings of a user:
code-level < default-project-level < project-level < user-level
Default Project
As mentioned above the default project holds configurations that apply to all projects, and can be overridden by settings them explicitly on a project.
domain: i2b2demo # i2b2 domain (c_domain_id in db_lookups)
owner: "@" # Owner (c_owner_id in db_lookups)
default-project-roles: [USER, EDITOR, ...] # Default roles each user in each project gets, if not overridden
crc:
host: other-pg-host
Cell Configurations
Each project needs access to the core modules (cells) of i2b2.
For that, each project's cells are registered in the corresponding db_lookup
table in the
i2b2hive
schema.
Each Cell type (except Project Management) has its own db_lookup table:
- Ontology: ont_db_lookup
- Workplace: work_db_lookup
- Identity Management (IM): im_db_lookup
- Data Repository (CRC): crc_db_lookup
- Project Management (PM): Tables in schema
i2b2pm
As these can be hosted on various postgres servers, the connection details from the wildfly server need to be configured in the datasources.
If the Datasource already exists in wildfly, ds-user
and ds-pass
can be omitted.
If the Datasource should be created, admin-user
and admin-pass
must be set,
so the application can connect to the postgres instance and create the db_lookup entry.
ont:
host: i2b2-pg # Postgres server
port: 5432
database: i2b2
schema: i2b2metadata
pool-name: OntologyDemoDS
jndi-name: java:/OntologyDemoDS
ds-user: i2b2metadata
ds-pass: demouser
Users, Roles and Members
There are several types of roles, users and groups involved, so here is an overview:
Postgres Roles
Postgres does not differentiate between users and groups, both are summarized in the concept role
.
This way a user can inherit privileges from any other user and group, or vice versa.
The only difference is that some roles have the LOGIN
flag and a password set,
which technically makes them users.
In context of i2b2rls
there are the config objects ds-user
and admin-user
,
each referring to a role
in the Postgres database:
- ds-user: Role that is created in the target Data Repository (Postgres server) by the application. It enables the overlying project to connect to the CRC through a Wildfly Datasource, limited by your RLS rules.
- admin-user: Used to connect to the Postgres server to create the Datasource user above. Must be highly privileged to create resources like roles, policies and grants.
i2b2 Users
Users of the i2b2 frontend are registered in the table pm_user_data
of the Project Management cell.
They are identified by their user_id
(username) and may have a password, full name, email address and status (active/disabled).
In i2b2.yaml
users of a project are defined in the members
section:
your_project:
members:
rls: # Username
password: demouser # Login password
fullname: RLS Demo User # Users full name
email: demouser@example.com # User email address
project-roles: [USER, DATA_OBFSC, MANAGER]
If a user already exists in the database it can simply be referred by the user_id as a key without values:
other_project:
members:
rls:
demo:
i2b2:
Project Roles
To login to a project a user needs Project Roles, defined by the key project-roles
.
Roles are inherited by a project's default-project-roles
if a user has no explicit project-roles set.
The command ì2b2rls project register_members yourproject user_name
collects these settings and creates
the database resources in pm_user_data
and pm_project_user_roles
.
User LDAP
To enable LDAP authentication for a project, make sure you have the correct ldap settings in your Postgres pg_hba.conf
and the ldap object in your i2b2.yaml
populated with corresponding values.
Just like any setting LDAP can be configured for all projects in the default project and overridden by your specific project. The project-level settings can then be overridden by user-level settings:
projects:
your_project:
ldap: # LDAP Settings used in i2b2pm.pm_user_params
authentication-method: LDAP # LDAP / LDAPS
connection-url: ldap.mycompany.com # LDAP Server
distinguished-name: uid= # Field to user for login
search-base: ou=people,dc=mycompany,dc=com # Where to find users
security-authentication: simple # Auth mechanism (none/simple/sasl)
members:
rls:
ldap:
# Override project-level search-base
search-base: ou=department,ou=people,dc=example,dc=org
To enable the LDAP login for a user on project registration set at least an empty ldap:
key.
To enable LDAP login manually use:
i2b2rls project set_user_ldap rlsdemo rls
To manually register LDAP for a user while setting a certain value explicitly:
i2b2rls project set_user_ldap rlsdemo rls --data='{"search-base":""}'
SQL Templates
The example folder features a set of scripts that split the creation of Postgres Roles and its RLS policies into multiple steps:
- main.sql: Incorporates the .sql files into a single transaction. A SAVEPOINT is created before the tests and rolled back to after the tests. If any of the tests fail, the whole transaction is rolled back.
- drop.sql: Used to remove any (preexisting) objects in the database that should be created by the application. This should be called before the Role and the Policies are created, and can be called if the Role is to be removed.
- init.sql:
Initialisation script that ensures the extension
pgtap
exists and the viewv_observation_facts
is present and contains the required fields. - rls.sql:
Contains the statements for creating the Role and its privileges:
CREATE ROLE, GRANT, CREATE POLICY
. - test.sql: PgTAP tests that validate the role is created with necessary privileges and the RLS policies filter correctly.
The SQL scripts are called with the psql
Commandline client for postgres.
This enables the usage of variables in the scripts that are passed by the application (e.g. :variable_name).
By default the following variables are passed to the client and can be used in scripts:
- project_id: Project id, the object name of a project
- role_name: Name of postgres role defined in
ds-user
- role_pass: Password of postgres role, defined in
ds-pass
- db: Database name of datasource defined in
crc > database
(default i2b2) - cell_schema: Schema of i2b2 data defined in
crc > schema
(default i2b2demodata) - Any
key: value
pair inpg-vars
. The keys must only contain lower dashes as seperator! Also, any of the above values can be overwritten by setting it inpg-vars
!
A full call from the application to psql may look like this:
psql postgresql://i2b2:demouser@i2b2-pg:5432/i2b2 \
-v project_id=rlsdemo \
-v role_name=rlsuser \
-v role_pass=rlsuserpass \
-v db=i2b2 \
-v cell_schema=i2b2demodata \
-v provider_filter=LCS-I2B2:D000109061% \
-f "/path/to/config/roles/example/main.sql"
NOTE: The view v_observation_fact
is required for the RLS policies.
It joins the clinic's departments table provider_dimension
to the patients observations observation_fact
by the provider_id
.
Ressources
To sum up, following resources are managed by the tool:
- i2b2 Project and User Management (i2b2pm)
- Frontend users (pm_user_data)
- Frontent user LDAP settings (pm_user_params)
- Project roles (pm_project_user_roles)
- Project data (pm_project_data)
- i2b2 Cells
- Datasource's manipulation and deployment to Wildfly
- Cell db_lookups in i2b2hive
- PostgreSQL resources
- Executor for SQL templates via
psql
client - Instantiation of custom SQL templates
- Definition of postgres database roles
- Executor for SQL templates via
Simple CLI
The cli features a set of simplified commands for management of users and groups.
These commands are accessed through the subcommand i2b2rls cmd [user|group]
and will return 1 on error or 0 on success.
These commands are designed for use in CI processes.
Docker.
As i2b2rls
is built to work with a dockerized setup it needs access to the docker daemon.
For this work, the image is built on top of docker-in-docker
and requires
the docker socket as a volume:
docker run \
-v /tmp/data:/data \
-v /var/run/docker.sock:/var/run/docker.sock \
--network i2b2-stack_i2b2-net \
-ti i2b2rls \
i2b2rls --init --conf=/data
Development
pip install -r requirements.dev.txt
pre-commit install
pre-commit install --hook-type commit-msg
Test
Start the i2b2 postgres backend:
docker run -p 5432:5432 i2b2/i2b2-pg:p1
Run tests:
python -m pytest --cov=.
To simulate the hostname of the postgres server on your machine for testing it in /etc/hosts
:
sudo echo "127.0.0.1 i2b2-pg" >> /etc/hosts
Debugging
Enable logging in Wildfly
docker exec -it -u root i2b2-wildfly bash -c "sed -i 's/INFO/DEBUG/g' /opt/jboss/wildfly/standalone/configuration/standalone.xml"
Logging in Postgres backend
client_min_messages = debug1
log_line_prefix = '%m [%p] %q%u@%d '
log_statement = 'all'
How it works
First the application parses the i2b2.yaml file and creates a full configuration tree for every project.
Glossary
Short | Long | Explanation |
---|---|---|
RLS | Row Level Security | PostgresSQL access policies based on row values. Allows granular filtering of user access to patient/observation information |
CRC | Data Repository | Database schema and REST API (cell) for i2b2 medical data, e.g. the tables observation_facts , patient_dimension etc. |
PM | Project Management | Database schema (cell) for i2b2 project data: users, projects, project_roles etc. |
ONT | Ontology | Database schema and REST API (cell) for i2b2 ontologies. Used to query patient data by diagnosis |
DS | Datasources | XML object (e.g. crc-ds.xml) that defines a database connection for the i2b2 wildfly server |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file i2b2rls-0.0.5.tar.gz
.
File metadata
- Download URL: i2b2rls-0.0.5.tar.gz
- Upload date:
- Size: 35.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0d63c312749e050014d50838dc5e15b501614d45ab6681216eeefb5d853c3dc9 |
|
MD5 | 48b86fbaeb2a058c888a6fca7486fe4c |
|
BLAKE2b-256 | a3d18c4bee928f94c28ac951a6c27d8b732d5bffd6d82745e2385d56528a5bf7 |
File details
Details for the file i2b2rls-0.0.5-py3-none-any.whl
.
File metadata
- Download URL: i2b2rls-0.0.5-py3-none-any.whl
- Upload date:
- Size: 33.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f26c67307b9746fafe55d4117790d587a806c050aafaf9930e21af2d649ea659 |
|
MD5 | 81b5223fb5838fb19534418e74b7e27a |
|
BLAKE2b-256 | 185d49b7917b566717be0015ce8b34acd8561e7f743b6b12c0a83e566e89c97f |