
CDISC SDTM Mapping Tool


sdtm_mapper

Sam Tomioka

Feb 2019

About:

sdtm-mapper is a Python package that assists in creating CDISC SDTM mapping specifications. It can be used for the following tasks:

  1. Generate an empty specification for training data from a user-provided SAS dataset. The empty specification contains the SAS dataset attributes, so you don't need to run PROC CONTENTS in SAS to get them!
  2. Run models to generate mapping specifications.
  3. Build your own mapping models from your data.
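To illustrate what an "empty specification" for training data could look like, here is a minimal sketch that is not the package's own code. It assumes the SAS column attributes (name, label, type, length) have already been read, for example with a SAS-reading library such as pyreadstat. It then writes one CSV row per source variable with blank SDTM target cells for the user to fill in. The class `ColumnAttr` and the spec column names in `SPEC_FIELDS` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ColumnAttr:
    """Attributes of one source variable, as found in a SAS dataset (hypothetical layout)."""
    name: str
    label: str
    sas_type: str  # "char" or "num"
    length: int


# Hypothetical spec layout: source attributes plus blank SDTM targets.
SPEC_FIELDS = ["source_var", "label", "type", "length", "sdtm_domain", "sdtm_var"]


def empty_spec(columns):
    """Return CSV text with one row per source column and empty target cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SPEC_FIELDS)
    writer.writeheader()
    for col in columns:
        writer.writerow({
            "source_var": col.name,
            "label": col.label,
            "type": col.sas_type,
            "length": col.length,
            "sdtm_domain": "",  # to be filled in by the user for training
            "sdtm_var": "",
        })
    return buf.getvalue()


if __name__ == "__main__":
    cols = [ColumnAttr("AETERM", "Reported Term", "char", 200),
            ColumnAttr("AESTDAT", "Start Date", "num", 8)]
    print(empty_spec(cols))
```

Keeping the target columns blank, rather than guessing them, makes the file a template. A person labels it once, and it can then serve as training data.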

The first version comes with three pre-trained models for the ADVERSE EVENTS (AE) domain. They were built from AE datasets from CNS clinical trials together with SDTM IG 3.2 and CDASH IG 1.2 metadata. The models use different architectures that have been discussed at several webinars and conferences. All are feed-forward neural networks with a trainable ELMo embedding layer and 34 output classes. The AE datasets come from Medidata Rave. Training used 18 studies, validation used 3 studies, and testing used 1 study.
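The 18/3/1 split above is by study rather than by record. The following is a minimal sketch of such a study-level split, assuming each record is a dict with a hypothetical `"study"` key. It is not the package's code. Assigning whole studies to one split keeps variables from the same study out of both training and evaluation.

```python
import random
from collections import defaultdict


def split_by_study(records, n_train, n_val, n_test, seed=0):
    """Assign whole studies to train/validation/test so no study spans two splits."""
    by_study = defaultdict(list)
    for rec in records:
        by_study[rec["study"]].append(rec)  # "study" key is an assumption

    studies = sorted(by_study)
    if len(studies) < n_train + n_val + n_test:
        raise ValueError("not enough studies for the requested split")
    random.Random(seed).shuffle(studies)  # reproducible shuffle

    def records_for(ids):
        return [rec for sid in ids for rec in by_study[sid]]

    train_ids = studies[:n_train]
    val_ids = studies[n_train:n_train + n_val]
    test_ids = studies[n_train + n_val:n_train + n_val + n_test]
    return records_for(train_ids), records_for(val_ids), records_for(test_ids)


if __name__ == "__main__":
    data = [{"study": f"S{i:02d}", "var": v}
            for i in range(22) for v in ("AETERM", "AESEV")]
    train, val, test = split_by_study(data, n_train=18, n_val=3, n_test=1)
    print(len(train), len(val), len(test))
```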

Model                      Parameters  Training Acc  Validation Acc  Test Acc*
1. Elmo+sfnn+ae+Model1.h5     271,142        0.9795          0.9800     0.9540
2. Elmo+fnn+ae+Model2.h5      664,870        0.9846          1.0000     0.9425
3. Elmo+fnn+ae+Model3.h5      594,854        0.9966          1.0000     0.9666
Table 1 - Performance of the three models
* Macro accuracy; system variables, which are assigned to the 'drop' class, are taken into account.
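As a reference for the footnote, here is a minimal sketch of one common definition of macro accuracy: the mean of per-class accuracy (recall), with every class weighted equally. The exact formula used for Table 1 is not stated, so treat this as an assumption. The class names in the example are illustrative, and 'drop' is treated as an ordinary class.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracy over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    y_true = ["AETERM", "AETERM", "drop", "drop", "drop", "AESEV"]
    y_pred = ["AETERM", "AESEV", "drop", "drop", "AETERM", "AESEV"]
    print(macro_accuracy(y_true, y_pred))
```

In this toy example, plain (micro) accuracy is 4/6. Macro accuracy averages 1/2, 2/3 and 1 to give 13/18. A frequent class such as 'drop' therefore cannot dominate the score.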

The high variance of the models may be caused by adding the CDASH metadata, and it would probably be better to remove it.

Peters et al. [1] explored the following ways to improve the task-specific model:

  1. Freeze the weights of the pre-trained biLM, concatenate $ELMo^{task}_{k}$ with the context-independent token representation $x_k$, and pass $[x_k; ELMo^{task}_{k}]$ into the task RNN.
  2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$, i.e. also include ELMo at the output of the task RNN. Peters et al. [1] showed that this improves performance on some tasks, such as SNLI and SQuAD.
  3. Add a moderate amount of dropout to ELMo.
  4. Regularize the ELMo weights by adding $\lambda\|w\|^2_2$ to the loss function.

These could be considered as future enhancements for other domains where the current models may not perform well.
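The enhancements above act on the ELMo vector for token $k$. That vector is a scalar-scaled, softmax-weighted sum of the biLM layer outputs. The following is a minimal pure-Python sketch of how the pieces fit together for a single token. It uses toy numbers, not the package's Keras implementation. The layer vectors, `x_k`, `h_k`, the scale `gamma` and the penalty coefficient `lam` are all illustrative assumptions, and the task RNN itself is not modeled.

```python
import math
import random


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task(layer_vectors, s_weights, gamma):
    """Collapse biLM layers: gamma * sum_j softmax(s)_j * h_{k,j}."""
    s = softmax(s_weights)
    dim = len(layer_vectors[0])
    return [gamma * sum(s_j * h[d] for s_j, h in zip(s, layer_vectors))
            for d in range(dim)]


def dropout(vec, rate, rng):
    """Item 3: inverted dropout, zeroing each entry with probability `rate`."""
    if rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def l2_penalty(weights, lam):
    """Item 4: lam * ||w||_2^2, to be added to the task loss."""
    return lam * sum(w * w for w in weights)


# Toy token k: input x_k and three biLM layer outputs h_{k,0..2}.
x_k = [0.1, 0.2]
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_weights = [0.0, 0.0, 0.0]  # equal softmax weights -> plain average

elmo = elmo_task(layers, s_weights, gamma=1.0)
elmo = dropout(elmo, rate=0.0, rng=random.Random(0))  # no dropout at inference
task_input = x_k + elmo    # item 1: [x_k; ELMo_k] fed to the task RNN
h_k = [0.5, -0.5]          # stand-in for the task RNN output at k
task_output = h_k + elmo   # item 2: [h_k; ELMo_k]
loss_reg = l2_penalty(s_weights, lam=0.01)
print(task_input, task_output, loss_reg)
```

List concatenation (`+`) plays the role of the $[\cdot;\cdot]$ operator here. In a real model these would be tensor concatenations inside the network.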

Here is the architecture of ELMo.

Figure 1 - biLM architecture for ELMo
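For reference, the following is summarized from Peters et al. [1]. The biLM in Figure 1 is trained to jointly maximize the log-likelihood of a forward and a backward language model over a token sequence $(t_1, \ldots, t_N)$. The two directions share the token representation parameters $\Theta_x$ and the softmax parameters $\Theta_s$. ELMo then collapses the $L+1$ biLM layers $h^{LM}_{k,j}$ for token $k$ into one vector, using softmax-normalized weights $s^{task}_j$ and a scalar $\gamma^{task}$:

$$
\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1};\, \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N;\, \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)
$$

$$
ELMo^{task}_{k} = \gamma^{task} \sum_{j=0}^{L} s^{task}_{j}\, h^{LM}_{k,j}
$$

Here $h^{LM}_{k,0}$ is the context-independent token layer. For $j \ge 1$, $h^{LM}_{k,j}$ is the concatenation of the forward and backward LSTM states at layer $j$.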

Notes:

You need an environment with TensorFlow, Keras, scikit-learn, etc. installed.
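Before running the tool, a quick check can confirm that the dependencies are importable. This sketch uses only the standard library. The module list is an assumption based on the note above. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Assumed import names for the dependencies mentioned above.
REQUIRED = ["tensorflow", "keras", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```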

Tutorial:

  1. Tutorial on how to use SDTMMapper to generate mapping specifications
  2. Train SDTMMapper's Model 1 on your own data. Note that you need to supply your own training data.

Comments, Issues:

For any questions, comments, suggestions, or issues, please post them here

Contributions:

Most of the work is done on weekends and evenings. Your contributions are always welcome!

If you want to contribute models for other SDTM domains, please join the PhUSE ML Project Community.

References:

[1] Peters, M. E., et al. (2018). Deep contextualized word representations. NAACL-HLT 2018.

Disclaimer:

The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX, and the training data contains some e-source data. The models may not perform well on your data. In that case, you can build your own models with the SDTMMapper tool and use them for your datasets.



Download files


Source Distribution

sdtm_mapper-0.3.7.tar.gz (17.8 MB)

Built Distribution

sdtm_mapper-0.3.7-py3-none-any.whl (17.8 MB, Python 3)
