CDISC SDTM Mapping Tool

Sam Tomioka

Feb 2019


sdtm-mapper is a Python package that assists in creating CDISC SDTM mapping specifications. It can be used for the following tasks.

  1. Generate an empty specification for training data from a user-provided SAS dataset. The empty specification contains the SAS dataset attributes, so you don't need to run Proc Contents in SAS to get them!
  2. Run models to generate mapping specifications.
  3. Build your own mapping models using your data.
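As a rough illustration of task 1, the sketch below builds an empty specification from a list of column attributes. It is not the sdtm-mapper API: the column entries, the `empty_spec` helper, and the `sdtm_variable` field name are all hypothetical, and in practice the attributes would be read from the SAS dataset rather than typed in.

```python
import csv
import io

# Hypothetical column attributes, of the kind Proc Contents would report.
# In practice these would be read from the SAS dataset itself.
RAW_COLUMNS = [
    {"name": "AETERM", "label": "Reported Term for the Adverse Event", "type": "Char", "length": 200},
    {"name": "AESTDAT", "label": "Start Date", "type": "Num", "length": 8},
]


def empty_spec(columns, target_field="sdtm_variable"):
    """Return CSV text with one row per raw column and a blank target field."""
    buffer = io.StringIO()
    fields = ["name", "label", "type", "length", target_field]
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for col in columns:
        row = dict(col)
        row[target_field] = ""  # left empty for a person (or a model) to fill in
        writer.writerow(row)
    return buffer.getvalue()


spec_text = empty_spec(RAW_COLUMNS)
print(spec_text)
```

The blank target column is what makes the specification usable as training data once it has been filled in with the correct SDTM variables.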

The first version comes with three pre-trained models for the ADVERSE EVENTS dataset from CNS clinical trials, as well as SDTM IG 3.2 and CDASH IG 1.2 metadata. The models use different architectures that were discussed at several webinars and conferences. All are feed-forward neural networks with a trainable ELMo embedding layer, classifying into 34 classes. They were trained on adverse event datasets from Medidata Rave: 18 studies for training, 3 for validation, and 1 for testing.
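The description does not say how studies were assigned to each partition. A minimal sketch of one way to do a study-level 18/3/1 split, assuming a random assignment and using hypothetical study identifiers, could look like this:

```python
import random


def split_by_study(study_ids, n_val=3, n_test=1, seed=0):
    """Assign whole studies to train, validation, and test partitions."""
    ids = sorted(set(study_ids))
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Hypothetical study identifiers; 22 studies in total, matching the counts above.
studies = [f"STUDY{i:02d}" for i in range(1, 23)]
train, val, test = split_by_study(studies)
print(len(train), len(val), len(test))
```

Splitting by study rather than by row keeps all variables from one study in the same partition, so the test score reflects performance on a study the model has not seen.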

| Model | Parameters | Training Acc | Validation Acc | Test Acc* |
|---|---|---|---|---|
| 1. Elmo+sfnn+ae+Model1.h5 | 271,142 | 0.9795 | 0.9800 | 0.9540 |
| 2. Elmo+fnn+ae+Model2.h5 | 664,870 | 0.9846 | 1.0000 | 0.9425 |
| 3. Elmo+fnn+ae+Model3.h5 | 594,854 | 0.9966 | 1.0000 | 0.9666 |

Table 1 - Performance of three models
* Macro accuracy, accounting for system variables assigned to 'drop'.

The high variance of these models may be due to the addition of CDASH metadata, and it is probably better to remove that metadata.
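The exact definition of macro accuracy used for Table 1 is not given. One common reading, assumed here, is the per-class accuracy (recall) averaged over classes, so that a large 'drop' class does not dominate the score. The labels below are hypothetical.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Average the per-class accuracy (recall) so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: SDTM target variables plus a 'drop' class for system variables.
y_true = ["AETERM", "AETERM", "AESTDTC", "drop", "drop", "drop", "drop"]
y_pred = ["AETERM", "AESTDTC", "AESTDTC", "drop", "drop", "drop", "drop"]

score = macro_accuracy(y_true, y_pred)
print(score)
```

In this toy case the plain accuracy is 6/7, while the macro accuracy is 5/6, because the single error falls in a small class.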

Improvements to the task-specific model are explored by Peters et al. [1]:

  1. Freeze the weights of the pre-trained biLM, concatenate the context-independent token representation $x_k$ with $ELMo^{task}_{k}$, and pass the result $[x_k; ELMo^{task}_{k}]$ into the task RNN.
  2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$. Peters et al. [1] show improved performance on some tasks, such as SNLI and SQuAD, by also including ELMo at the output of the task RNN.
  3. Add a moderate amount of dropout to ELMo.
  4. Regularize the ELMo weights by adding $\lambda||w||^2_2$ to the loss function.
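To make these options concrete, here is a small pure-Python sketch. It follows the formulation in [1], where the task-specific ELMo vector is a scaled, softmax-weighted sum of the biLM layers, and it is not code from sdtm-mapper. All function names, parameter names, and toy values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Normalize raw layer scores into weights that sum to 1."""
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task_vector(layer_vectors, layer_scores, scale):
    """Collapse biLM layers into one vector: scale * sum_j softmax(s)_j * h_j."""
    weights = softmax(layer_scores)
    dim = len(layer_vectors[0])
    return [
        scale * sum(w * layer[i] for w, layer in zip(weights, layer_vectors))
        for i in range(dim)
    ]


def l2_penalty(weights, strength):
    """Regularization term to add to the loss: strength * ||w||_2^2."""
    return strength * sum(w * w for w in weights)


# Toy values: three biLM layers of dimension 2 for one token.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]   # equal scores give a plain average of the layers
x_k = [0.5, -0.5]          # context-independent token representation

elmo_k = elmo_task_vector(layers, scores, scale=1.0)
enhanced = x_k + elmo_k    # list concatenation, i.e. [x_k; ELMo_k]
penalty = l2_penalty(scores, strength=0.001)
print(enhanced, penalty)
```

With all scores at zero the penalty is zero and the layers are simply averaged, which is the behaviour the regularizer pulls the weights toward.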

These can be considered as future enhancements for other domains where the models may not perform well.

Here is the architecture of ELMo.

Figure 1 - biLM architecture for ELMo


You need a Python environment with TensorFlow, Keras, scikit-learn, etc. installed.
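A quick way to check an environment before starting is to test whether the libraries can be found. This sketch uses only the standard library. The list covers the three packages named above, and the full set of dependencies may be longer.

```python
import importlib.util

# Import names for the libraries mentioned above; "sklearn" is scikit-learn's import name.
REQUIRED = ["tensorflow", "keras", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates the package without importing it, so the check is fast even when TensorFlow is installed.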


  1. Tutorial on how to use SDTMMapper to generate mapping specifications
  2. Train your data using SDTMMapper on Model 1: Note that you need to supply your training data.

Comments, Issues:

For any questions, comments, suggestions, or issues, please post them here


Most of the work is done on weekends or in the evenings. Your contributions are always welcome!

If you want to contribute by adding more models for different SDTM domains, please join the PhUSE ML Project Community.


[1] Peters, M. et al. (2018). Deep contextualized word representations.


The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contains some e-source data. Performance may not be as good on your data. You can also build your own models with the SDTMMapper tool and use your custom model for your datasets.

