CDISC SDTM Mapping Tool
sdtm-mapper
Sam Tomioka
Feb 2019
About
sdtm-mapper is a Python package that generates machine-readable CDISC SDTM mapping specifications with help from AI. It can be used for the following tasks.
- Generate an empty specification for training data from a user-provided SAS dataset. The empty specification contains the SAS dataset attributes, so you don't need to run Proc Contents in SAS to get them. SAS datasets may be in your AWS S3 bucket or in a local folder.
- Run the models to generate mapping specifications.
- Generate your own mapping algorithms using your data. The models can be trained to generate not only the target variables but also programming pseudocode.
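As a rough illustration of the first task, the sketch below builds an empty specification from dataset attributes using only the standard library. It is not sdtm-mapper's API: the function names `build_empty_spec` and `spec_to_csv`, the column names, and the sample AE attributes are all hypothetical, and the attributes are hard-coded here rather than read from a SAS file.

```python
import csv
import io

# Hypothetical column layout; the columns sdtm-mapper actually writes may differ.
SPEC_COLUMNS = ["dataset", "variable", "label", "type", "length",
                "target_variable", "pseudocode"]


def build_empty_spec(dataset, attributes):
    """Return one spec row per source variable, with the mapping columns left blank."""
    rows = []
    for attr in attributes:
        rows.append({
            "dataset": dataset,
            "variable": attr["name"],
            "label": attr.get("label", ""),
            "type": attr.get("type", ""),
            "length": attr.get("length", ""),
            "target_variable": "",  # to be filled in by a person or a model
            "pseudocode": "",       # to be filled in by a person or a model
        })
    return rows


def spec_to_csv(rows):
    """Serialize spec rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SPEC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up attributes standing in for what would be read from a SAS dataset.
sample_attributes = [
    {"name": "AETERM", "label": "Reported Term", "type": "char", "length": 200},
    {"name": "AESTDAT", "label": "Start Date", "type": "num", "length": 8},
]
spec_rows = build_empty_spec("ae", sample_attributes)
spec_csv = spec_to_csv(spec_rows)
print(spec_csv)
```

Leaving the target columns blank keeps the same file usable both as a training template (filled in by hand) and as a prediction template (filled in by a model).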
The first version comes with three pre-trained models, which are included in the package. Each is a feed-forward neural network with a trainable ELMo embedding layer that classifies into 34 classes. The models were trained on adverse event datasets from 18 clinical trials, validated on 3 clinical trials until the models were optimized, and tested on 1 clinical trial. The data for all 22 clinical trials were extracted from Medidata Rave databases built by 3 different CROs and Sunovion Pharmaceuticals.
| Models | Parameters | Training Acc | Validation Acc | Test Acc* |
|---|---|---|---|---|
| 1. Elmo+sfnn+ae+Model1.h5 | 271,142 | 0.9795 | 0.9800 | 0.9540 |
| 2. Elmo+fnn+ae+Model2.h5 | 664,870 | 0.9846 | 1.0000 | 0.9425 |
| 3. Elmo+fnn+ae+Model3.h5 | 594,854 | 0.9966 | 1.0000 | 0.9666 |

Table 1 - Performance of three models

\* Macro accuracy accounts for system variables in the 'drop' class.
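The footnote does not define macro accuracy. One common reading is the unweighted mean of per-class accuracy, so that a frequent class such as 'drop' does not dominate the score. The sketch below follows that reading as an assumption; the labels are made up and `macro_accuracy` is not a function from sdtm-mapper.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = [correct[label] / total[label] for label in total]
    return sum(per_class) / len(per_class)


# Made-up labels: 'drop' is frequent, the two target variables are rare.
y_true = ["drop", "drop", "drop", "drop", "AETERM", "AESTDTC"]
y_pred = ["drop", "drop", "drop", "drop", "AETERM", "AETERM"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain:.4f}, macro accuracy: {macro:.4f}")
```

In this toy example the single error falls on a rare class, so the macro figure is lower than the plain figure; the two agree only when every class is predicted equally well.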
The high variance of these models may be due to the addition of CDASH metadata, and it is probably better to remove that metadata.
Peters et al. [1] explore several ways to improve a task-specific model:
- Freeze the weights of the pre-trained biLM, concatenate the context-independent token representation $x_k$ with $ELMo^{task}_{k}$, and pass $[x_k; ELMo^{task}_{k}]$ into the task RNN.
- Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$, that is, also include ELMo at the output of the task RNN. Peters et al. [1] report improved performance on some tasks, such as SNLI and SQuAD, with this change.
- Add a moderate amount of dropout to ELMo.
- Regularize the ELMo weights by adding $\lambda \lVert w \rVert^2_2$ to the loss function.
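To make the list above concrete, here is a minimal pure-Python sketch of how a task-specific ELMo vector can be formed for one token, concatenated with the token representation, and regularized. It assumes the formulation in Peters et al. [1], where the biLM layers are combined with softmax-normalized weights and a scalar scale. The function names, the parameter names `scale` and `strength`, and the toy numbers are illustrative, this is not the code used to train the packaged models, and dropout is left out.

```python
import math


def softmax(weights):
    """Normalize raw layer weights so that they are positive and sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task_vector(layers, raw_weights, scale):
    """Collapse the biLM layers for one token into a scaled, weighted sum."""
    s = softmax(raw_weights)
    dim = len(layers[0])
    return [scale * sum(s[j] * layers[j][i] for j in range(len(layers)))
            for i in range(dim)]


def l2_penalty(raw_weights, strength):
    """Regularization term added to the task loss: strength * ||w||_2^2."""
    return strength * sum(w * w for w in raw_weights)


# Toy values for one token k: three biLM layers of dimension 2.
layers_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
raw_weights = [0.0, 0.0, 0.0]  # equal raw weights give a plain average
elmo_k = elmo_task_vector(layers_k, raw_weights, scale=1.0)

x_k = [0.5, -0.5]              # context-independent token representation
task_rnn_input = x_k + elmo_k  # list concatenation: [x_k; ELMo_k]

penalty = l2_penalty([0.5, -1.0, 2.0], strength=0.1)
print(task_rnn_input, penalty)
```

With this parameterization, a strong penalty pulls the raw weights toward zero, which makes the softmax weights nearly equal, so the combination approaches a plain average of the layers.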
These can be considered as future enhancements for other domains where the models may not perform well.
Here is the architecture of ELMo.
Figure 1 - biLM architecture for ELMo
Installation
pip install sdtm-mapper
Tutorials
- Tutorial on how to use sdtm-mapper to generate mapping specifications. Try this on Colab!
- Train Model 1 on your data using SDTMMapper. Note that you need to supply your own training data.
Notes
You need an environment in which TensorFlow, tensorflow-hub, etc. can be used.
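A quick way to check the environment before running the models is to ask Python whether the modules can be found. This is a small sketch, not part of sdtm-mapper; the import names `tensorflow` and `tensorflow_hub` are assumed from the packages named above, and the "etc." may cover further dependencies that are not listed here.

```python
import importlib.util

# Import names assumed for the packages mentioned above; extend as needed.
REQUIRED_MODULES = ["tensorflow", "tensorflow_hub"]


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

`find_spec` only locates the modules without importing them, so the check is fast and does not start TensorFlow.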
If you want to contribute by adding models for other SDTM domains, please join the PhUSE ML Project Community. Most of the work has been done during weekends and evenings. Your contributions are always welcome!
Notes about the trained models:
The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contain some eSource data. Performance may not be as good on your data. You can also build your own models with the SDTMMapper tool and use them for your datasets.
The old README file is found here
Issues
For any questions, comments, suggestions, or issues, please post them here
For personal communication related to SDTMMapper, please contact Sam Tomioka
Disclaimer
This is not an official Sunovion Pharmaceuticals product.
References
[1] Peters, M. et al. (2018). Deep contextualized word representations.