CDISC SDTM Mapping Tool

sdtm-mapper

Sam Tomioka

Feb 2019

About

sdtm-mapper is a Python package that generates machine-readable CDISC SDTM mapping specifications with help from AI. It can be used for the following tasks:

  1. Generate an empty specification for training data from a user-provided SAS dataset. This empty specification contains the SAS dataset attributes, so you don't need to run Proc Contents in SAS. The SAS datasets may be in your AWS S3 bucket or in a local folder.
  2. Run the models to generate mapping specifications.
  3. Generate your own mapping algorithms using your data. The models can be trained to generate not only the target variables but also programming pseudocode.
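To make the first task concrete, here is a minimal sketch of what an "empty specification" could look like: one row per source variable, carrying the dataset attributes, with blank target columns left to be filled in as training labels. This is not the sdtm-mapper API. The column names, the `empty_specification` and `to_csv` helpers, and the sample AE variables are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column layout; the real sdtm-mapper specification may differ.
ATTRIBUTE_COLUMNS = ["dataset", "variable", "label", "type", "length"]
TARGET_COLUMNS = ["sdtm_variable", "pseudocode"]


def empty_specification(dataset, variables):
    """Return one spec row per source variable, with blank target columns."""
    rows = []
    for var in variables:
        row = {"dataset": dataset}
        # Copy the source attributes; a missing attribute becomes an empty cell.
        row.update({key: var.get(key, "") for key in ATTRIBUTE_COLUMNS[1:]})
        # Target columns stay empty until a person or a model fills them in.
        row.update({key: "" for key in TARGET_COLUMNS})
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize spec rows to CSV text so the result stays machine readable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=ATTRIBUTE_COLUMNS + TARGET_COLUMNS,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up attributes standing in for what would be read from a SAS dataset.
ae_variables = [
    {"variable": "AETERM", "label": "Reported Term", "type": "char", "length": 200},
    {"variable": "AESTDAT", "label": "Start Date", "type": "num", "length": 8},
]
spec_rows = empty_specification("ae", ae_variables)
print(to_csv(spec_rows))
```

In practice the attribute records would come from reading the SAS dataset itself rather than from a hand-written list.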

The first version comes with three pre-trained models, which are included in the package. Each is a feed-forward neural network with a trainable ELMo embedding layer that classifies variables into 34 classes. The models were trained on adverse event datasets from 18 clinical trials, validated on 3 clinical trials until the models were optimized, and tested on 1 clinical trial. The data for these 22 clinical trials were extracted from Medidata Rave databases built by 3 different CROs and Sunovion Pharmaceuticals.

| Models | Parameters | Training Acc | Validation Acc | Test Acc* |
|---|---|---|---|---|
| 1. Elmo+sfnn+ae+Model1.h5 | 271,142 | 0.9795 | 0.9800 | 0.9540 |
| 2. Elmo+fnn+ae+Model2.h5 | 664,870 | 0.9846 | 1.0000 | 0.9425 |
| 3. Elmo+fnn+ae+Model3.h5 | 594,854 | 0.9966 | 1.0000 | 0.9666 |

Table 1 - Performance of three models
* Macro accuracy, which accounts for the system variables mapped to 'drop'.
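The README does not define macro accuracy. One common reading, assumed here, is the unweighted mean of per-class accuracy, so that a large class such as 'drop' does not dominate the score. The sketch below uses made-up labels and a hypothetical `macro_accuracy` helper; it is not taken from the package.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Unweighted mean over classes of the fraction of each true class predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up example: many system variables are labeled 'drop'.
y_true = ["AETERM", "AETERM", "AESTDTC", "drop", "drop", "drop", "drop"]
y_pred = ["AETERM", "AESEV", "AESTDTC", "drop", "drop", "drop", "drop"]

score = macro_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro accuracy: {score:.4f}, plain accuracy: {plain:.4f}")
```

Here the per-class accuracies are 0.5 (AETERM), 1.0 (AESTDTC), and 1.0 (drop), so the macro score is lower than the plain accuracy, which is inflated by the easy 'drop' rows.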

The high variance of the models may be due to the addition of CDASH metadata, and it is probably better to remove it.

Improvements to the task-specific model are explored by Peters et al. [1]:

  1. Freeze the weights of the pre-trained biLM, concatenate the context-independent token representation $x_k$ with $ELMo^{task}_{k}$, and pass $[x_k; ELMo^{task}_{k}]$ into the task RNN.
  2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$, that is, also include ELMo at the output of the task RNN. Peters et al. [1] showed improved performance on some tasks, such as SNLI and SQuAD, with this change.
  3. Add a moderate amount of dropout to ELMo.
  4. Regularize the ELMo weights by adding $\gamma||w||^2_2$ to the loss function.
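The concatenation and regularization steps above can be sketched in plain Python. Following Peters et al. [1], the task vector for token $k$ is a scaled, softmax-weighted sum of the biLM layer outputs. The function names, the toy two-dimensional vectors, and the parameter names `scale` and `strength` are assumptions for illustration; this is not how the packaged models are implemented, and dropout is left out for brevity.

```python
import math


def softmax(weights):
    """Normalize unnormalized layer weights so they sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task_vector(layers, w, scale=1.0):
    """Collapse the biLM layer vectors for one token into a single vector.

    layers: list of per-layer vectors for the token; w: one weight per layer.
    """
    s = softmax(w)
    dim = len(layers[0])
    return [
        scale * sum(s[j] * layers[j][i] for j in range(len(layers)))
        for i in range(dim)
    ]


def l2_penalty(w, strength):
    """Regularization term strength * ||w||_2^2 that is added to the task loss."""
    return strength * sum(v * v for v in w)


x_k = [0.5, -0.5]                               # context-independent token vector
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy biLM layer outputs
w = [0.0, 0.0, 0.0]                             # equal weights give a plain average

elmo_k = elmo_task_vector(layers, w)
enhanced = x_k + elmo_k                         # list concatenation: [x_k; ELMo_k]
penalty = l2_penalty([0.5, -0.5, 1.0], strength=0.01)
print(enhanced, penalty)
```

In a real model the layer weights and the scale are learned jointly with the task network, and the penalty term is added to the loss at each training step.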

These can be considered as future enhancements for other domains where the models may not perform well.

Here is the architecture of ELMo.

Figure 1 - biLM architecture for ELMo

Installation

pip install sdtm-mapper

Tutorials

  1. Tutorial on how to use sdtm-mapper to generate mapping specifications. Try this on Colab!
  2. Train Model 1 on your own data using SDTMMapper. Note that you need to supply your training data.

Notes

You need an environment in which tensorflow, tensorflow-hub, etc. can be used.
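A quick way to check the environment before running the models is to test whether the modules can be found. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the required versions are not stated here, so it checks presence only.

```python
import importlib.util


def missing_modules(names=("tensorflow", "tensorflow_hub")):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("tensorflow and tensorflow_hub are importable.")
```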

If you want to contribute by adding more models for different SDTM domains, please join the PhUSE ML Project Community. Most of the work has been done on weekends and evenings. Your contributions are always welcome!

Notes about the trained models:

The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contains some e-source data. The performance may not be good for your data. You can also build your own models with the SDTMMapper tool and use your custom model for your datasets.

The old README file is found here

Issues

For any questions, comments, suggestions, or issues, please post them here

For personal communication related to SDTMMapper, please contact Sam Tomioka

Disclaimer

This is not an official Sunovion Pharmaceuticals product.

References

[1] Peters, M. et al. (2018). Deep contextualized word representations.
