ctproc

library for processing clinical trials data from clinicaltrials.gov

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Project Description & API

This is a library for processing clinical trials data from clinicaltrials.gov. It offers methods for parsing the XML and the content fields of the documents.
The main API is the process_data method, shown here with its default values:

from ctproc import CTConfig, CTProc

zip_data   = "/path/to/zip_folder"
write_file = "/path/to/write/file.jsonl"

id_ = 'NCT00001444'
config = CTConfig(
    zip_data=zip_data, 
    write_file=write_file,
    id_to_print=id_, 
    max_trials=25,
    add_nlp=True       # must have en_core_sci_md spaCy model installed
)

cp = CTProc(config)    # CTProc is the class imported above
id2doc = {res.nct_id : res for res in cp.process_data()}
id_doc = id2doc[id_]

print(id_doc.elig_crit.include_criteria)

Output is written to the write location in .jsonl format, one processed document per line. The data is read with zipfile, so you don't have to uncompress it first. Some useful features are the text processing utilities built into the process_data routine.
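As a rough illustration of the two file-handling ideas above (one JSON document per line, and reading XML straight out of a zip archive), here is a minimal standard-library sketch. The helper names iter_jsonl and iter_zip_xml are hypothetical and are not part of ctproc; the keys inside each JSON record are not assumed here.

```python
import json
import zipfile
from typing import Iterator, Tuple


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def iter_zip_xml(zip_path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for each .xml member, without extracting the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                yield name, archive.read(name)
```

Calling iter_jsonl on the write file from the example above would yield one dictionary per processed trial.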

spaCy's text processing pipeline is used heavily for entity linking, sentence segmentation, alias expansion, and negation.

The field most useful to many users is the 'eligibility/criteria/textblock' field, where the eligibility criteria are given in a somewhat structured block of text, as shown below.

     Inclusion Criteria:
         -  Patients with HF or IHD who are not currently taking the study medications of
            interest (ACE inhibitors/angiotensin receptor blockers for HF or statins for IHD) and
            whose primary care physicians are part of the study population
     Exclusion Criteria:
         -  Patients who are unable or unwilling to give informed consent,
         -  previously taken the study medications according to dispensing records
         -  allergy or intolerance to study medications
         -  residents of long-term care facilities
         -  unable to confirm a diagnosis of either HF or IHD
         -  primary care physician has already contributed 5 patients to the study

Installation

You can install with pip:

pip install ctproc

Because PyPI packages cannot include linked libraries, you also need to install the spaCy en_core_sci_md model separately:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz

Note: the initial import of ctproc will take a few minutes because the scispacy model has to be loaded.
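Since the model is installed separately, it can help to check for it before turning on add_nlp. The sketch below uses only the standard library; the function name model_available is hypothetical and not part of ctproc.

```python
import importlib.util


def model_available(model_name: str = "en_core_sci_md") -> bool:
    """Return True if a top-level package with this name can be found, without importing it."""
    return importlib.util.find_spec(model_name) is not None
```

The result could be passed as the add_nlp value when building the config, so that processing still runs when the model is missing.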

What it Does:

In particular, the library methods break the eligibility/criteria/textblock block of text into inclusion and exclusion criteria for further processing. This works in most cases but breaks on difficult structures of this field, where conditions of exclusion and inclusion are mixed in with one another. The structure of the source data could also change entirely, or new fields could be added; this project is not affiliated with clinicaltrials.gov in any way.
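To make the inclusion/exclusion split concrete, here is a minimal sketch that handles only the simple layout shown earlier (a header line followed by "-" bullets). It is not ctproc's implementation; the function split_criteria and the regular expression are assumptions made for illustration, and mixed or unlabeled blocks are not handled.

```python
import re
from typing import Dict, List

# Matches a line such as "Inclusion Criteria:" or "exclusion criteria"
HEADER = re.compile(r"^(inclusion|exclusion) criteria:?$", re.IGNORECASE)


def split_criteria(textblock: str) -> Dict[str, List[str]]:
    """Split a criteria text block into lists of inclusion and exclusion criteria."""
    sections: Dict[str, List[str]] = {"inclusion": [], "exclusion": []}
    current = None
    for raw in textblock.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(1).lower()
            continue
        if current is None:
            continue  # text before any header is ignored in this sketch
        if line.startswith("-"):
            # a bullet starts a new criterion
            sections[current].append(line.lstrip("- ").strip())
        elif sections[current]:
            # a wrapped line continues the previous criterion
            sections[current][-1] += " " + line
        else:
            sections[current].append(line)
    return sections


SAMPLE = """
Inclusion Criteria:
    -  Patients with HF or IHD who are not currently taking the study
       medications of interest
Exclusion Criteria:
    -  allergy or intolerance to study medications
    -  residents of long-term care facilities
"""

parsed = split_criteria(SAMPLE)
```

Here parsed["inclusion"] holds one criterion (the wrapped lines joined together) and parsed["exclusion"] holds two.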

The process_data() method returns several different representations inside each processed document. They are turned on by default unless arguments specify otherwise:

- concatenation of a user-selected set of fields and subfields into a single field
- mapping to UMLS CUI values: https://www.nlm.nih.gov/research/umls/index.html
- alias expansion from the raw text associated with linked CUI values, with an attempt to maintain sentence structure
- an attempt at moving negated criteria from one field to the opposing field (inc -> exc, exc -> inc)
- removal of stopwords, or of a given list of words, from the contents field constructed by the concatenation methods
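Two of the steps above, negation moving and stopword removal, can be sketched in a deliberately naive form. The library relies on spaCy for negation, whereas this sketch only looks for a leading "no", "not", or "without"; the functions move_negated and remove_stopwords are hypothetical and not part of ctproc.

```python
import re
from typing import Iterable, List, Tuple

# Naive assumption: a criterion is negated only if it starts with one of these words
NEGATION_PREFIX = re.compile(r"^(?:no|not|without)\s+", re.IGNORECASE)


def move_negated(include: List[str], exclude: List[str]) -> Tuple[List[str], List[str]]:
    """Move criteria with a leading negation to the opposing list, dropping the negation."""
    new_inc: List[str] = []
    new_exc: List[str] = []
    for source, same, opposite in ((include, new_inc, new_exc), (exclude, new_exc, new_inc)):
        for criterion in source:
            stripped = NEGATION_PREFIX.sub("", criterion, count=1)
            if stripped != criterion:
                opposite.append(stripped)
            else:
                same.append(criterion)
    return new_inc, new_exc


def remove_stopwords(text: str, stopwords: Iterable[str]) -> str:
    """Drop whitespace-separated tokens that appear in the stopword list (case-insensitive)."""
    blocked = {word.lower() for word in stopwords}
    return " ".join(tok for tok in text.split() if tok.lower() not in blocked)


inc, exc = move_negated(
    ["age over 18", "no history of stroke"],
    ["not pregnant", "allergy to statins"],
)
cleaned = remove_stopwords("the patient has a history of stroke", ["the", "a", "of"])
```

A prefix check like this misses negation that appears mid-sentence, which is one reason a real pipeline would use a proper negation detector.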

TODO:

construct a module to identify labs and ranges in the criteria data (to be used by ctmatch to match with values in the patient descriptions)

https://github.com/semajyllek/ctproc


Release history

- 0.2.3 (Mar 22, 2023), this version
- 0.2.2 (Feb 28, 2023)
- 0.2.1 (Feb 27, 2023)
- 0.2.0 (Feb 27, 2023)
- 0.1.3 (Jan 29, 2023)
- 0.1.2 (Jan 29, 2023)
- 0.1.1 (Jan 29, 2023)
- 0.1.0 (Jan 24, 2023)
- 0.0.1 (Jan 23, 2023)

Download files

Source Distribution

ctproc-0.2.3.tar.gz (40.9 kB view details)

Uploaded Mar 22, 2023 Source

Built Distribution

ctproc-0.2.3-py3-none-any.whl (45.6 kB view details)

Uploaded Mar 22, 2023 Python 3

File details

Details for the file ctproc-0.2.3.tar.gz.

File metadata

Download URL: ctproc-0.2.3.tar.gz
Upload date: Mar 22, 2023
Size: 40.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for ctproc-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`91105c0eac67f955c10cd64510b99c06be19ed01f58ea8cd2a67931d9e237dae`
MD5	`2c51f6e4ffc0c36fd95b0c76d12502b1`
BLAKE2b-256	`cceb63b5246dbc2279118b78d33fe9ccecab396a30a09981ed358675ab2ee23a`

File details

Details for the file ctproc-0.2.3-py3-none-any.whl.

File metadata

Download URL: ctproc-0.2.3-py3-none-any.whl
Upload date: Mar 22, 2023
Size: 45.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for ctproc-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64b0d583a5f0e8f93861786fde24cdaedd3b6d8ed88ec0860892c3af606846ee`
MD5	`924fe0e434404652757d61c0a1d652fc`
BLAKE2b-256	`7afe4519c3dfd8db17b4a8ae5092f2d409c4344374e2557211804cd3dae7a874`
