A rule-based clinical concept extraction tool that captures microorganisms and estimates infection status from semi-structured microbiology culture reports.

MicrobEx (Microbiology Concept Extractor):

This code was developed to provide an open-source Python package that extracts clinical concepts from free-text, semi-structured microbiology reports. The two primary outputs of this package are (1) a binary estimate of patient bacterial infection status and (2) a list of all clinically relevant microorganisms found in the report. These outputs were validated on two independent datasets and achieved F1 scores over 0.95 on both outputs when compared to expert review. Full details on the background, algorithm, and validation results will be available in our paper: (currently being written, will update once submitted to archive).

🏠 Homepage

package

Requirements

* python >=3.6.8
* pandas >=0.25.0

Install

pip install microbex

Usage

instantiation:

def __init__(self,
             data: pd.core.frame.DataFrame,  # check if this requirement works; can work on this later
             text_col: str,                  # previously text_col_main
             culture_id_col: str,            # previously culture_id_main
             visit_id_col: str,              # previously visit_id_main
             ):

The Microbex class is instantiated with a pandas DataFrame that has 3 expected columns (column names are provided as kwargs):

  • parsed_note (kwarg: text_col):
    • microbiology report text, either raw or (preferably) split into components (e.g. gram stain / growth report / antibiotic susceptibility)
  • culture_id (kwarg: culture_id_col):
    • a primary key tied to a given sample/specimen + microbiological exam order.
    • A microbiology order can often be tied to numerous components (e.g. gram stain / growth report / antibiotic susceptibility). Additionally, these can be appended to the same report or added as a new report tied to the same sample + order. All reports tied to a sample + order should share the same culture_id.
  • visit_id (kwarg: visit_id_col):
    • primary key for the patient's visit/encounter
    • one visit_id can map to one or many culture_ids; if the mapping is 1:1, visit_id_col can be specified as the culture_id column
    • in some datasets a patient may have multiple cultures performed in a single visit/encounter.
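
To make the column layout above concrete, here is a minimal sketch of an input table in which one visit has two cultures and one of those cultures has two report components. It uses only pandas; the report text and ID values are made up for illustration and are not taken from the package's sample data.

```python
import pandas as pd

# Hypothetical rows (illustrative only): visit 1 has two cultures, and
# culture 10 has two report components that share the same culture_id.
records = [
    {"parsed_note": "Gram stain: few gram positive cocci in clusters.",
     "culture_id": 10, "visit_id": 1},
    {"parsed_note": "Growth report: Staphylococcus aureus isolated.",
     "culture_id": 10, "visit_id": 1},
    {"parsed_note": "No growth at 5 days.",
     "culture_id": 11, "visit_id": 1},
]
df = pd.DataFrame(records)

# Number of distinct cultures per visit (many cultures : 1 visit is allowed).
cultures_per_visit = df.groupby("visit_id")["culture_id"].nunique()

# Number of report components that share each culture_id.
components_per_culture = df.groupby("culture_id").size()

print(cultures_per_visit)
print(components_per_culture)
```

A quick check like this before instantiation can help confirm that components of the same sample + order really do share a culture_id.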

Inline:

import pandas as pd
import microbex as me
d={'parsed_note': 'No Salmonella, Shigella, Campylobacter, Aeromonas or Plesiomonas isolated.', 'culture_id': 1, 'visit_id': 1}
df=pd.DataFrame(data=d, index=[1])

obj1= me.Microbex(df,
              text_col='parsed_note',
              culture_id_col='culture_id',
              visit_id_col='visit_id')

## see microbex.annotate() docstring for description of kwargs
obj1.annotate(staph_neg_correction=False, 
              specimen_col=None,
              review_suggestions=False,
              likelyneg_block_skip=False
             )

print(obj1.annotated_data.head())

obj1.annotated_data.to_pickle("<designated_save_path>.pkl")
# note: while annotated_data can be saved as a csv, some columns hold a list in each cell, and these may not be interpreted correctly when the csv is read back.
## pkl files preserve dtypes and resolve this issue.
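
The note about list-valued columns can be illustrated with plain pandas. The sketch below uses a small stand-in frame rather than real MicrobEx output; the column name `organisms` is an assumption made for the example. It shows that list cells come back from a CSV as strings, and one way to restore them if a CSV is the only copy available.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for annotated output; "organisms" is an assumed
# column name, not necessarily one produced by MicrobEx.
annotated = pd.DataFrame({
    "culture_id": [1, 2],
    "organisms": [["staphylococcus aureus"], []],
})

# Round trip through CSV using an in-memory buffer.
buffer = io.StringIO()
annotated.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

# The list cells are now plain strings such as "['staphylococcus aureus']".
cell_type_after_csv = type(from_csv.loc[0, "organisms"])
print(cell_type_after_csv)

# Parse the string representation back into Python lists.
from_csv["organisms"] = from_csv["organisms"].apply(ast.literal_eval)
print(from_csv["organisms"].tolist())
```

Saving with `to_pickle` and loading with `pd.read_pickle` avoids this parsing step entirely, which is why the pickle format is suggested above.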

Run tests

commandline

  • this test compares a freshly annotated sample_dataset with an imported pre-annotated expected version.
cd microbex
pytest -v
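
For readers unfamiliar with this style of test, the following is a rough sketch of the general pattern: annotate a sample, then compare it with a stored expected result. It is not the package's actual test. `toy_annotate` is a hypothetical stand-in and does not reflect MicrobEx's rules, and the data is invented for the example.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def toy_annotate(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for an annotation step (not MicrobEx logic)."""
    out = df.copy()
    # Flag notes that contain the phrase "no growth", case-insensitively.
    out["mentions_no_growth"] = (
        out["parsed_note"].str.lower().str.contains("no growth")
    )
    return out


# Invented sample input and the pre-annotated version we expect to get back.
sample = pd.DataFrame({
    "parsed_note": ["No growth at 5 days.", "Escherichia coli isolated."],
    "culture_id": [1, 2],
})
expected = sample.assign(mentions_no_growth=[True, False])


def test_annotation_matches_expected():
    # Fails with a detailed diff if any value, column, or dtype differs.
    assert_frame_equal(toy_annotate(sample), expected)


test_annotation_matches_expected()
```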

Author

👤 Garrett Eickelberg

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!

Credits

Markdown Readme Generator

📝 License

This project is MIT licensed.


This README was created with the markdown-readme-generator
