This package classifies clinical notes by text category.

# Text classification of clinical notes
----
Clinical notes usually contain crucial medical information about patients, such as medications, history, and physical examinations. This project classifies short clinical notes into their corresponding categories using the Naive Bayes algorithm and SVM. The default categories are **Medications**, **Hospital Course**, **Laboratories**, **Physical Examinations** and **History**.

The following is a sample **Physical Examinations** note:
> Physical examination revealed temperature was 96.9 , heart rate was 121 , blood pressure was 122/86 , respiratory rate was 22 , and oxygen saturation was 96% on room air . In general , the patient looked acutely and chronically ill . Head , eyes , ears , nose , and throat examination revealed the oropharynx was clear . Sclerae were anicteric . Mucous membranes were moist . Cardiovascular examination revealed tachycardic first heart sound and second heart sound . No murmurs , rubs , or gallops . Lungs revealed decreased breath sounds and dullness to percussion in the left lung base . The abdomen was distended and firm . Positive bowel sounds . Extremity examination revealed no clubbing , cyanosis , or edema .

And the following is a sample **Medications** paragraph:
>Atrovent . Aspirin . Flonase . Quinine Celebrex .Compazine . Oxybutynin Amitriptyline . Zyrtec . Prozac . Trazodone . Humulin 70 units bid . Albuterol . Lasix 40 PO bid . Triamcinolone cream . Miconazole cream Flovent . Nifedipine ER 60 daily . Lisinopril , 30 daily . KCl 10 mEq q day . Protonix , 40 daily . Lipitor , 20 daily . Methadone 10 bid .
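
To give a sense of how a Naive Bayes classifier can tell such paragraphs apart, here is a minimal standard-library sketch of multinomial Naive Bayes with add-one smoothing. It is an illustration only, not this package's implementation: the function names `train_nb` and `predict_nb`, the whitespace tokenisation, and the tiny training set are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a toy multinomial Naive Bayes model.

    samples: list of (category, text) pairs.
    Returns (documents per category, word counts per category, vocabulary).
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for category, text in samples:
        tokens = text.lower().split()  # assumption: whitespace tokenisation
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the category with the highest log-probability for text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        # log prior + sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(doc_counts[category] / total_docs)
        denom = sum(word_counts[category].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical toy training data, loosely modelled on the samples above.
model = train_nb([
    ("Medications", "aspirin daily lasix bid"),
    ("Medications", "lipitor daily protonix daily"),
    ("Physical Examinations", "heart rate was normal"),
    ("Physical Examinations", "lungs were clear"),
])
print(predict_nb(model, "lasix daily"))  # Medications
```

A medication list and an examination report share very few words, so even this toy model separates them; a real model needs far more training data.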

## Usage
A pre-trained model is available, so a note can be classified in a few lines:
```python
>>> import CNClassifier
>>> note="This is a short note describing a patient's information about history illness"
>>> clsf=CNClassifier.classifier()
>>> print(clsf.letspredict(note))
History
```
You can also choose a different training split, feature type, and algorithm when constructing the classifier:
> classifier(datanum, dataset, labels_index, labels_name, algo, feature, tfidf=0)
- datanum: integer from 1 to 10, the proportion of the dataset used to train the model. For instance, with datanum=7, 70% of the dataset is used as the training set.
- feature: string, the feature representation: "bow" (bag of words), "skip-gram", or "cbow" (continuous bag of words).
- tfidf: 0 or 1; tfidf=1 enables TF-IDF weighting, tfidf=0 disables it.
- dataset: string, the directory of the dataset.
- labels_index: list, the labels (tags) used to mark each document.
- labels_name: list, the category name corresponding to each label.
- algo: string, the algorithm: "mult_nb" for multinomial Naive Bayes, "line_svm" for linear SVM.
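
As a rough illustration of what `datanum`, `feature="bow"` and `tfidf` control, the sketch below splits a list of documents by tenths and builds bag-of-words vectors with optional TF-IDF weighting. It uses only the standard library and is not the package's own code: the helper names `split_by_datanum` and `bow_vectors`, the whitespace tokenisation, and the unsmoothed `log(N / df)` IDF formula are assumptions made for the example.

```python
import math
from collections import Counter


def split_by_datanum(samples, datanum):
    """Return (train, test), using datanum/10 of the samples for training."""
    if not 1 <= datanum <= 10:
        raise ValueError("datanum must be an integer from 1 to 10")
    cut = len(samples) * datanum // 10
    return samples[:cut], samples[cut:]


def bow_vectors(docs, tfidf=0):
    """Turn whitespace-tokenised docs into {term: weight} dicts.

    With tfidf=0 the weight is the raw term count. With tfidf=1 each count
    is multiplied by log(N / df), one common (unsmoothed) IDF variant.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    if not tfidf:
        return [dict(c) for c in counts]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


train, test = split_by_datanum(list(range(10)), 7)
print(len(train), len(test))  # 7 3

docs = ["aspirin daily aspirin", "lasix daily"]
print(bow_vectors(docs))           # raw counts
print(bow_vectors(docs, tfidf=1))  # "daily" occurs in every doc, so its weight is 0.0
```

With TF-IDF enabled, terms that appear in every document (such as "daily" here) contribute nothing, which is why the weighting tends to emphasise category-specific vocabulary.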

## Training
To train a classifier on your own dataset:
> classifier_model=classifier(dataset="Your dataset directory",labels_index=['your','labels'],labels_name=['your','category'])


The format of the dataset document should be:
```
label_index1 This is the first note of category 1,
label_index1 This is the second note of category 1,
label_index2 I am the first one in category 2,
label_index3 I am in category 3,
```
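
To make the format concrete, the following sketch reads lines of this shape and maps each label index to its category name. It assumes that the label index is the first whitespace-delimited token on each line and that the rest of the line is the note text. The helper `parse_dataset` and the labels `med` and `pe` are hypothetical and not part of the package.

```python
def parse_dataset(lines, labels_index, labels_name):
    """Yield (category, text) pairs from 'label_index text' lines."""
    name_of = dict(zip(labels_index, labels_name))
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first whitespace-delimited token is the label index.
        parts = line.split(None, 1)
        label = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        if label not in name_of:
            raise ValueError(f"unknown label index: {label!r}")
        yield name_of[label], text


sample = [
    "med Lasix 40 PO bid .",
    "pe Lungs revealed decreased breath sounds .",
]
pairs = list(parse_dataset(sample, ["med", "pe"],
                           ["Medications", "Physical Examinations"]))
for category, text in pairs:
    print(category, "|", text)
```

Rejecting unknown label indexes early helps catch typos in the dataset file before training.
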
## License
```
Copyright [2018-2019] [Wei Ruan]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```


