This package performs text classification of clinical notes.
# Text classification of clinical notes
----
Clinical notes usually contain crucial medical information about patients, such as medications, history, and physical examinations. This project classifies short clinical notes into their corresponding categories using the Naive Bayes algorithm and a support vector machine (SVM). The default categories are **Medications**, **Hospital Course**, **Laboratories**, **Physical Examinations** and **History**.
This is a sample Physical Examination note:
> Physical examination revealed temperature was 96.9 , heart rate was 121 , blood pressure was 122/86 , respiratory rate was 22 , and oxygen saturation was 96% on room air . In general , the patient looked acutely and chronically ill . Head , eyes , ears , nose , and throat examination revealed the oropharynx was clear . Sclerae were anicteric . Mucous membranes were moist . Cardiovascular examination revealed tachycardic first heart sound and second heart sound . No murmurs , rubs , or gallops . Lungs revealed decreased breath sounds and dullness to percussion in the left lung base . The abdomen was distended and firm . Positive bowel sounds . Extremity examination revealed no clubbing , cyanosis , or edema .
And the following is an example of a Medication paragraph:
>Atrovent . Aspirin . Flonase . Quinine Celebrex .Compazine . Oxybutynin Amitriptyline . Zyrtec . Prozac . Trazodone . Humulin 70 units bid . Albuterol . Lasix 40 PO bid . Triamcinolone cream . Miconazole cream Flovent . Nifedipine ER 60 daily . Lisinopril , 30 daily . KCl 10 mEq q day . Protonix , 40 daily . Lipitor , 20 daily . Methadone 10 bid .
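
To make the Naive Bayes approach mentioned above more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words counts, written in plain Python. It is not the package's actual implementation: the function names `tokenize`, `train_nb` and `predict_nb` are hypothetical, and the four training notes are made-up toy data loosely modelled on the samples above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_nb(samples):
    """Collect the counts needed by multinomial Naive Bayes.

    samples is an iterable of (label, text) pairs.
    """
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for label, text in samples:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed word likelihood.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration only).
toy_notes = [
    ("Medications", "aspirin 81 daily lisinopril 30 daily lasix 40 bid"),
    ("Medications", "albuterol protonix 40 daily lipitor 20 daily"),
    ("Physical Examinations", "heart rate was 121 blood pressure was 122/86"),
    ("Physical Examinations", "lungs revealed decreased breath sounds abdomen was firm"),
]
model = train_nb(toy_notes)
print(predict_nb(model, "lipitor 20 daily"))
```

With so little training data the sketch only shows the mechanics; a real classifier needs many labelled notes per category.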
## Usage
A pre-trained model is available, so a note can be classified in a few lines:
```python
>>> import CNClassifier
>>> note="This is a short note describing a patient's information about history illness"
>>> clsf=CNClassifier.classifier()
>>> print(clsf.letspredict(note))
History
```
You can also choose different features and algorithms for the classifier.
> classifier(datanum, dataset, labels_index, labels_name, algo, feature, tfidf=0)
- datanum: integer, 1-10, the proportion of the dataset used to train the classifier model. For instance, with datanum=7, 70% of the dataset is used as the training set.
- feature: string, the feature representation: "bow" (bag of words), "skip-gram" (Skip-gram), or "cbow" (CBOW).
- tfidf: 0 or 1; TF-IDF weighting is applied when tfidf=1 and not applied when tfidf=0.
- dataset: string, the directory of the dataset.
- labels_index: list, the labels or tags of the documents.
- labels_name: list, the category name corresponding to each label.
- algo: string, the algorithm: "mult_nb" (multinomial Naive Bayes) or "line_svm" (linear SVM).
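
As an illustration of how a 1-10 value like datanum maps to a training proportion, the sketch below splits a list of notes so that datanum tenths go to the training set. The helper `split_by_datanum` is hypothetical and not part of CNClassifier; whether the package shuffles the data before splitting is not documented here, so this sketch simply keeps the original order.

```python
def split_by_datanum(samples, datanum=7):
    """Split samples into (train, test), using datanum/10 of them for training."""
    if not 1 <= datanum <= 10:
        raise ValueError("datanum must be an integer from 1 to 10")
    # Integer arithmetic avoids floating-point rounding surprises.
    cut = len(samples) * datanum // 10
    return samples[:cut], samples[cut:]


notes = [f"note {i}" for i in range(10)]
train, test = split_by_datanum(notes, datanum=7)
print(len(train), len(test))
```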
## Training
To train a classifier on your own dataset:
> classifier_model = classifier(dataset="Your dataset directory", labels_index=['your', 'labels'], labels_name=['your', 'category'])
The dataset document should contain one note per line, each preceded by its label index:
```
label_index1 This is the first note of category 1,
label_index1 This is the second note of category 1,
label_index2 I am the first one in category 2,
label_index3 I am in category 3,
```
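
For readers preparing their own data, the following sketch shows one way to read lines in the format above into (label, text) pairs. It assumes the label is the first whitespace-delimited token on each line; the helper `parse_dataset` is hypothetical and is not the loader used by CNClassifier.

```python
def parse_dataset(lines):
    """Turn 'label text...' lines into a list of (label, text) pairs."""
    samples = []
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        label = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        samples.append((label, text))
    return samples


raw_lines = [
    "label_index1 This is the first note of category 1,",
    "",
    "label_index2 I am the first one in category 2,",
]
samples = parse_dataset(raw_lines)
for label, text in samples:
    print(label, "->", text)
```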
## License
```
Copyright [2018-2019] [Wei Ruan]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```