Long document classification with language models

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
- Python :: 3.5
Topic
- Text Processing :: Linguistic

Project description

:book: BERT Long Document Classification :book:

An easy-to-use interface to fully trained BERT-based models for multi-class and multi-label long document classification.

Pre-trained models are currently available for two clinical note (EHR) phenotyping tasks: smoker identification and obesity detection.

To ease future development and improvements, we use pytorch-transformers for all language model components of our architectures. Additionally, there is a blog post describing the architecture.
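BERT-base encoders accept at most 512 tokens per input, so longer documents have to be split before encoding. The model names in this README end in `_lstm`, which suggests that per-chunk representations are combined by an LSTM, but the exact scheme is described in the blog post rather than here. As a minimal sketch of the splitting step only, assuming an already tokenized document and caller-supplied [CLS]/[SEP] ids (`chunk_token_ids` is a hypothetical helper, not part of this package's API):

```python
from typing import List

# Assumption: 512 is BERT-base's maximum input length; two positions per
# chunk are reserved for the [CLS] and [SEP] special tokens.
MAX_BERT_LEN = 512


def chunk_token_ids(token_ids: List[int], cls_id: int, sep_id: int,
                    max_len: int = MAX_BERT_LEN) -> List[List[int]]:
    """Split one long token-id sequence into BERT-sized chunks.

    Each chunk holds at most max_len - 2 content tokens and is wrapped
    as [CLS] ... [SEP] so it can be encoded independently.
    """
    body = max_len - 2
    if body <= 0:
        raise ValueError("max_len must leave room for content tokens")
    chunks = []
    for start in range(0, len(token_ids), body):
        piece = token_ids[start:start + body]
        chunks.append([cls_id] + piece + [sep_id])
    return chunks
```

With the bert-base-uncased vocabulary, for example, the [CLS] and [SEP] ids are 101 and 102. Each chunk can then be encoded separately and the ordered sequence of chunk representations passed to a document-level model.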

Model	Dataset	# Labels	Evaluation F1
n2c2_2006_smoker_lstm	I2B2 2006: Smoker Identification	4	0.981
n2c2_2008_obesity_lstm	I2B2 2008: Obesity and Co-morbidities Identification	15	0.997
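The README does not say how the F1 scores above are averaged across labels. Purely as an illustration, here is micro-averaged F1 over binary label vectors like the ones the models output, F1 = 2TP / (2TP + FP + FN) with counts pooled over every document and label. `micro_f1` is a hypothetical helper, and the table's numbers may have been computed with a different averaging.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over per-document binary label vectors."""
    if len(y_true) != len(y_pred):
        raise ValueError("need one prediction per document")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != len(pred):
            raise ValueError("label vectors must have equal length")
        for t, p in zip(truth, pred):
            if t and p:
                tp += 1  # label present and predicted
            elif p:
                fp += 1  # predicted but not present
            elif t:
                fn += 1  # present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For a multi-class task such as smoker identification, where each document has exactly one true and one predicted label, micro-F1 reduces to plain accuracy.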

Installation

Install with pip:

pip install bert_document_classification

or directly:

pip install git+https://github.com/AndriyMulyar/bert_document_classification

Use

Maps text documents of arbitrary length to binary vectors indicating labels.

from bert_document_classification.models import SmokerPhenotypingBert
from bert_document_classification.models import ObesityPhenotypingBert

smoking_classifier = SmokerPhenotypingBert(device='cuda', batch_size=10)  # defaults to GPU prediction

obesity_classifier = ObesityPhenotypingBert(device='cpu', batch_size=10)  # or CPU, if you prefer

# returns one binary label vector per input document
smoking_classifier.predict(["I'm a document! Make me long and the model can still perform well!"])
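The examples above hard-code `device='cuda'` or `device='cpu'`. Since the package builds on pytorch-transformers, PyTorch can be assumed to be installed, and a small helper can pick the GPU only when one is actually usable. `pick_device` is a hypothetical name, not part of this package.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path to choose.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Usage: `smoking_classifier = SmokerPhenotypingBert(device=pick_device(), batch_size=10)`.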

More examples.
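The models map each document to a binary vector indicating its labels. Turning those vectors into readable label names takes one pass per document. This sketch assumes the `predict` output can be converted to nested lists of 0/1 values (for a PyTorch tensor, `.tolist()` does that); `decode_labels` is a hypothetical helper, and the order and wording of `SMOKER_LABELS` are illustrative only, so check them against the model's own label list before relying on them.

```python
from typing import List, Sequence


def decode_labels(vectors: Sequence[Sequence[int]],
                  label_names: Sequence[str]) -> List[List[str]]:
    """Turn per-document binary label vectors into lists of label names."""
    decoded = []
    for vec in vectors:
        if len(vec) != len(label_names):
            raise ValueError("vector length does not match number of labels")
        # Keep the name of every position whose flag is set.
        decoded.append([name for name, flag in zip(label_names, vec) if flag])
    return decoded


# Hypothetical label order for the 4-label smoker model.
SMOKER_LABELS = ["PAST SMOKER", "CURRENT SMOKER", "NON-SMOKER", "UNKNOWN"]
```

For example, `decode_labels([[0, 0, 1, 0]], SMOKER_LABELS)` yields `[["NON-SMOKER"]]`.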

Notes

For training you will need a GPU.
For bulk inference where speed is not a concern, a machine with plenty of memory and CPU cores will likely suffice.
Model downloads are cached in ~/.cache/torch/bert_document_classification/. Try clearing this folder if you have issues.
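Clearing the cache can also be scripted. A minimal sketch, assuming the default location given above; `clear_model_cache` is a hypothetical helper, and the next classifier construction would presumably download the models again.

```python
import shutil
from pathlib import Path

# Cache location stated in the notes above.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "bert_document_classification"


def clear_model_cache(cache_dir: Path = DEFAULT_CACHE) -> bool:
    """Delete cached model downloads; return True if a directory was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(str(cache_dir))
    return True
```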

Acknowledgement

If you found this project useful, consider citing our extended abstract accepted at NeurIPS 2019 ML4Health.


Implementation, development and training in this project were supported by funding from the Mark Dredze Lab at Johns Hopkins University.


Release history

This version

1.0.0

Oct 6, 2019

Download files


Source Distribution

bert_document_classification-1.0.0.tar.gz (16.3 kB)

Uploaded Oct 6, 2019 Source

Built Distribution

bert_document_classification-1.0.0-py3-none-any.whl (18.7 kB)

Uploaded Oct 6, 2019 Python 3

File details

Details for the file bert_document_classification-1.0.0.tar.gz.

File metadata

Download URL: bert_document_classification-1.0.0.tar.gz
Upload date: Oct 6, 2019
Size: 16.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for bert_document_classification-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450`
MD5	`3d1a7e85dd8fb3e5709e3a34f6e2317b`
BLAKE2b-256	`04cf7d774c7b9eef0f0f8299ca0a3942133c1460d9a6262e6eb0ccb07f90419d`


File details

Details for the file bert_document_classification-1.0.0-py3-none-any.whl.

File metadata

Download URL: bert_document_classification-1.0.0-py3-none-any.whl
Upload date: Oct 6, 2019
Size: 18.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for bert_document_classification-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480`
MD5	`ceebce09c73cabbd6a834976d6fbffc0`
BLAKE2b-256	`f9e0bfce41dcb17179d538c46093e04a8925b63c913dae9a269aca51b0e2d701`
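The published SHA256 digests can be checked before installing a downloaded file. A minimal sketch using only the standard library; `sha256_hex` and `verify_download` are illustrative names, and the expected digests are copied from the hash tables above. pip can also enforce hashes itself through `--hash=sha256:<digest>` entries in a requirements file.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the 1.0.0 release files.
PUBLISHED_SHA256 = {
    "bert_document_classification-1.0.0.tar.gz":
        "74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450",
    "bert_document_classification-1.0.0-py3-none-any.whl":
        "4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480",
}


def sha256_hex(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in fixed-size blocks and return the hex digest."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    path = Path(path)
    expected = PUBLISHED_SHA256.get(path.name)
    if expected is None:
        raise KeyError("no published SHA256 for " + path.name)
    with path.open("rb") as fh:
        return sha256_hex(fh) == expected
```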
