Text augmentation library for NLP with a focus on biomedical applications.
Augmentext
Augmentext is a text augmentation package for Natural Language Processing, with a focus on applications in the biomedical domain.
Augmentext is a work in progress! Some features are functional, but it is not yet in a usable state.
Features
- Auto-generated, randomised misspellings
- Dictionary-based thesaurus word replacement
- Auto-generated abbreviations
- More to come...
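To make the first two features concrete, here is a minimal sketch of randomised misspelling and dictionary-based replacement. It does not use Augmentext's API, which is not documented here. The function names `misspell` and `replace_from_thesaurus` and the toy thesaurus are assumptions made purely for illustration.

```python
import random


def misspell(word, rng):
    """Swap two adjacent characters at a random position (hypothetical helper)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def replace_from_thesaurus(tokens, thesaurus, rng):
    """Replace every token that has a thesaurus entry with a random synonym."""
    return [rng.choice(thesaurus[t]) if t in thesaurus else t for t in tokens]


rng = random.Random(0)  # seeded so that runs are repeatable
thesaurus = {"pain": ["ache", "discomfort"]}  # toy dictionary, made up for this sketch
tokens = "patient reports chest pain".split()

augmented = replace_from_thesaurus(tokens, thesaurus, rng)
noisy = [misspell(t, rng) for t in augmented]
print(" ".join(augmented))
print(" ".join(noisy))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing models trained on augmented data.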
Biomedical Domain Specific Features
Although it is a general-purpose library, Augmentext has a special focus on biomedical text, with features such as:
- Replacement of units such as mm/g^2 with common mistakes, e.g. g/mm^2
- Conversion of units from metric to imperial/customary and vice versa
- Integration of SNOMED, ICD, MeSH, RxNorm and other text corpora into the augmentation pipeline
- Synonym replacement using pre-trained GloVe, fastText, and word2vec models
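The two unit-related features above could be sketched as follows. This is an illustration under stated assumptions, not Augmentext's implementation. The regular expressions, the function names `swap_unit_ratio` and `kg_to_lb`, and the choice of kilograms to pounds as the example conversion are all hypothetical.

```python
import re

# Matches simple ratio units of the form "a/b^2", e.g. "mm/g^2" (assumed pattern).
UNIT_RATIO = re.compile(r"\b([A-Za-z]+)/([A-Za-z]+)\^2")

# One international pound is defined as exactly 0.45359237 kg.
KG_PER_LB = 0.45359237


def swap_unit_ratio(text):
    """Swap numerator and denominator to mimic a common unit mistake."""
    return UNIT_RATIO.sub(r"\2/\1^2", text)


def kg_to_lb(text):
    """Rewrite '<number> kg' as pounds, rounded to one decimal place."""
    def repl(match):
        pounds = float(match.group(1)) / KG_PER_LB
        return f"{pounds:.1f} lb"
    return re.sub(r"(\d+(?:\.\d+)?)\s*kg\b", repl, text)


print(swap_unit_ratio("measured 5 mm/g^2"))  # measured 5 g/mm^2
print(kg_to_lb("weight 70 kg"))              # weight 154.3 lb
```

A real pipeline would need a proper unit table and more careful tokenisation. The regular expressions here only cover the simplest cases.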
More Information
See the project's GitHub repository: https://github.com/mdbloice/Augmentext
Help will be available here once the software has been made public on GitHub: https://augmentext.readthedocs.io
Download files
Source Distribution
Augmentext-0.1.7.tar.gz (25.3 kB)
Built Distribution
Augmentext-0.1.7-py3-none-any.whl (25.0 kB)
Hashes for Augmentext-0.1.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3b6a29ec2353ab4f791f7ac94cf861462b6a67cde271ee8e70a5ed36c9e49ce4
MD5 | 3919953f3ab72ed7aaee1da3b20ac89c
BLAKE2b-256 | 81551df3916cf4b6d6d3c34bddc4324390c212066f120cf8dfb954dc12fb9dbb