DeepZensols Natural Language Processing
Deep learning utility library for natural language processing that aids in feature engineering and embedding layers (see the full documentation).
Features:
- Configurable layers with little to no need to write code.
- Natural language specific layers:
  - Easy to configure N deep convolution layers with automatic dimensionality calculation and configurable pooling and batch centering.
  - Full Embedding+BiLSTM-CRF implementation using easy to configure constituent layers.
- NLP specific vectorizers that generate zensols deeplearn encoded and decoded batched tensors for spaCy parsed features, dependency tree features, overlapping text features and others.
- Embedding layers that are easily swappable at runtime, provided as batched tensors alongside other linguistic vectorized features.
- Support for easily configurable word embeddings: GloVe, Word2Vec, fastText and BERT.
- Support for token, document and embedding level vectorized features.
- Integration with Pandas data frames from data ingestion.
- Two fully documented examples, each provided as both a command line program and a Jupyter notebook.
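To make the "automatic dimensionality calculation" feature concrete, here is a minimal sketch of the arithmetic involved when stacking 1D convolution and pooling layers. It does not use this library's API. The `Conv1dSpec` class and `stack_output_lengths` function are hypothetical names introduced only for illustration, and the formula is the standard one for an undilated convolution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conv1dSpec:
    """Hypothetical description of one 1D convolution plus optional pooling."""
    kernel: int
    stride: int = 1
    padding: int = 0
    pool_kernel: int = 1  # a pool kernel of 1 with stride 1 means no pooling
    pool_stride: int = 1


def _out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    # Standard output length of a convolution or pooling window (no dilation).
    return (length + 2 * padding - kernel) // stride + 1


def stack_output_lengths(input_len: int, specs: List[Conv1dSpec]) -> List[int]:
    """Return the sequence length after each convolution+pooling stage."""
    lengths = []
    cur = input_len
    for spec in specs:
        cur = _out_len(cur, spec.kernel, spec.stride, spec.padding)
        if cur < 1:
            raise ValueError("convolution kernel does not fit the input")
        cur = _out_len(cur, spec.pool_kernel, spec.pool_stride, 0)
        if cur < 1:
            raise ValueError("pooling kernel does not fit the input")
        lengths.append(cur)
    return lengths


specs = [
    Conv1dSpec(kernel=3, pool_kernel=2, pool_stride=2),
    Conv1dSpec(kernel=5, stride=2, padding=1),
]
print(stack_output_lengths(100, specs))
```

Computing these lengths up front lets the input size of each following layer be derived from configuration rather than hard coded, which is presumably the kind of bookkeeping the library automates.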
Documentation
See the full documentation.
Obtaining
The easiest way to install the command line program is via the pip installer:
pip3 install zensols.deepnlp
Binaries are also available on PyPI.
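After installing, one way to confirm that the distribution is visible to the current Python environment is to query its metadata with the standard library. This is only a sketch: `installed_version` is a hypothetical helper, and the distribution name is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "zensols.deepnlp") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("zensols.deepnlp is not installed in this environment")
else:
    print(f"zensols.deepnlp {version} is installed")
```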
Usage and Examples
If you're in a rush, you can dive right in to the Movie Review Sentiment example, which is a working project that uses this library. However, you will end up reading up on the zensols deeplearn library either before or during the tutorial.
The usage of this library is explained in terms of two examples:
- The Movie Review Sentiment example, trained and tested on the Stanford movie review and Cornell sentiment polarity data sets, assigns a positive or negative score to a natural language movie review written by a critic. Also see the Jupyter movie notebook.
- The Named Entity Recognizer example, trained and tested on the CoNLL 2003 data set, labels named entities in natural language text. Also see the Jupyter NER notebook.
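For readers unfamiliar with the CoNLL 2003 layout used by the second example, the sketch below reads the column format into sentences of (token, entity tag) pairs. It is independent of this library. `read_conll` is a hypothetical helper, the sample lines are made up for illustration, and the exact tag scheme (for example `B-`/`I-` prefixes) varies between distributions of the corpus.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace separated CoNLL lines into (token, NER tag) sentences.

    Assumes the token is the first column and the entity tag the last,
    with blank lines (or -DOCSTART- markers) separating sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


sample = [
    "-DOCSTART- -X- -X- O",
    "",
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "German JJ B-NP B-MISC",
    "",
    "Peter NNP B-NP B-PER",
    "Blackburn NNP I-NP I-PER",
]
for sentence in read_conll(sample):
    print(sentence)
```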
Attribution
This project, or example code, uses:
- Gensim for GloVe, Word2Vec and fastText word embeddings.
- Huggingface Transformers for BERT contextual word embeddings.
- bcolz for fast read access to word embedding vectors.
- zensols nlparse for feature generation from spaCy parsing.
- zensols deeplearn for deep learning network libraries.
Corpora used include:
- Stanford movie review
- Cornell sentiment polarity
- CoNLL 2003 data set
Changelog
An extensive changelog is available here.
License
Copyright (c) 2020 Paul Landes