Path signatures for Natural Language Processing.
nlpsig
NLPSig (nlpsig) is a Python package for constructing streams/paths of
embeddings obtained from transformers. The key contributions are:
- A simple API for taking streams of textual data and constructing streams of embeddings from transformers:
  - The nlpsig.SentenceEncoder and nlpsig.TextEncoder classes allow you to pass in a corpus of text data (in a variety of formats) and obtain the corresponding embeddings using the sentence-transformers and Hugging Face transformers libraries, respectively.
  - The nlpsig.PrepareData class allows you to easily construct paths/streams of embeddings which can be used for several downstream tasks.
- A simple API for performing dimensionality reduction with nlpsig.DimReduce on the embeddings obtained from transformers, via simple wrappers over popular dimensionality reduction algorithms such as PCA, UMAP, t-SNE, etc.
  - This is particularly useful if we wish to use path signatures in a downstream model, since the dimensionality of the embeddings obtained from transformers is usually very high.
- We present some Signature Network models for longitudinal NLP tasks in the sig-networks library, which uses the paths constructed in this library as inputs to neural networks that utilise path signature methodology.
- We also provide simple classes for constructing train/test splits of the data and for K-fold cross-validation. These are general, and are applied to the Signature Network examples in the sig-networks library.
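To make the point about dimensionality concrete, here is a minimal pure-Python sketch (it does not use the nlpsig API). A signature truncated at depth N of a path with d channels has d + d^2 + ... + d^N terms, so reducing the embedding dimension before computing signatures matters a great deal. The function names signature_size and signature_depth2, and the example dimensions (768, 50, 10), are assumptions chosen for illustration.

```python
def signature_size(channels: int, depth: int) -> int:
    """Number of terms in a signature truncated at `depth` (levels 1..depth)."""
    return sum(channels ** k for k in range(1, depth + 1))


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as a list of points.

    Level 1 is the total increment. Level 2 is accumulated segment by segment
    with Chen's identity; a single linear segment with increment dx contributes
    dx_i * dx_j / 2 at level 2.
    """
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        dx = [b - a for a, b in zip(start, end)]
        for i in range(dim):
            for j in range(dim):
                # cross term with the path so far, plus the segment's own term
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        # update level 1 only after level 2 has used the previous value
        level1 = [s + d for s, d in zip(level1, dx)]
    return level1, level2


# Signature sizes for a raw embedding dimension versus reduced dimensions.
for channels in (768, 50, 10):
    print(channels, [signature_size(channels, depth) for depth in (1, 2, 3)])

# A tiny two-channel path: right one unit, then up one unit.
print(signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

At depth 3, a 768-channel path already has several hundred million signature terms, whereas a 10-channel path has 1110.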
Installation
NLPSig is available on PyPI and can be installed with pip:
pip install nlpsig
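As a quick sanity check after installing, the standard-library importlib.metadata module can report which version of a distribution is present in the current environment. This is only a sketch; the helper name installed_version is hypothetical and not part of nlpsig.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if nlpsig is installed, otherwise None.
print(installed_version("nlpsig"))
```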
Contributing
To take advantage of pre-commit, which will automatically format your code and
run some basic checks before you commit:
pip install pre-commit # or brew install pre-commit on macOS
pre-commit install # will install a pre-commit hook into the git repo
After doing this, each time you commit, some linters will be applied to format
the codebase. Alternatively, you can run pre-commit run --all-files to run
the checks on demand.
See CONTRIBUTING.md for more information on running the test
suite using nox.