A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
MIM NLP
With this package you can easily use pre-trained models and fine-tune them, as well as create and train your own neural networks.
Below, we list NLP tasks and models that are available:
- Classification
  - Neural Network
  - SVM
- Regression
  - Neural Network
- Seq2Seq
  - Summarization (Neural Network)
It comes with utilities for text pre-processing such as:
- Text cleaning
- Lemmatization
- Deduplication
Installation
We recommend installing with pip.
pip install mim-nlp
The package comes with the following extras (optional dependencies for given modules):
- svm - simple SVM model for classification
- classifier - classification models: SVM, neural networks
- regressor - regression models
- preprocessing - cleaning text, lemmatization and deduplication
- seq2seq - Seq2Seq and Summarizer models
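Extras are requested with pip's standard bracket syntax. The sketch below only builds the requirement string for a chosen set of extras. The helper name requirement_with_extras is hypothetical and not part of mim-nlp; the extras names are the ones listed above.

```python
# Extras named in this README. The helper below is a hypothetical illustration,
# not something shipped with mim-nlp.
KNOWN_EXTRAS = {"svm", "classifier", "regressor", "preprocessing", "seq2seq"}


def requirement_with_extras(extras, package="mim-nlp"):
    """Build a pip requirement string such as 'mim-nlp[preprocessing,svm]'."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"Unknown extras: {sorted(unknown)}")
    if not extras:
        return package
    # Sort and de-duplicate so the output is stable.
    return f"{package}[{','.join(sorted(set(extras)))}]"


if __name__ == "__main__":
    # Pass the result to pip, quoted so the shell does not expand the brackets:
    #   pip install "mim-nlp[preprocessing,svm]"
    print(requirement_with_extras(["svm", "preprocessing"]))
```

Quoting the requirement matters in shells such as zsh, where unquoted square brackets are treated as a glob pattern.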
Usage
Examples can be found in the notebooks directory.
Model classes
- classifier.nn.NNClassifier - Neural Network Classifier
- classifier.svm.SVMClassifier - Support Vector Machine Classifier
- classifier.svm.SVMClassifierWithFeatureSelection - SVMClassifier with additional feature selection step
- regressor.AutoRegressor - regressor based on transformers' Auto Classes
- regressor.NNRegressor - Neural Network Regressor
- seq2seq.AutoSummarizer - summarizer based on transformers' Auto Classes
Interface
All the model classes share a common interface:
- fit
- predict
- save
- load

Each class also provides its own additional methods.
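To illustrate what a shared fit / predict / save / load interface looks like in practice, here is a minimal toy sketch. MajorityClassifier is a hypothetical stand-in, not a mim-nlp class, and the argument names and the on-disk format (a model.json file) are assumptions made for the example; consult the notebooks for the actual signatures.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


class MajorityClassifier:
    """Toy model with fit / predict / save / load (hypothetical, not part of mim-nlp)."""

    def __init__(self):
        self.label = None

    def fit(self, texts, labels):
        # "Training" here just remembers the most frequent label.
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        if self.label is None:
            raise RuntimeError("Call fit() or load() before predict().")
        return [self.label for _ in texts]

    def save(self, directory):
        # Assumed storage layout: a single JSON file inside the given directory.
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.json").write_text(json.dumps({"label": self.label}))

    @classmethod
    def load(cls, directory):
        model = cls()
        state = json.loads((Path(directory) / "model.json").read_text())
        model.label = state["label"]
        return model


if __name__ == "__main__":
    model = MajorityClassifier().fit(["good", "bad", "fine"], ["pos", "neg", "pos"])
    with tempfile.TemporaryDirectory() as tmp:
        model.save(tmp)
        restored = MajorityClassifier.load(tmp)
    print(restored.predict(["anything"]))
```

Because every model follows the same four-method shape, code that trains, persists and reloads a model does not need to know which concrete class it is working with.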
Text pre-processing
- preprocessing.TextCleaner - define a pipeline for text cleaning; supports concurrent processing
- preprocessing.lemmatize - lemmatize text in Polish with Morfeusz
- preprocessing.Deduplicator - find near-duplicate texts (depending on threshold) with Jaccard index for n-grams
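The idea behind near-duplicate detection with the Jaccard index on n-grams can be sketched in a few lines. This is an independent illustration, not the Deduplicator implementation: the function names, the use of word-level (rather than character-level) n-grams, and the default values are all assumptions.

```python
from itertools import combinations


def word_ngrams(text, n=2):
    """Return the set of word n-grams of a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Texts shorter than n words are treated as a single short n-gram.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(texts, threshold=0.5, n=2):
    """Return index pairs whose n-gram Jaccard index is at least `threshold`."""
    grams = [word_ngrams(text, n) for text in texts]
    return [
        (i, j)
        for i, j in combinations(range(len(texts)), 2)
        if jaccard(grams[i], grams[j]) >= threshold
    ]


if __name__ == "__main__":
    texts = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "completely different sentence here",
    ]
    # The first two texts share 3 of 7 distinct bigrams (about 0.43).
    print(near_duplicate_pairs(texts, threshold=0.4))
```

Lowering the threshold flags more pairs as near-duplicates, while raising it keeps only closer matches. The pairwise comparison shown here is quadratic in the number of texts, so it is suitable only for small collections.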
Development
Remember to use a separate environment for each project. Run the commands below inside the project's environment.
Dependencies
We use poetry for dependency management.
If you have never used it, consult the poetry documentation for installation guidelines and basic usage instructions.
poetry install --with dev
To fix the "Failed to unlock the collection!" error or a stuck package installation, run the command below:
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
Git hooks
We use pre-commit for git hook management.
If you have never used it, consult the pre-commit documentation for installation guidelines and basic usage instructions.
pre-commit install
There are two hooks available:
- isort - runs isort for both .py files and notebooks. Fails if any changes are made, so you have to run git add and git commit once again.
- Strip notebooks - produces stripped versions of notebooks in the stripped directory.
Tests
pytest
Linting
We use isort and flake8 along with nbqa to ensure code quality.
The appropriate options are set in configuration files.
You can run them with:
isort .
nbqa isort notebooks
and
flake8 .
nbqa flake8 notebooks --nbqa-shell
Code formatting
You can run black to format code (including notebooks):
black .
New version release
In order to add the next version of the package to PyPI, do the following steps:
- First, increment the package version in pyproject.toml.
- Then build the new version: run poetry build in the root directory.
- Finally, upload to PyPI: poetry publish (two newly created files).
  - If you get the "Invalid or non-existent authentication information." error, add a PyPI token to poetry: poetry config pypi-token.pypi <my-token>.
File details
Details for the file mim_nlp-0.2.1.tar.gz.
File metadata
- Download URL: mim_nlp-0.2.1.tar.gz
- Upload date:
- Size: 25.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.9 Linux/5.10.102.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa24520e2521ba36a687542a273f5df97ee62cda209dde19218d7b429f1feebb |
| MD5 | a2cd5dd7c129988dbd023906bb1940b9 |
| BLAKE2b-256 | 5baa4e48359f52e5090eebe05b5e534197135e3559874ffc69dfafd295d17add |
File details
Details for the file mim_nlp-0.2.1-py3-none-any.whl.
File metadata
- Download URL: mim_nlp-0.2.1-py3-none-any.whl
- Upload date:
- Size: 32.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.9 Linux/5.10.102.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47842a0209aa78abf6a94a436a2f1dbfcaf17fbe5faebf1edac42c44cb2982e8 |
| MD5 | df07d5b256868e01321fa54e419a7eb6 |
| BLAKE2b-256 | 99cfa2acd4215607f11298b35b27197f4d2b3834620fbaa665b6d573adeca369 |