
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.


MIM NLP

With this package you can easily use pre-trained models and fine-tune them, as well as create and train your own neural networks.

Below, we list NLP tasks and models that are available:

  • Classification
    • Neural Network
    • SVM
  • Regression
    • Neural Network
  • Seq2Seq
    • Summarization (Neural Network)

It comes with utilities for text pre-processing such as:

  • Text cleaning
  • Lemmatization
  • Deduplication

Installation

We recommend installing with pip.

pip install mim-nlp

The package comes with the following extras (optional dependencies for given modules):

  • svm - simple SVM model for classification
  • classifier - classification models: SVM and neural networks
  • regressor - regression models
  • preprocessing - cleaning text, lemmatization and deduplication
  • seq2seq - Seq2Seq and Summarizer models
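Extras are requested with pip's bracket syntax, for example pip install "mim-nlp[svm]". As a rough sketch, the extras declared by an installed distribution can be listed from its metadata using the standard library. The helper name list_extras is hypothetical and is not part of mim-nlp.

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(distribution: str) -> list[str]:
    """Return the extras declared by an installed distribution, or [] if it is absent."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per declared extra in the core metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints the declared extras if mim-nlp is installed, otherwise an empty list.
    print(list_extras("mim-nlp"))
```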

Usage

Examples can be found in the notebooks directory.

Model classes

  • classifier.nn.NNClassifier - Neural Network Classifier
  • classifier.svm.SVMClassifier - Support Vector Machine Classifier
  • classifier.svm.SVMClassifierWithFeatureSelection - SVMClassifier with additional feature selection step
  • regressor.AutoRegressor - regressor based on transformers' Auto Classes
  • regressor.NNRegressor - Neural Network Regressor
  • seq2seq.AutoSummarizer - summarizer based on transformers' Auto Classes

Interface

All model classes share a common interface:

  • fit
  • predict
  • save
  • load

and specific additional methods.
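To illustrate what a shared fit / predict / save / load interface looks like in practice, here is a minimal sketch built around a toy model. The Model base class, the MajorityClassifier class, the argument names, and the model.json file are all hypothetical; the actual signatures in mim-nlp may differ, so consult the notebooks for real usage.

```python
import json
from abc import ABC, abstractmethod
from collections import Counter
from pathlib import Path


class Model(ABC):
    """Hypothetical base class mirroring the four shared method names."""

    @abstractmethod
    def fit(self, x_train, y_train): ...

    @abstractmethod
    def predict(self, x): ...

    @abstractmethod
    def save(self, model_dir): ...

    @classmethod
    @abstractmethod
    def load(cls, model_dir): ...


class MajorityClassifier(Model):
    """Toy model: always predicts the most frequent training label."""

    def __init__(self, label=None):
        self.label = label

    def fit(self, x_train, y_train):
        # Remember the single most common label seen during training.
        self.label = Counter(y_train).most_common(1)[0][0]
        return self

    def predict(self, x):
        return [self.label for _ in x]

    def save(self, model_dir):
        path = Path(model_dir)
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.json").write_text(json.dumps({"label": self.label}))

    @classmethod
    def load(cls, model_dir):
        state = json.loads((Path(model_dir) / "model.json").read_text())
        return cls(label=state["label"])
```

Because every model exposes the same four methods, code that trains, stores, and reloads a model does not need to know which concrete class it is working with.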

Text pre-processing

  • preprocessing.TextCleaner - define a pipeline for text cleaning, supports concurrent processing
  • preprocessing.lemmatize - lemmatize text in Polish with Morfeusz
  • preprocessing.Deduplicator - find near-duplicate texts (depending on threshold) with Jaccard index for n-grams
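The idea behind near-duplicate detection with the Jaccard index can be sketched in a few lines. This is an illustration of the technique only, not the Deduplicator implementation: the function names are hypothetical, and the use of lowercase word-level bigrams and a brute-force pairwise comparison are assumptions made for brevity.

```python
def word_ngrams(text: str, n: int = 2) -> set:
    """Return the set of word n-grams of a text (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Texts shorter than n words are represented by a single short tuple.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(texts, threshold=0.5, n=2):
    """Return index pairs (i, j), i < j, whose n-gram sets reach the threshold."""
    grams = [word_ngrams(t, n) for t in texts]
    return [
        (i, j)
        for i in range(len(texts))
        for j in range(i + 1, len(texts))
        if jaccard(grams[i], grams[j]) >= threshold
    ]
```

Lowering the threshold flags more loosely related texts as duplicates, while raising it keeps only nearly identical ones. The pairwise loop is quadratic in the number of texts, so it is only practical for small collections.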

Development

Remember to use a separate environment for each project. Run the commands below inside the project's environment.

Dependencies

We use poetry for dependency management. If you have never used it, consult the poetry documentation for installation guidelines and basic usage instructions.

poetry install --with dev

To fix the "Failed to unlock the collection!" error or a stalled package installation, run the following command:

export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

Git hooks

We use pre-commit for git hook management. If you have never used it, consult the pre-commit documentation for installation guidelines and basic usage instructions.

pre-commit install

There are two hooks available:

  • isort – runs isort for both .py files and notebooks. Fails if any changes are made, so you have to run git add and git commit once again.
  • Strip notebooks – produces stripped versions of notebooks in the stripped directory.

Tests

pytest

Linting

We use isort and flake8 along with nbqa to ensure code quality. The appropriate options are set in configuration files. You can run them with:

isort .
nbqa isort notebooks

and

flake8 .
nbqa flake8 notebooks --nbqa-shell

Code formatting

You can run black to format code (including notebooks):

black .

New version release

To publish the next version of the package to PyPI, follow these steps:

  • First, increment the package version in pyproject.toml.
  • Then build the new version: run poetry build in the root directory.
  • Finally, upload the two newly created files to PyPI: run poetry publish.
    • If you get the "Invalid or non-existent authentication information." error, add a PyPI token to poetry: poetry config pypi-token.pypi <my-token>.
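The version increment in the first step can be sketched as follows, assuming the project uses plain MAJOR.MINOR.PATCH version strings. The bump_version helper is hypothetical and not part of the repository; Poetry's own poetry version patch command does the same job directly in pyproject.toml.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("0.2.1"))
```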
