
An out-of-the-box, easy-to-use sequence-to-sequence library with attention for text.

Project description

txt2txt - An extremely easy-to-use seq2seq implementation with attention for text-to-text use cases


Examples

  1. Adding two numbers
  2. More Complex Math and fit_generator
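The examples above train on synthetic pairs. As an illustration of example 1, the sketch below generates input/output string pairs for the adding-two-numbers task in the format that `build_params` and `convert_training_data` (shown under "Training a model") expect. The pair counts and number ranges here are arbitrary choices, not taken from the repository's example code.

```python
# Hypothetical data generation for example 1 (adding two numbers).
# Each input is an expression like '42+7'; each output is its sum as a string.
import random

rng = random.Random(0)  # fixed seed for reproducibility
input_data, output_data = [], []
for _ in range(1000):
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    input_data.append(f'{a}+{b}')
    output_data.append(str(a + b))
```

These lists can then be passed straight to `build_params` and, after building the model, to `convert_training_data`.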

Installation

pip install txt2txt

Training a model

from keras.callbacks import ModelCheckpoint
from txt2txt import build_params, build_model, convert_training_data

input_data = ['123', '213', '312', '321', '132', '231']
output_data = ['123', '123', '123', '123', '123', '123']

# Build the vocabulary and sequence-length parameters and save them to disk
build_params(input_data=input_data, output_data=output_data, params_path='test/params', max_lenghts=(10, 10))

model, params = build_model(params_path='test/params')

# Convert the raw strings into model-ready arrays
input_data, output_data = convert_training_data(input_data, output_data, params)

# Save the best-performing weights during training
checkpoint = ModelCheckpoint('test/checkpoint', monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

model.fit(input_data, output_data, validation_data=(input_data, output_data), batch_size=2, epochs=20, callbacks=callbacks_list)

Loading a trained model and running inference

from txt2txt import build_model, infer

model, params = build_model(params_path='test/params')
model.load_weights('path_to_checkpoint_file')
input_text = '213'  # any input string up to the configured max length
infer(input_text, model, params)

Note: Check out https://github.com/bedapudi6788/deepcorrect for pre-trained models for English punctuation correction and grammar correction.

Requirements

This module needs Keras and TensorFlow (tested with tf>=1.8.0, keras>=2.2.0).

TensorFlow is not included in setup.py and needs to be installed separately.

What's the use of this module

Working with seq2seq tasks in NLP, I realised there weren't any easy-to-use, simple-to-understand, well-performing libraries available for this. Although libraries like FairSeq or Transformer implementations are available, they are generally either too complex for a newcomer to understand or, more likely, overkill (and very hard to train) for simple projects.

This module provides a pre-built seq2seq model with attention that performs well on most "simple" NLP tasks (tested with punctuation correction, transliteration, and spell correction).
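Each of those tasks is framed the same way: a list of input strings paired with a list of target strings. As a hedged illustration (the corruption function and word list below are made up for this sketch, not taken from the library), spell correction could be set up like this:

```python
# Hypothetical sketch: framing spell correction as a text-to-text task.
# Each training pair maps a corrupted string to its clean form; pairs in
# this format can be fed to build_params and convert_training_data.
import random

def corrupt(word, seed=None):
    """Drop one random character to simulate a typo."""
    rng = random.Random(seed)
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

clean = ['hello', 'world', 'sequence', 'attention']
input_data = [corrupt(w, seed=k) for k, w in enumerate(clean)]
output_data = clean
```

Transliteration and punctuation correction follow the same pattern; only the pair lists change.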

To Do

  • Make the number of encoder and decoder layers configurable

  • Add an option to include language-model probability in beam search

License

Although txt2txt is licensed under the GPL, if you want to use it commercially without open-sourcing your code, please email me or raise an issue in this repo so that I can give you explicit written permission to use it as you wish. The only reason for this is that it would be nice to know if a company is using my work.

Download files

Download the file for your platform.

Source Distribution

txt2txt-1.1.2.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

txt2txt-1.1.2-py2.py3-none-any.whl (18.3 kB)

Uploaded Python 2 Python 3

File details

Details for the file txt2txt-1.1.2.tar.gz.

File metadata

  • Download URL: txt2txt-1.1.2.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.0

File hashes

Hashes for txt2txt-1.1.2.tar.gz
Algorithm Hash digest
SHA256 6fedd15baf0aa37a00bb63084d0e6453cf4efa035dd3babe086da44d191685d0
MD5 228d89881952a7d4849e3cf676379023
BLAKE2b-256 3b3f2e85bd56cf59e881a7796a2a537f29c3af6e9dc343047b3868a8a493132e


File details

Details for the file txt2txt-1.1.2-py2.py3-none-any.whl.

File metadata

  • Download URL: txt2txt-1.1.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 18.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.0

File hashes

Hashes for txt2txt-1.1.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ea33275fb30f4efe9257e4fdfb21f73ff3d16e1acfa0bc00559f9629cceb101d
MD5 1e8fee517e90a844f0241646961e2f86
BLAKE2b-256 ae9a16a6f30dc4bcd18bed48dbfb75cac27fe0e61f37e9b67e12e87b3c17ff7a

