
A basic Python package for creating n-gram language models from text files

Project description

Maximum Likelihood fit for N-grams

A small library for quickly fitting N-gram language models, either by deriving Maximum Likelihood estimates or by training a Neural Network.

Installation

pip install ngram-ml

Usage

from ngram_ml import NGramMLEstimator, NGramDataset, NGramNeuralNet

Example

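  • Maximum Likelihood Estimator Example
# `tokens` is not defined in this snippet. It is assumed to be a list of
# tokenized sentences, each a list of strings wrapped in '<S>' and '</S>',
# in the same shape as the literal sentence passed below.
mle = NGramMLEstimator(sentences=tokens, n_grams=2, label_smoothing=1)
# Cross-entropy on the training sentences, then on a single new sentence.
mle.calculate_cross_entropy(tokens)
mle.calculate_cross_entropy([['<S>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '</S>']])

# Generate sentences seeded with the index of a word ('pencil' and 'book'
# are assumed to occur in the training vocabulary).
mle.generate_sentence(30, initial_pre_seq=(mle.word_to_idx['pencil'],))
mle.generate_most_probable_sentence(30, initial_pre_seq=(mle.word_to_idx['book'],))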
  • Neural Network Example
dataset = NGramDataset(sentences=tokens, n_grams=2)
NN = NGramNeuralNet(n_grams=2, in_size=dataset.n_unique_words, embed_size=200)
NN.train(dataset.x, dataset.y, n_epochs=100, lr=0.01)
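
For intuition about what the Maximum Likelihood example above estimates, here is a minimal pure-Python sketch of a bigram model with additive smoothing and its cross-entropy. It does not use ngram_ml and is not the library's implementation. The helper names (fit_bigram_counts, bigram_prob, cross_entropy), the toy sentences, the base-2 logarithm, and the reading of label_smoothing=1 as add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_bigram_counts(sentences):
    """Count bigrams and their one-word contexts over tokenized sentences."""
    bigram_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for sent in sentences:
        vocab.update(sent)
        for prev, word in zip(sent, sent[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts, vocab


def bigram_prob(prev, word, bigram_counts, context_counts, vocab_size, k=1.0):
    """Additively smoothed estimate of P(word | prev); k=0 gives the plain MLE."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * vocab_size
    return numerator / denominator


def cross_entropy(sentences, bigram_counts, context_counts, vocab_size, k=1.0):
    """Average negative log2-probability per predicted token."""
    total, n_predictions = 0.0, 0
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            p = bigram_prob(prev, word, bigram_counts, context_counts, vocab_size, k)
            total -= math.log2(p)
            n_predictions += 1
    return total / n_predictions


# Hypothetical toy corpus, in the same token format as the example above.
toy_sentences = [
    ['<S>', 'the', 'cat', 'sat', '</S>'],
    ['<S>', 'the', 'dog', 'sat', '</S>'],
]
counts, contexts, vocab = fit_bigram_counts(toy_sentences)
p = bigram_prob('the', 'cat', counts, contexts, len(vocab))
h = cross_entropy(toy_sentences, counts, contexts, len(vocab))
print(p, h)
```

With k=0 the estimate reduces to the unsmoothed relative frequency count(prev, word) / count(prev), which assigns zero probability to unseen bigrams and makes the cross-entropy on new text undefined. A positive k avoids this.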
