Basic Python package for creating n-gram language models from text files
Project description
Maximum Likelihood fit for N-grams
A small library for quickly computing maximum likelihood estimates for n-gram language models and for training neural network n-gram models.
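To make the estimate concrete, here is a minimal, self-contained sketch of what a bigram maximum likelihood fit computes: P(w | prev) = (c(prev, w) + k) / (c(prev) + k * V), where V is the vocabulary size, plus the average cross entropy in bits per token. It does not use ngram_ml. The helper names `bigram_probs` and `cross_entropy`, the reading of `label_smoothing` as the additive constant k, and the log base are all assumptions for illustration.

```python
import math
from collections import Counter


def bigram_probs(sentences, k=1.0):
    """Estimate P(word | prev) from token lists with add-k smoothing.

    k = 0 gives the plain maximum likelihood estimate; in that case contexts
    that never appear as `prev` are left out to avoid dividing by zero.
    """
    vocab = sorted({w for s in sentences for w in s})
    context_counts = Counter()  # c(prev): times prev is followed by some word
    pair_counts = Counter()     # c(prev, word)
    for s in sentences:
        for prev, word in zip(s, s[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, word)] += 1
    V = len(vocab)
    probs = {}
    for prev in vocab:
        denom = context_counts[prev] + k * V
        if denom == 0:
            continue
        probs[prev] = {w: (pair_counts[(prev, w)] + k) / denom for w in vocab}
    return probs


def cross_entropy(probs, sentences):
    """Average negative log2 probability per predicted token (bits/token).

    With k = 0, an unseen bigram has probability 0 and math.log2 raises.
    """
    total, n = 0.0, 0
    for s in sentences:
        for prev, word in zip(s, s[1:]):
            total -= math.log2(probs[prev][word])
            n += 1
    return total / n


# Hypothetical toy corpus.
corpus = [['<S>', 'the', 'cat', 'sat', '</S>'],
          ['<S>', 'the', 'dog', 'sat', '</S>']]
mle_probs = bigram_probs(corpus, k=0)
smoothed = bigram_probs(corpus, k=1)
```

On the training corpus the unsmoothed estimate has the lowest possible cross entropy (0.25 bits/token here), and add-one smoothing raises it; the trade-off is that the smoothed model no longer assigns zero probability to bigrams it has never seen.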
Installation
pip install ngram-ml
Usage
from ngram_ml import NGramMLEstimator, NGramDataset, NGramNeuralNet
Examples
- Maximum Likelihood Estimator Example
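# Hypothetical toy corpus: a list of token lists, each wrapped in <S> ... </S>.
tokens = [['<S>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '</S>'],
          ['<S>', 'the', 'pencil', 'is', 'on', 'the', 'book', '</S>']]
mle = NGramMLEstimator(sentences=tokens, n_grams=2, label_smoothing=1)  # bigram model
mle.calculate_cross_entropy(tokens)
mle.calculate_cross_entropy([['<S>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '</S>']])
# Seed context: a tuple of word indices (one index for a bigram model).
mle.generate_sentence(30, initial_pre_seq=tuple([mle.word_to_idx['pencil']]))
mle.generate_most_probable_sentence(30, initial_pre_seq=tuple([mle.word_to_idx['book']]))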
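The two generation methods presumably differ in how they choose each next word: sampling from the model's conditional distribution versus always taking the most probable word. The sketch below illustrates that difference on a hand-written bigram table. It works on words rather than the index tuples that `initial_pre_seq` takes, and both `generate` and the table are hypothetical, not part of ngram_ml.

```python
import random


def generate(probs, start, max_len=30, greedy=False, end="</S>", rng=None):
    """Grow a sentence from `start` using a bigram table probs[prev][word].

    greedy=True picks the most probable next word at every step;
    greedy=False samples the next word from the conditional distribution.
    Stops at `end`, at `max_len` tokens, or when the last word has no row.
    """
    rng = rng or random.Random()
    out = [start]
    while len(out) < max_len and out[-1] != end and out[-1] in probs:
        row = probs[out[-1]]
        words = list(row)
        if greedy:
            nxt = max(words, key=row.get)
        else:
            nxt = rng.choices(words, weights=[row[w] for w in words])[0]
        out.append(nxt)
    return out


# Hypothetical hand-written bigram table (each row sums to 1).
probs = {
    '<S>': {'the': 1.0},
    'the': {'cat': 0.7, 'mat': 0.3},
    'cat': {'sat': 1.0},
    'sat': {'on': 0.6, '</S>': 0.4},
    'on': {'the': 1.0},
    'mat': {'</S>': 1.0},
}
print(generate(probs, '<S>', max_len=8, greedy=True))
# ['<S>', 'the', 'cat', 'sat', 'on', 'the', 'cat', 'sat']
print(generate(probs, '<S>', rng=random.Random(0)))
```

With this table, greedy decoding cycles through "the cat sat on" until it hits `max_len`, because 'sat' followed by 'on' (0.6) always outranks 'sat' followed by '</S>' (0.4), while sampling can reach '</S>'. Greedy decoding from a bigram model is prone to this kind of loop.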
- Neural Network Example
# `tokens` is a list of tokenized sentences, as in the MLE example.
dataset = NGramDataset(sentences=tokens, n_grams=2)
model = NGramNeuralNet(n_grams=2, in_size=dataset.n_unique_words, embed_size=200)
model.train(dataset.x, dataset.y, n_epochs=100, lr=0.01)
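A neural n-gram model is trained on (context, next word) pairs. Here is a minimal sketch of how such pairs might be built, assuming `dataset.x` holds tuples of n_grams - 1 context word indices and `dataset.y` holds the index of the following word. `build_ngram_pairs` is a hypothetical stand-in for NGramDataset; the real class may encode the pairs differently (for example as tensors or one-hot vectors).

```python
def build_ngram_pairs(sentences, n_grams=2):
    """Map words to indices and emit (context, next_word) training pairs.

    Hypothetical stand-in for NGramDataset: each context is a tuple of
    n_grams - 1 word indices and each target is the index of the next word.
    """
    word_to_idx = {}
    for s in sentences:
        for w in s:
            # Assign indices in order of first appearance.
            word_to_idx.setdefault(w, len(word_to_idx))
    x, y = [], []
    for s in sentences:
        ids = [word_to_idx[w] for w in s]
        for i in range(n_grams - 1, len(ids)):
            x.append(tuple(ids[i - n_grams + 1:i]))
            y.append(ids[i])
    return word_to_idx, x, y


corpus = [['<S>', 'the', 'cat', 'sat', '</S>']]
word_to_idx, x, y = build_ngram_pairs(corpus, n_grams=2)
# word_to_idx == {'<S>': 0, 'the': 1, 'cat': 2, 'sat': 3, '</S>': 4}
# x == [(0,), (1,), (2,), (3,)] and y == [1, 2, 3, 4]
```

In this representation `len(word_to_idx)` plays the role that `dataset.n_unique_words` presumably plays: it fixes the number of rows in the embedding table and the number of classes the network predicts over.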
Download files
Source Distribution
ngram_ml-0.1.0.tar.gz (9.5 kB)
Built Distribution
ngram_ml-0.1.0-py3-none-any.whl (4.1 kB)
File details
Details for the file ngram_ml-0.1.0.tar.gz
File metadata
- Download URL: ngram_ml-0.1.0.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a090be042570016fea27ccb9b3a6a24db1757ada26762592076df2438ee8621
MD5 | 8a719fde046623dce59542def441b10f
BLAKE2b-256 | a93df0af05129c8b41ff6d3692a621327765e36479304bf17d15334c18bb0e3f
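These digests let you check a download before installing it. A minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and the expected value is the SHA256 listed in the table above for the source distribution.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed above for ngram_ml-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "3a090be042570016fea27ccb9b3a6a24db1757ada26762592076df2438ee8621"


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return file_sha256(path) == expected.lower()


# Usage, assuming the sdist was downloaded into the current directory:
# matches_expected("ngram_ml-0.1.0.tar.gz")
```

pip can perform the same check in hash-checking mode: put a line such as `ngram-ml==0.1.0 --hash=sha256:<digest>` in a requirements file (one `--hash` per file you are willing to accept, e.g. both the sdist and the wheel digests) and install with `pip install --require-hashes -r requirements.txt`.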
File details
Details for the file ngram_ml-0.1.0-py3-none-any.whl
File metadata
- Download URL: ngram_ml-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4ece709a1220e898542347cefc56a89f1f232954d995cf6ee20ce1e0510f60ba
MD5 | 76981d96c158cb6cd42d8968c1b531ba
BLAKE2b-256 | abe4b0f207997dbe1c81508c87da625d807de6373bf4bec28c1f17fb5ee401cb