Janex: Ultimate Edition

Releasing multiple editions of Janex made the libraries complicated to use, since each method needed a different library, and maintaining three or four separate libraries was pointless. I have therefore rewritten them to work interchangeably and combined them into this single library.

Please note that this library requires PyTorch, spaCy and NLTK. pip installs them automatically as dependencies.

python3 -m pip install JanexUltimate

How to use

There are four flavours of Janex which are now bound into one library.

  • Janex Python
  • Janex PyTorch
  • Janex Spacy
  • Janex NLG

Janex Python

Tokenization

The tokenize(input_string) function tokenizes the input string into individual words, removing punctuation and converting all characters to lowercase.

Example usage:

from JanexUltimate.janexpython import *

input_string = "Hello, this is a sample sentence."
words = tokenize(input_string)
print(words)  # Output: ['hello', 'this', 'is', 'a', 'sample', 'sentence']
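To make the described behaviour concrete, here is a minimal standard-library sketch of the same idea (lowercase, strip punctuation, split on whitespace). The name simple_tokenize is hypothetical and this is not Janex's own implementation, which may handle edge cases differently.

```python
import string


def simple_tokenize(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    # A translation table with a deletion set removes every punctuation character.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


tokens = simple_tokenize("Hello, this is a sample sentence.")
print(tokens)  # ['hello', 'this', 'is', 'a', 'sample', 'sentence']
```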

Word Stemming

The stem(input_word) function reduces a word to its base form by removing common suffixes.

Example usage:

input_word = "running"
stemmed_word = stem(input_word)
print(stemmed_word)  # Output: "run"
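The exact suffix rules Janex applies are not listed here, so the following is only an illustrative sketch of suffix stripping. The suffix list, the minimum stem length and the doubled-consonant rule are all assumptions, and simple_stem is a hypothetical name.

```python
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # assumed suffix list, checked in order
VOWELS = set("aeiou")


def simple_stem(word):
    """Strip one common suffix, then collapse a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "runn" -> "run": drop one letter of a trailing doubled consonant.
            if word[-1] == word[-2] and word[-1] not in VOWELS:
                word = word[:-1]
            break
    return word


print(simple_stem("running"))  # run
print(simple_stem("jumped"))   # jump
print(simple_stem("is"))       # is (too short to strip)
```

A rule set this small will mis-stem many words (for example, "falling" becomes "fal"), which is why dedicated stemmers such as those in NLTK use much larger rule tables.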

String Vectorization

The string_vectorize(input_string) function converts a string into a numpy array of ASCII values representing each character.

Example usage:

input_string = "hello"
vector = string_vectorize(input_string)
print(vector)  # Output: array([104, 101, 108, 108, 111])
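For readers who want to see what such a vector contains, here is a pure-Python sketch that uses plain lists instead of numpy arrays. The padding helper is an assumption added for illustration: comparing two vectors element by element requires equal lengths, and how Janex itself aligns lengths is not stated here. Both function names are hypothetical.

```python
def ascii_vector(text):
    """Map each character to its code point (the ASCII value for ASCII text)."""
    return [ord(ch) for ch in text]


def pad_to(vector, length, fill=0):
    """Truncate or right-pad a vector with `fill` so it has exactly `length` items."""
    return vector[:length] + [fill] * max(0, length - len(vector))


vector = ascii_vector("hello")
print(vector)             # [104, 101, 108, 108, 111]
print(pad_to(vector, 8))  # [104, 101, 108, 108, 111, 0, 0, 0]
```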

Reshape Array Dimensions

The reshape_array_dimensions(array, dimensions) function reshapes the dimensions of a numpy array.

Example usage:

import numpy as np

array = np.array([1, 2, 3, 4, 5, 6])
new_dimensions = (2, 3)
reshaped_array = reshape_array_dimensions(array, new_dimensions)
print(reshaped_array)  # Output: array([[1, 2, 3], [4, 5, 6]])

Cosine Similarity Calculation

The calculate_cosine_similarity(vector1, vector2) function calculates the cosine similarity between two numpy arrays.

Example usage:

import numpy as np

vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
similarity = calculate_cosine_similarity(vector1, vector2)
print(similarity)  # Output: 0.9746318461970762
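Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it from that definition with the standard library only, as a way to check the figure above by hand. The function name and the zero-vector convention are choices made for this example, not Janex's code.

```python
import math


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


similarity = cosine_similarity([1, 2, 3], [4, 5, 6])
print(round(similarity, 4))  # 0.9746
```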

Intent Classifier Toolkit

The IntentClassifier class provides functionality for intent classification based on pre-trained vectors and intents.

Example usage:

classifier = IntentClassifier()
classifier.set_vectorsfp("vectors.json")
classifier.set_intentsfp("intents.json")
classifier.set_dimensions((300, 300))

classifier.train_vectors()

input_string = "How can I reset my password?"
intent = classifier.classify(input_string)
print(intent)  # Output: {'tag': 'password_reset', 'patterns': ['How can I reset my password?'], 'responses': ['You can reset your password by...']}
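The output above shows the shape of a single intent (tag, patterns, responses). The sketch below writes a minimal intents file containing that intent. The top-level "intents" key is an assumption based on common chatbot conventions, so check it against the file Janex downloads or expects before relying on it.

```python
import json
import tempfile
from pathlib import Path

# One intent with the same fields as the classifier output shown above.
intents = {
    "intents": [  # assumed top-level key
        {
            "tag": "password_reset",
            "patterns": ["How can I reset my password?"],
            "responses": ["You can reset your password by..."],
        }
    ]
}

# Written to a temporary folder here; in a real project it would sit next to your script.
intents_path = Path(tempfile.mkdtemp()) / "intents.json"
intents_path.write_text(json.dumps(intents, indent=2), encoding="utf-8")

loaded = json.loads(intents_path.read_text(encoding="utf-8"))
print(loaded["intents"][0]["tag"])  # password_reset
```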

Janex PyTorch

The Janex PyTorch library provides tools for intent classification and response generation using PyTorch.

from JanexUltimate.janexpytorch import *

Initializing JanexPT

To initialize the JanexPT class, provide the file path to the intents JSON file.

janex_pt = JanexPT(intents_file_path)

Setting device

You can set the device for PyTorch operations (e.g., "cpu" or "cuda") using the set_device method.

janex_pt.set_device("cpu")
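If you are unsure whether a GPU is available, a small helper can choose the device string for you. This is a sketch: pick_device is a hypothetical helper, not part of Janex, and it relies only on PyTorch's torch.cuda.is_available().

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate, so fall back to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result can then be passed on, for example janex_pt.set_device(device).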

Comparing Patterns

To compare patterns and classify intents, use the pattern_compare method.

intent = janex_pt.pattern_compare(input_string)
print(intent)

Modifying data path

janex_pt.modify_data_path(new_path)

Training program

The library includes a training program that can be executed to train the model. Simply call the trainpt method.

janex_pt.trainpt()

Example

Here's an example of how to use the Janex PyTorch library:

from JanexUltimate.janexpytorch import *

intents_file_path = "intents.json"
janex_pt = JanexPT(intents_file_path)

input_string = "How can I reset my password?"
intent = janex_pt.pattern_compare(input_string)
print(intent)

Janex Spacy

The JanexSpacy library provides tools for intent classification and response generation using spaCy.

Importing the Library

from JanexUltimate.janexspacy import *

Create an instance

Before anything else, you need to create an instance of the JanexSpacy class. (If you do not already have an intents file, the program will automatically download a pre-written one created by @SoapDoesCode. Big thanks for the intents file!)

intents_file_path = "./intents.json"

thesaurus_file_path = "./thesaurus.json"

vectors_file_path = "./vectors.json"

matcher = JanexSpacy(intents_file_path, thesaurus_file_path, vectors_file_path)

Optional: To update your thesaurus to the most recent pre-written file, add the call below, which checks for a new version and downloads it. Be careful: this function deletes your existing thesaurus file, so any entries of your own that are not in the pre-written file will be lost (although you may be able to restore the old file from your system's recycle bin).

matcher.update_thesaurus()

Tokenizing:

Here is an example of how to use the tokenizer.

input_string = "Hello! What is your name?"

words = matcher.Tokenize(input_string)

print(words)

Intent classifying:

To compare the input with the patterns in your intents.json file, call pattern_compare. The intents file path you declared when creating the instance is used.

intent_class = matcher.pattern_compare(input_string)

print(intent_class)
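As a rough illustration of what comparing an input with patterns can mean, the sketch below scores each pattern by word overlap (Jaccard similarity) and returns the best-matching intent. This is a deliberately simple stand-in and not Janex's method, which is built on spaCy. All function names and the sample intents are made up for the example.

```python
import string


def tokens(text):
    """Return the set of lowercase words in the text, ignoring punctuation."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_intent(text, intents):
    """Return the intent whose best pattern overlaps most with the input."""
    query = tokens(text)
    best, best_score = None, 0.0
    for intent in intents:
        for pattern in intent["patterns"]:
            score = jaccard(query, tokens(pattern))
            if score > best_score:
                best, best_score = intent, score
    return best  # None when nothing overlaps at all


intents = [
    {"tag": "greeting", "patterns": ["Hello there", "Hi"], "responses": ["Hello!"]},
    {"tag": "name", "patterns": ["What is your name?"], "responses": ["I am a bot."]},
]
match = best_intent("Hello! What is your name?", intents)
print(match["tag"])  # name
```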

Response similarity:

The responses listed under one intent class can vary a lot in context. To pick the most suitable one, use the response_compare function, which compares the input string with the list of responses for that class.

BestResponse = matcher.response_compare(input_string, intent_class)

print(BestResponse)
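The idea of ranking responses by similarity to the input can be sketched with word counts and cosine similarity. This is an assumption-laden illustration using only the standard library; Janex's own comparison is not shown here and may differ. The function names and sample responses are hypothetical.

```python
import math
import string
from collections import Counter


def bag_of_words(text):
    """Count lowercase words, ignoring punctuation."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_response(text, responses):
    """Return the response whose wording is closest to the input."""
    query = bag_of_words(text)
    return max(responses, key=lambda r: cosine(query, bag_of_words(r)))


responses = ["My name is Janex.", "I am doing well, thank you."]
print(best_response("What is your name?", responses))  # My name is Janex.
```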

Text Generation:

This feature is experimental. It has been included since Janex 0.0.15 and was ported through JanexSC. The ResponseGenerator function takes the response chosen by the response comparer from your intents.json file and modifies it by replacing words with synonyms, so that the reply sounds less scripted.

If you do not already have a thesaurus.json file, the matcher will automatically download the pre-written example from GitHub into your chatbot folder.

After doing so, you may include the feature in your code like this.

generated_response = matcher.ResponseGenerator(BestResponse)

print(generated_response)
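To show what synonym replacement involves, here is a small stand-alone sketch. The thesaurus layout (a word mapped to a list of synonyms) is an assumption and may not match the structure of thesaurus.json, and vary_response is a hypothetical function rather than the library's ResponseGenerator.

```python
import random


def vary_response(response, thesaurus, rng=None):
    """Replace words that have thesaurus entries with a randomly chosen synonym."""
    rng = rng or random.Random()
    varied = []
    for word in response.split():
        # Separate trailing punctuation so "Hello," still matches "hello".
        core = word.rstrip(".,!?")
        tail = word[len(core):]
        synonyms = thesaurus.get(core.lower())
        if synonyms:
            replacement = rng.choice(synonyms)
            if core[:1].isupper():
                replacement = replacement.capitalize()  # keep the original casing style
            varied.append(replacement + tail)
        else:
            varied.append(word)
    return " ".join(varied)


# Each word has a single synonym here, so the output is deterministic.
thesaurus = {"hello": ["hi"], "help": ["assist"]}
print(vary_response("Hello, how can I help you?", thesaurus))  # Hi, how can I assist you?
```

With several synonyms per word the output changes between calls, so the variety of the result depends directly on how large the thesaurus is.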

Warning: This feature is still a work in progress and is only as effective as your thesaurus file is large, so don't expect it to be fully stable until I have completed it. :)

Janex NLG

Training the model

First, I would recommend creating a file named 'train.py' which you would use to create the binary file.

In this file, you would write:

from JanexUltimate.janexnlg import *

NLG = NLGTraining() # Create an instance of the JanexNLG training module.
NLG.set_directory("./files") # Set this to the name of a folder in the same directory as your train.py file. This folder will contain all of your txt files you wish to train the model with.
NLG.set_spacy_model("en_core_web_md") # You can set this to any Spacy model of your choosing. I would recommend en_core_web_sm for weak or older hardware.
NLG.train_data() # Finally, train the data. This will save everything collected into a .bin file in your program's directory.
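Before starting a long training run it can help to confirm that the training folder really contains usable text files. The helper below is a hypothetical pre-flight check written for this README. It is not part of Janex, and it assumes the training data are non-empty .txt files directly inside the folder.

```python
import tempfile
from pathlib import Path


def list_training_files(directory):
    """Return the non-empty .txt files in `directory`, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"training folder not found: {folder}")
    files = sorted(p for p in folder.glob("*.txt") if p.stat().st_size > 0)
    if not files:
        raise ValueError(f"no non-empty .txt files in {folder}")
    return files


# Demonstration with a throwaway folder instead of "./files".
demo = Path(tempfile.mkdtemp())
(demo / "a.txt").write_text("The cat sat on the mat.", encoding="utf-8")
(demo / "empty.txt").write_text("", encoding="utf-8")
names = [p.name for p in list_training_files(demo)]
print(names)  # ['a.txt']
```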

Optional GPU support:

NLG.set_device("cuda")

Finetuning the model

For versions > 0.0.2, a finetuning feature is available. After training your model, you can adapt it to a specific purpose: put the new data in a separate folder, set the directory to that folder, and then finetune the existing model.

from JanexUltimate.janexnlg import *

NLG = NLGTraining()
NLG.set_directory("./files_for_finetuning")
NLG.set_spacy_model("en_core_web_md")
NLG.finetune_model("janex.bin") # You've got to add your model name to this function so the library knows what it is finetuning.

Using the model

Once you've created the binary data, effectively teaching the AI the connections between words and sentence structures, you can then use it to generate text.

from JanexUltimate.janexnlg import *

Generator = NLG("en_core_web_md", "janex.bin") # Your chosen spacy model and the name of the .bin file generated by the training program.
input_sentence = input("You: ")
ResponseOutput = Generator.generate_sentence(input_sentence)
print(ResponseOutput)

Warning:

The larger the txt files, the larger the .bin file will be, so make sure you are using appropriate hardware. The more diverse the data in the txt files, the more accurate and coherent the responses will be. I hope this comes in useful! :)

Thank you for using JanexNLG <3
