A Python chatbot that learns as you speak to it.

Thomas the Chatbot

Installation

Python 3.9+ is required.

This package can be installed from PyPI with:

pip install thomasthechatbot

Usage

Basic Usage

from ttc import Chatbot, Context
from ttc.utils import download_nltk_data

# Only needs to be run once (can be removed after first run)
download_nltk_data()

ctx = Context()

chatbot = Chatbot()

talk = True

while talk:
    msg = input("You: ")

    if msg == "s":
        talk = False
    else:
        # Getting the response
        resp = chatbot.respond(ctx, msg)

        # Saving the response to the context
        ctx.save_resp(resp)

        print(f"Thomas: {resp}")

# Saving the chatbot data
chatbot.save_data()

Configurations

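chatbot = Chatbot(
    path="brain",  # presumably where the chatbot's data is stored
    learn=False,  # presumably turns learning from new prompts on or off
    min_score=0.5,  # meshes scoring below this are discarded (First Discard)
    score_threshold=0.5,  # fraction of the best score a mesh must reach (Second Discard)
    mesh_association=0.5,  # share of common responses needed to associate meshes
)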

CLI

Type ttc to begin talking to Thomas.

How does Thomas work?

Thomas has no hard-coded responses and is designed to “learn” as he is spoken to.

Note: I created this approach based on my intuition and not a proven method.

Data Storage

Thomas does not come up with his own responses; he repeats those that he has seen before.

Responses

Previous responses are stored in resps.json as a dictionary where the key is a generated UUID and the value is the tokenized response.
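As a rough illustration of that layout, the sketch below builds a dictionary keyed by generated UUIDs and serializes it to JSON. The names resps and add_response are hypothetical and are not taken from the package; only the UUID-to-tokenized-response shape comes from the description above.

```python
import json
import uuid

# Hypothetical in-memory store: response UUID -> tokenized response.
resps: dict[str, list[str]] = {}


def add_response(tokens: list[str]) -> str:
    """Store a tokenized response under a fresh UUID and return that UUID."""
    resp_id = str(uuid.uuid4())
    resps[resp_id] = tokens
    return resp_id


resp_id = add_response(["hello", "there"])

# Roughly the shape a file such as resps.json could hold.
serialized = json.dumps(resps, indent=2)
print(serialized)
```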

Mesh

Prompts are associated with responses through a "mesh" which is stored in mesh.thomas. The mesh consists of a dictionary where the key is the UUID of the prompt and the value is a "link". Links associate responses with patterns of words. They have the following attributes:

stop_words (set): Stop words separated from the tokenized prompt.

keywords (set): The remaining words, which are lemmatized by their part of speech.

resps (dict[str, set]): Responses to the prompt, where the key is the response UUID and the value is a set of mesh IDs from the previous prompt.
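The three attributes can be pictured as a small data class. This is only a sketch of the described structure: the class name Link and the placeholder IDs are assumptions, and the real package may store links differently.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical container mirroring the three attributes listed above."""

    stop_words: set = field(default_factory=set)
    keywords: set = field(default_factory=set)
    # response UUID -> set of mesh IDs of the prompt that came before
    resps: dict = field(default_factory=dict)


# The mesh maps a prompt UUID to its link (placeholder IDs for readability).
mesh = {
    "prompt-1": Link(
        stop_words={"how", "are", "you"},
        keywords={"today"},
        resps={"resp-1": {"prompt-0"}},
    )
}
```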

Querying Responses

Tokenizing Prompts

Before tokenization, prompts are lowercased, contractions are expanded, and punctuation is removed. This improves the consistency and accuracy of queries. Prompts are then tokenized by word and split into keywords and stop words.
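The preprocessing steps can be sketched in a few lines. The contraction table and stop word set below are tiny hypothetical stand-ins (the package downloads NLTK data, so it presumably uses fuller lists), and lemmatization of keywords is left out to keep the example self-contained.

```python
import string

# Hypothetical, deliberately tiny lookup tables for illustration only.
CONTRACTIONS = {"what's": "what is", "i'm": "i am", "don't": "do not"}
STOP_WORDS = {"what", "is", "the", "i", "am", "do", "not", "a", "you"}


def preprocess(prompt: str) -> tuple[set, set]:
    """Return (stop_words, keywords) for a raw prompt."""
    expanded = []
    for word in prompt.lower().split():
        # Trim punctuation at the edges so "today?" becomes "today",
        # while the apostrophe inside "what's" is kept for the lookup.
        word = word.strip(string.punctuation)
        expanded.extend(CONTRACTIONS.get(word, word).split())

    # Remove any punctuation still left inside tokens.
    table = str.maketrans("", "", string.punctuation)
    tokens = [t.translate(table) for t in expanded]
    tokens = [t for t in tokens if t]

    stop_words = {t for t in tokens if t in STOP_WORDS}
    keywords = {t for t in tokens if t not in STOP_WORDS}
    return stop_words, keywords


stop_words, keywords = preprocess("What's the weather today?")
```

Here stop_words holds "what", "is" and "the", while keywords holds "weather" and "today".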

Ignoring Responses

The user's prompt and the chatbot's previous response are ignored when choosing a reply, which prevents the chatbot from appearing repetitive.

Initial Query

Meshes are initially queried by their score, which is calculated with:

(ss / 2 + sk) / (ts / 2 + tk - ss / 2 - sk + 1)

ss = shared stop words

sk = shared key words

ts = total stop words

tk = total key words

This formula weighs shared key words twice as heavily as shared stop words by dividing the stop word counts (ss and ts) by 2. It also takes the total number of words into account, so more precise meshes are favoured.
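The formula translates directly into a small function. The function name mesh_score is hypothetical; the arithmetic is copied from the expression above.

```python
def mesh_score(ss: int, sk: int, ts: int, tk: int) -> float:
    """Score a mesh against a prompt.

    ss / sk: shared stop words / shared key words
    ts / tk: total stop words / total key words
    """
    return (ss / 2 + sk) / (ts / 2 + tk - ss / 2 - sk + 1)


# One shared key word outweighs one shared stop word for the same totals.
keyword_match = mesh_score(ss=0, sk=1, ts=2, tk=2)
stop_word_match = mesh_score(ss=1, sk=0, ts=2, tk=2)
```

With these inputs the key word match scores 1 / 3, while the stop word match scores 0.5 / 3.5, which is about 0.14.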

First Discard

Meshes with scores below a threshold (min_score) are discarded.

No Results Queried

If no results remain, meshes are queried by the number of shared stop words.

Second Discard

The remaining meshes are sorted and meshes that fall below a percentage threshold (score_threshold) of the best score are discarded. Considering multiple meshes increases the variety of responses.
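The two discards and the fallback can be combined into one filtering step. This is a sketch under assumptions: select_meshes and its argument names are hypothetical, the 0.5 values are simply borrowed from the Configurations example, and the fallback here ranks meshes that share at least one stop word by how many they share.

```python
def select_meshes(
    scores: dict,
    shared_stop_words: dict,
    min_score: float = 0.5,
    score_threshold: float = 0.5,
) -> list:
    """Return mesh IDs that survive both discards, best first."""
    # First discard: drop meshes scoring below min_score.
    kept = {m: s for m, s in scores.items() if s >= min_score}

    # No results: fall back to the number of shared stop words.
    if not kept:
        kept = {m: float(n) for m, n in shared_stop_words.items() if n > 0}
    if not kept:
        return []

    # Second discard: keep meshes within score_threshold of the best score.
    best = max(kept.values())
    ranked = sorted(kept, key=kept.get, reverse=True)
    return [m for m in ranked if kept[m] >= best * score_threshold]


scores = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.4}
selected = select_meshes(scores, shared_stop_words={})
```

With the values above, "c" and "d" fail the first discard, and both "a" and "b" pass the second one because 0.5 is at least half of 0.9.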

Mesh Association

Meshes are associated with each other by the percentage of shared responses (mesh_association). Associated meshes for each queried mesh are found and added to the list. This process prevents less trained prompts from having a small response pool.
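One way to picture this step is shown below. The text does not say how the shared percentage is measured, so this sketch assumes it is the fraction of the queried mesh's responses that also appear in the other mesh; associated_meshes and mesh_resps are hypothetical names.

```python
def associated_meshes(
    queried: list,
    mesh_resps: dict,
    mesh_association: float = 0.5,
) -> list:
    """Extend the queried mesh IDs with meshes that share enough responses.

    mesh_resps maps a mesh ID to the set of response IDs linked to it.
    """
    result = list(queried)
    for mesh_id in queried:
        own = mesh_resps[mesh_id]
        if not own:
            continue
        for other_id, other in mesh_resps.items():
            if other_id in result:
                continue
            # Assumed measure: share of this mesh's responses found in the other.
            shared = len(own & other) / len(own)
            if shared >= mesh_association:
                result.append(other_id)
    return result


mesh_resps = {
    "a": {"r1", "r2"},
    "b": {"r1", "r2", "r3"},
    "c": {"r4"},
}
expanded = associated_meshes(["a"], mesh_resps)
```

Mesh "b" is added because it contains both of the responses of "a", while "c" shares none and is left out.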

Choosing a Response

If any responses share the same previous message UUID as the prompt, all responses that do not share it are removed. A response is then chosen at random from those that remain. Random selection prevents the chatbot from being predictable.
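A minimal sketch of this selection step follows, assuming each candidate response carries the set of mesh IDs of the prompt that preceded it (as in the resps attribute of a link). The function choose_response is hypothetical.

```python
import random


def choose_response(candidates: dict, prev_mesh_ids: set):
    """Pick a response ID at random, preferring context matches.

    candidates maps a response ID to the set of previous-prompt mesh IDs
    recorded for it; prev_mesh_ids describes the current conversation.
    """
    # Responses whose recorded context overlaps the current context.
    matching = [r for r, prev in candidates.items() if prev & prev_mesh_ids]

    # If any match, the rest are dropped; otherwise every candidate stays.
    pool = matching or list(candidates)
    if not pool:
        return None
    return random.choice(pool)


candidates = {"r1": {"m1"}, "r2": {"m2"}, "r3": set()}
picked = choose_response(candidates, prev_mesh_ids={"m1"})
```

Only "r1" matches the previous message here, so it is always picked. With no match at all, any of the three candidates may be returned.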

Contributing

Contributions are welcome. Please create an issue if you would like to contribute.

Formatting

Black, isort and Prettier are used for formatting.

Release history

1.0.5

Dec 15, 2022

1.0.4

Dec 15, 2022

1.0.3

Oct 28, 2022

1.0.2

Oct 12, 2022

1.0.1

Sep 8, 2022

1.0.0

Sep 8, 2022

Download files

Source Distribution

thomasthechatbot-1.0.0.tar.gz (10.5 kB), uploaded Sep 8, 2022

Built Distribution

thomasthechatbot-1.0.0-py3-none-any.whl (11.6 kB), uploaded Sep 8, 2022, Python 3

Hashes for thomasthechatbot-1.0.0.tar.gz

Algorithm	Hash digest
SHA256	`c8bbda6efb1621a89f3c7d8ad23f6ef51b8962f3258e8c0246ed9094f7c4784b`
MD5	`feb3266bf5b2aee9d3c48eb0f8a23ada`
BLAKE2b-256	`fac8e2e6e00e54c34be4217d9d9ec8aa34fef88198d7e4d83e3724defb666a0c`

Hashes for thomasthechatbot-1.0.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`acba993bc74602a1564d4388a24584c74156b01bdb9f39c4288a2e05459f7a13`
MD5	`4321989ae1acf1a4d84d1913e6f1f92d`
BLAKE2b-256	`eaa53cc0755c50a8da51974decc7623c4350798bc2714cfb0dbed449bca230a9`
