A configurable local chatbot library with lightweight memory indexing.

These details have not been verified by PyPI

Project description

llama-simple-chat-bot

A small Python library for configurable local chatbots.

llama-simple-chat-bot runs an open-source language model on the local machine and adds a lightweight persistent memory index. It is designed for Windows and Linux on amd64 machines, and it does not require a GPU. The default runtime path uses GGUF models through llama-cpp-python with n_gpu_layers set to 0.

[!NOTE] The library does not call remote OpenAI APIs. Model inference is local, while memory indexing is handled with lightweight files on disk.

Features

JSON bot profiles: name, description, personality, birthday, skills, species, model settings, and memory directory.
Local LLM backends: llama_cpp_python, llama_cpp_cli, and a deterministic echo backend for tests.
Persistent memory: every exchange is logged, split into segments, summarized, indexed, and searched during future conversations.
Associative recall: related past segments can be injected into the prompt as context before the model answers.
CLI and Python API.
No required third-party dependencies for the core package. Local inference is available through the optional local extra.

Recommended Local Models

These presets are intentionally small enough for local GGUF use, with a few better-quality options for slower but more reliable CPU chat:

qwen2.5-0.5b-instruct-q4_k_m: about 491 MB, multilingual, good default for Chinese and English.
qwen2.5-1.5b-instruct-q4_k_m: about 1120 MB, much better than 0.5B for identity stability, memory use, and ordinary chat quality.
qwen2.5-3b-instruct-q4_k_m: about 2100 MB, a stronger choice for roleplay, Chinese chat, and basic reasoning if you can accept slower CPU inference.
smollm2-360m-instruct-q4_k_m: about 271 MB, very small and fast for quick experiments.

The project can also use any local GGUF file supported by llama.cpp.

[!TIP] If you care about actual chat quality, start with qwen2.5-1.5b-instruct-q4_k_m. Use qwen2.5-3b-instruct-q4_k_m when role consistency and answer quality matter more than speed. Keep qwen2.5-0.5b-instruct-q4_k_m for lightweight testing, and smollm2-360m-instruct-q4_k_m only for very small experiments.

Setup

Create a virtual environment before installing optional local inference dependencies.

[!IMPORTANT] Install optional dependencies inside a virtual environment. The project does not require modifying your base Python environment.

The recommended CPU-only path installs the core package first. The first real local-model run then installs the prebuilt CPU llama-cpp-python wheel automatically if it is missing:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install -e .

On Linux:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e .

[!TIP] The automatic installer runs python -m pip install -r requirements-local-cpu.txt. That requirements file uses https://abetlen.github.io/llama-cpp-python/whl/cpu plus --only-binary llama-cpp-python. If no compatible wheel exists for your Python version and platform, pip fails instead of starting a slow local build.

If the CPU wheel is unavailable on your machine, either install llama.cpp separately and set "backend": "llama_cpp_cli" in the config, or use python -m pip install -e '.[local]' when you intentionally want to build llama-cpp-python from source.

[!NOTE] The local extra is kept for packaging compatibility, but the documented quick path uses requirements-local-cpu.txt because pip dependency metadata cannot store a custom wheel index URL.

[!WARNING] CPU-only inference is usable with small GGUF models, but it is still slower than GPU inference. Keep model.n_gpu_layers at 0 when the machine has no compatible GPU.

Quick Start

Write an example config:

llama-simple-chat-bot init-config examples/my_bot.json

List model presets:

llama-simple-chat-bot models

Download a small GGUF model:

llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models

For a better local chat model:

llama-simple-chat-bot download-model qwen2.5-1.5b-instruct-q4_k_m --models-dir models

Or a stronger 3B preset:

llama-simple-chat-bot download-model qwen2.5-3b-instruct-q4_k_m --models-dir models

Start chatting:

llama-simple-chat-bot chat --config examples/my_bot.json

Shortest Start

[!TIP] Use this section when you only want the shortest path from a fresh checkout to a running bot.

For a real local model run:

python -m venv .venv
source .venv/bin/activate
python -m pip install -e .
llama-simple-chat-bot init-config bot.json
llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models
llama-simple-chat-bot chat --config bot.json

On Windows PowerShell, use:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -e .
llama-simple-chat-bot init-config bot.json
llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models
llama-simple-chat-bot chat --config bot.json

If you want a more usable default on CPU, replace the download step with:

llama-simple-chat-bot download-model qwen2.5-1.5b-instruct-q4_k_m --models-dir models

For a dependency-free smoke test that does not load a model:

python -m pip install -e .
llama-simple-chat-bot ask --config examples/echo_config.json "hello"

[!NOTE] The echo backend is only a deterministic smoke-test backend. It verifies the CLI, config loading, and memory plumbing without loading a language model.

Example Profiles

The examples/ directory includes ready-to-edit bot profiles:

examples/bot_config.json: general local assistant.
examples/nekomimi_config.json: 中文猫娘聊天伙伴.
examples/coding_mentor_config.json: pragmatic coding mentor.
examples/study_partner_config.json: structured study partner.
examples/storyteller_config.json: collaborative fiction and worldbuilding companion.
examples/echo_config.json: dependency-free smoke-test bot.

Run any profile with:

llama-simple-chat-bot chat --config examples/nekomimi_config.json

To debug memory retrieval while chatting, add --verbose:

llama-simple-chat-bot chat --config examples/nekomimi_config.json --verbose

Verbose mode prints diagnostics before the model starts generating: the memory index path and detected encoding, the query terms, whether the turn used recent-overview or keyword retrieval, scored candidate segments, selected prompt hits, and the recalled memory block injected into the model context. The same diagnostics are available for one-shot asks:

llama-simple-chat-bot ask --config examples/my_bot.json --verbose "What do you remember about Python packaging?"

Send one message:

llama-simple-chat-bot ask --config examples/my_bot.json "What do you remember about me?"

Search memory without loading the model:

llama-simple-chat-bot memory-search --config examples/my_bot.json "Python packaging"

JSON Config

See examples/bot_config.json.

Important fields:

name, description, personality, birthday, skills, and species are injected at runtime as authoritative system rules, so the bot knows its configured identity.
memory_dir controls where index.json and segment .jsonl files are stored.
model.backend selects llama_cpp_python, llama_cpp_cli, or echo.
model.model_path points to a local GGUF file.
Preset downloads are available through llama-simple-chat-bot download-model for qwen2.5-0.5b-instruct-q4_k_m, qwen2.5-1.5b-instruct-q4_k_m, qwen2.5-3b-instruct-q4_k_m, and smollm2-360m-instruct-q4_k_m.
model.n_gpu_layers defaults to 0, which keeps inference on CPU.
system_rules is the place to enforce speech style and role constraints, such as asking a catgirl profile to naturally end replies with 喵.
memory.segment_exchange_limit controls when a new memory segment starts.
memory.summary_mode can be extractive or llm. extractive is faster; llm asks the local model to rewrite the segment summary.

Relative paths inside a config file are resolved relative to that config file. JSON config files can be encoded as UTF-8, UTF-8 with BOM, GB2312, or GBK. Memory index.json and segment .jsonl files are read with the same encoding fallbacks and are written back as UTF-8.

[!NOTE] GB2312 and GBK support is intended for Chinese JSON config files produced by older Windows editors or tooling.

Python API

from llama_simple_chat_bot import BotConfig, ChatBot

config = BotConfig.from_file("examples/bot_config.json")
bot = ChatBot(config)

reply = bot.ask("Remember that I prefer SQLite for small apps.")
print(reply)

for hit in bot.search_memory("SQLite"):
    print(hit.summary)

You can also build the config in code:

from llama_simple_chat_bot import BotConfig, ChatBot, MemoryConfig, ModelConfig

config = BotConfig(
    name="Mira",
    description="A practical local assistant with persistent memory.",
    personality="warm, curious, and concise",
    birthday="2026-06-02",
    skills=["Python", "summarization"],
    species="local digital companion",
    memory_dir="./memory/mira",
    model=ModelConfig(
        backend="llama_cpp_python",
        model_path="./models/qwen2.5-0.5b-instruct-q4_k_m.gguf",
        chat_format="chatml",
        n_gpu_layers=0,
    ),
    memory=MemoryConfig(summary_mode="extractive"),
)

bot = ChatBot(config)
print(bot.ask("Hello."))

Memory Layout

The configured memory directory contains:

index.json: all conversation log entries plus segment metadata, summaries, keywords, and log file references.
segments/*.jsonl: append-only per-segment logs.

At response time, the bot searches the index for direct matches and related associative matches, formats the best hits, and injects them into the local model's system context.

For broad questions like "what did we talk about before?", memory recall uses recent segments as an overview. For questions with a concrete topic, such as "did we talk about SQLite?", it uses keyword retrieval so old but relevant segments can beat newer unrelated chats.

[!WARNING] Memory files contain conversation content. Do not commit real user memory directories, downloaded model files, or private chat logs.

Tests

The test suite uses only the built-in unittest module and the echo backend:

python -m unittest

Acknowledgements

This project builds on the local inference ecosystem around llama.cpp, the Python bindings provided by llama-cpp-python, open GGUF model releases from the Qwen and SmolLM communities, and the Python standard library.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jun 6, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_simple_chat_bot-0.1.0.tar.gz (37.4 kB view details)

Uploaded Jun 6, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llama_simple_chat_bot-0.1.0-py3-none-any.whl (52.2 kB view details)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file llama_simple_chat_bot-0.1.0.tar.gz.

File metadata

Download URL: llama_simple_chat_bot-0.1.0.tar.gz
Upload date: Jun 6, 2026
Size: 37.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.11.15 Windows/10

File hashes

Hashes for llama_simple_chat_bot-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`24981d0f5f7196a42514e06002b4650f905b43572e669cad107574f5008d03c4`
MD5	`fafe3a777654b21f4c049e5e1668047d`
BLAKE2b-256	`753d7c3be59dd1082a73e0dc589f094d475d739227f4bfa2eaa0024c28968e5d`

See more details on using hashes here.

File details

Details for the file llama_simple_chat_bot-0.1.0-py3-none-any.whl.

File metadata

Download URL: llama_simple_chat_bot-0.1.0-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 52.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.11.15 Windows/10

File hashes

Hashes for llama_simple_chat_bot-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0292be5bcd568c38c112e6b1f6631d812a23e0759778e8e3631eadff7d0d08d6`
MD5	`79584753249caab4bedb17939d5ff57e`
BLAKE2b-256	`b3dd098678c4b91d64b6e83f5c20b86d08d165532f0066b734f43bb84a40b693`

See more details on using hashes here.

llama-simple-chat-bot 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

llama-simple-chat-bot

Features

Recommended Local Models

Setup

Quick Start

Shortest Start

Example Profiles

JSON Config

Python API

Memory Layout

Tests

Acknowledgements

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes