
Pattern Based Question and Answer

Description

Pattern Based Question and Answer (PBQA) is a Python library that provides tools for querying LLMs and managing text embeddings. It combines guided generation with multi-shot prompting to improve response quality and ensure consistency. By enforcing valid responses, PBQA makes it easy to combine the flexibility of LLMs with the reliability and control of symbolic approaches.

Installation

PBQA requires Python 3.12 or higher, and can be installed via pip:

pip install PBQA

Additionally, PBQA requires a running instance of llama.cpp to interact with LLMs. For instructions on installation, see the llama.cpp repository.

Usage

llama.cpp

For instructions on hosting a model with llama.cpp, see the llama.cpp server documentation. Optionally, caching can be enabled to speed up generation.

Python

PBQA provides a simple API for querying LLMs.

from pydantic import BaseModel
from PBQA import DB, LLM

# First, we define a schema for the weather query
class Weather(BaseModel):
    latitude: float
    longitude: float
    time: str

# Then, we set up a database at a specified path (or the host and port of a remote server)
db = DB(path="db")
# And define a pattern to use for generating responses
db.load_pattern(
    schema=Weather,
    examples="weather.yaml",
    system_prompt="Your job is to translate the user's input into a weather query object.",
    input_key="query",
)

# Next, we connect to the LLM server
llm = LLM(db=db, host="localhost")
# And connect to the model
llm.connect_model(
    model="llama",
    port=8080,
    stop=["<|eot_id|>", "<|start_header_id|>"],
    temperature=0,
)

# Finally, we query the LLM and receive a response based on the specified pattern
# Optionally, external data can be provided to the LLM which it can use in its response
weather_query = llm.ask(
    input={
        "query": "Could I see the stars tonight?",
        "now": "2024-09-30 10:36",
    },
    pattern="weather",
    model="llama",
)["response"]

Using the weather.yaml pattern file and Llama 3 running on localhost:8080, the response should look something like this:

{
    "latitude": 51.51,
    "longitude": -0.13,
    "time": "2024-09-30 23:00"
}

For more information, see the examples directory.

Patterns

Patterns are used to guide the LLM in generating responses. Each pattern needs at least a schema to define the expected output, and optionally a system prompt and example data. The system prompt is the main instruction given to the LLM telling it what to do. The example data is used to further guide the LLM in generating responses.

The example case above uses the Weather schema defined earlier in the code, a simple system prompt describing the task, and some sample data for the weather query.

While the example above uses an unmodified string to represent the time, it's also possible to use regex to restrict it further:

from typing import Annotated

from pydantic import BaseModel, Field

class Weather(BaseModel):
    latitude: float
    longitude: float
    time: Annotated[
        str, Field(pattern=r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$")
    ]

Using the Field annotation with a regex pattern, the LLM can only generate responses that match it. In this case, the time must be a date and time in the format YYYY-MM-DD HH:MM.
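
Since this constraint is ordinary Pydantic validation, it can also be checked locally. A quick sanity check, assuming the Weather schema defined just above:

from pydantic import ValidationError

# Accepted: matches the YYYY-MM-DD HH:MM pattern
Weather(latitude=51.51, longitude=-0.13, time="2024-09-30 23:00")

try:
    # Rejected: the time component is missing, so the regex does not match
    Weather(latitude=51.51, longitude=-0.13, time="2024-09-30")
except ValidationError as e:
    print(e)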

Beyond the Pydantic schema, the user can also provide a system prompt and example data to help the LLM generate responses. Here is an excerpt from the weather.yaml file:

- user:
    query: What will the weather be like tonight
    now: 2019-09-30 10:36
  assistant:
    latitude: 51.51
    longitude: -0.13
    time: 2019-09-30 20:00
- user:
    query: any idea if it'll be sunny tomorrow in Paris?
    now: 2016-11-02 12:15
  assistant:
    latitude: 48.86
    longitude: 2.35
    time: 2016-11-03 13:00
- user:
    query: will it be dry out by the time I get off work?
    now: 2025-06-12 09:23
  assistant:
    latitude: 51.51
    longitude: -0.13
    time: 2025-06-12 17:00
...

Note: While the assistant's responses are validated against the schema when loaded, the user's input can be anything. This allows the user to provide whatever information they want (see tool use).

Any examples provided through the examples parameter become "base examples" for the pattern, which are loaded as part of every query to the LLM. Since caching is enabled by default, if your llama.cpp server is initialized correctly, the increased prompt processing time for these examples only occurs once per pattern (see cache). In addition to these base examples, more examples can be added later for the LLM to learn from.

Nested Input Keys

PBQA supports nested access to complex data structures through the input_key parameter. This enables working with hierarchical data like conversation histories, nested API responses, and complex application states.

Supported Syntax

  • Simple keys: "query" - Direct dictionary access (backward compatible)
  • Dot notation: "user.query" - Navigate nested dictionaries
  • Array indexing: "history[0]" - Access array elements by index
  • Negative indexing: "history[-1]" - Access array elements from the end
  • Combined paths: "user.history[0].input" - Mix dots and array access

Examples

# Simple key access (existing behavior)
db.load_pattern(
    schema=Response,
    input_key="query"
)

# Nested object access
db.load_pattern(
    schema=Response,
    input_key="user.query"
)

# Array indexing
db.load_pattern(
    schema=Response,
    input_key="messages[0]"
)

# Complex nested access
db.load_pattern(
    schema=Response,
    input_key="conversation.history[-1].content"
)

This feature enables PBQA to work seamlessly with complex conversation architectures and structured data formats while maintaining full backward compatibility with existing patterns.

Strict Schema (llguidance)

When using a llama.cpp server built with llguidance support (-DLLAMA_LLGUIDANCE=ON), enable strict_schema on connect_model() to get faster and more reliable structured generation:

llm.connect_model(
    model="llama",
    port=8080,
    strict_schema=True,  # required for llguidance servers
)

This sets additionalProperties: false on all object types in the JSON schema before sending it to the server. llguidance follows the JSON Schema spec where additionalProperties defaults to true, which without this flag allows the model to output arbitrary extra keys and effectively disables structural enforcement on nested objects.
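
For illustration, the transformation amounts to something like the following sketch (forbid_extra_keys is a hypothetical standalone helper, not PBQA's internal code):

# Recursively mark every object type in a JSON schema as closed,
# so the grammar engine rejects keys not declared in the schema.
def forbid_extra_keys(schema: dict) -> dict:
    if schema.get("type") == "object":
        schema["additionalProperties"] = False
    for value in schema.values():
        if isinstance(value, dict):
            forbid_extra_keys(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    forbid_extra_keys(item)
    return schema

# e.g. applied to the Weather schema from earlier:
print(forbid_extra_keys(Weather.model_json_schema()))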

Benchmarks show a 2-3x speedup in structured generation throughput compared to the default GBNF grammar engine, with improved reliability on complex nested schemas.

Cache

Unless overridden, queries using the same pattern will use the same system prompt and base examples, allowing a large part of the prompt to be cached. This avoids the need to reprocess those parts, speeding up the query. Caching can be disabled by setting use_cache=False when invoking llm.ask().
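
For example, to bypass the cache for a one-off query (arguments otherwise as in the earlier example):

weather_query = llm.ask(
    input={
        "query": "Could I see the stars tonight?",
        "now": "2024-09-30 10:36",
    },
    pattern="weather",
    model="llama",
    use_cache=False,  # disable caching for this query
)["response"]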

PBQA allocates a slot/process for each pattern-model pair in the llama.cpp server. Set -np to the number of unique combinations of patterns and models you want to enable caching for. Slots are allocated in the order they are requested, and if the number of available slots is exceeded, the last slot is reused for any excess pattern-model pairs.
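
For instance, to host a single model with two cache slots (assuming a recent llama.cpp build where the server binary is named llama-server, and a placeholder model path):

llama-server -m model.gguf --port 8080 -np 2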

You can manually assign a cache slot to a specific pattern-model pair using the link method. Optionally, a specific cache slot can be provided, up to the number of available processes. The cache slot used for a query can also be overridden by passing the cache_slot parameter to the llm.ask() method.

from PBQA import DB, LLM


db = DB(path="db")
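# Reusing the Weather schema defined earlier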
db.load_pattern(
    schema=Weather,
    examples="weather.yaml",
    system_prompt="Your job is to translate the user's input into a weather query object.",
    input_key="query",
)

llm = LLM(db=db, host="localhost")
llm.connect_model(
    model="llama",
    port=8080,
    stop=["<|eot_id|>", "<|start_header_id|>"],
    temperature=0,
)
llm.link(pattern="weather", model="llama")

Once a pattern-model pair is linked, the "model" parameter in the ask() method may also be omitted. The query will instead use the model assigned during the last appropriate link call.
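
For example, reusing the link made above (ask() as documented earlier; cache_slot is the optional override mentioned above):

weather_query = llm.ask(
    input={
        "query": "Will it be dry out this evening?",
        "now": "2024-09-30 10:36",
    },
    pattern="weather",  # model is omitted; the linked "llama" model is used
)["response"]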

Roadmap

Future features in no particular order with no particular timeline:

  • Reranking
  • Parallel query execution
  • Combining multi-shot prompting with message history
  • Multimodal support
  • Further speed improvements (possibly batching)
  • Support for more LLM backends

Contributing

Contributions are welcome! If you have any suggestions or would like to contribute, please open an issue or a pull request.

Support

If you want to support the development of PBQA, consider buying me a coffee. Any support is greatly appreciated!

License and Acknowledgements

This project is licensed under the terms of the MIT License. For more details, see the LICENSE file.

Qdrant is a vector database that provides an API for managing and querying text embeddings. PBQA uses Qdrant to store and retrieve text embeddings.

llama.cpp is a C++ library that provides an easy-to-use interface for running LLMs on a wide variety of hardware. It includes support for Apple silicon, x86 architectures, and NVIDIA GPUs, as well as custom CUDA kernels for running LLMs on AMD GPUs via HIP. PBQA uses llama.cpp to interact with LLMs.

Pydantic is a Python library that provides a powerful and flexible way to define data models.

PBQA was originally developed by Bart Haagsma as part of a different project. If you have any questions or suggestions, please feel free to contact me at dev.baagsma@gmail.com.
