
Generative Redfoot

A generative, conversational workflow and multi-agent system using PDL and mlx

Generative Redfoot takes a minimal Prompt Declaration Language (PDL) file and generates a finite-state generative machine as Python objects for a subset of the PDL language. These objects (the "programs" in particular) can be executed.
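The core idea can be sketched as follows: each parsed PDL block is dispatched to a Python class, and the resulting objects form an executable program. This is only an illustration of the approach; the class and function names here are hypothetical, not the actual generative_redfoot API.

```python
# Minimal sketch of mapping parsed PDL (YAML) blocks to executable Python
# objects. All names here are hypothetical illustrations, not the real
# generative_redfoot API.

class ReadBlock:
    def __init__(self, spec):
        self.message = spec.get("message", "")

    def execute(self, context):
        # A real implementation would prompt the user; here we just record
        # the message as a user turn in the conversational context.
        context.append({"role": "user", "content": self.message.strip()})

class ModelBlock:
    def __init__(self, spec):
        self.model = spec["model"]

    def execute(self, context):
        # A real implementation would run mlx inference over `context`.
        context.append({"role": "assistant",
                        "content": f"<{self.model} output>"})

def dispatch(block):
    """Map a parsed PDL block (a dict) to a program object."""
    if "read" in block:
        return ReadBlock(block)
    if "model" in block:
        return ModelBlock(block)
    raise ValueError(f"Unsupported block: {block}")

def build_program(pdl):
    """Build an executable program from a parsed PDL document."""
    return [dispatch(b) for b in pdl["text"]]
```

Executing the program then amounts to walking the object list and letting each block update the shared conversational context.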

It was mainly motivated by supporting this use case from the PDL documentation/paper:

Animated GIF of PDL chatbot.

The Model class can be extended and incorporated into how a dispatcher creates the PDL Python objects from a PDL file. This is how the functionality for evaluating prompts against the models specified in PDL is provided: the evaluation uses any accumulated conversational context, prompts, and generation parameters (sampling parameters, for example), (optionally) updates the context as program execution continues, and determines how mlx is used to implement model loading and inference.

However, the language of the PDL file can be extended with additional custom functionality, and other LLM systems can handle the evaluation.
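As a sketch of that extension point, a subclass might swap in a different evaluation backend. The base-class interface shown here is hypothetical, not the actual generative_redfoot API:

```python
# Hypothetical sketch of extending a Model class so a different backend
# handles evaluation. The interface is illustrative only.

class Model:
    def __init__(self, name, parameters=None):
        self.name = name
        self.parameters = parameters or {}

    def generate(self, context):
        raise NotImplementedError

class EchoModel(Model):
    """A trivial backend that echoes the last user message, standing in
    for an mlx (or other LLM system) evaluation."""
    def generate(self, context):
        last_user = next(m for m in reversed(context)
                         if m["role"] == "user")
        reply = {"role": "assistant",
                 "content": f"echo: {last_user['content']}"}
        context.append(reply)  # optionally update the conversational context
        return reply["content"]
```

A dispatcher would instantiate such a subclass wherever a model block appears in the program.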

Usage

It depends on the PyYAML and click third-party Python libraries, as well as mlx, and can be run as shown below, where document.pdl is a PDL file:

% Usage: generative_redfoot [OPTIONS] PDL_FILE

Options:
  -t, --temperature FLOAT
  -rp, --repetition-penalty FLOAT
                                  The penalty factor for repeating tokens
                                  (none if not used)
  --top-k INTEGER                 Sampling top-k
  --top-p FLOAT                   Sampling top-p
  --max-tokens INTEGER            Max tokens
  --min-p FLOAT                   Sampling min-p
  --verbose / --no-verbose
  -v, --variables <TEXT TEXT>...
  --help                          Show this message and exit.

generative_redfoot.py document.pdl

The main argument is a PDL program (a YAML file), possibly with extensions of the language implemented by generative_redfoot.

You can also specify default values for sampling parameters for the LLM calls during the execution of the programs using mlx.

The model parameters directive in PDL can be used to specify the following mlx generation parameters: temperature, top_k, min_p, max_tokens, and top_p:

description: ...
text:
  - read:
    message: |
      What is your query?
    contribute: [context]
  - model: .. model ..
  parameters:
    temperature: 0.6
    min_p: .03
    max_tokens: 200
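The interaction between the CLI-supplied defaults and a block's parameters can be sketched as a simple merge. The precedence shown (block parameters override CLI defaults) is an assumption based on the CLI options being described as defaults, and the helper name is hypothetical:

```python
# Sketch of how per-block `parameters` might override CLI-supplied default
# sampling parameters. Precedence and names are illustrative assumptions.

CLI_DEFAULTS = {"temperature": 0.8, "max_tokens": 100, "top_p": 0.95}

def effective_parameters(block_parameters, defaults=CLI_DEFAULTS):
    """Start from the CLI defaults, then let the block's own
    `parameters` mapping take precedence."""
    merged = dict(defaults)
    merged.update(block_parameters or {})
    return merged
```

For the example above, temperature, min_p, and max_tokens would come from the block, while top_p would fall back to its default.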

Extensions

Generative Redfoot defines a number of extensions to PDL.

PDF Reading

The PDF_read block can be used (within a text block) to read content from a PDF file (this uses and requires PyPDF2). You can specify where the text is contributed (the 'context' is the most common scenario). Below is an example that reads PDF content from a filename specified at run time and uses it as context for a model evaluation:

description: autocoding_from_pdf
text:
  - text:
    - PDF_read: { $context_file }
      contribute: [context]
    - |
  
      Provide a list of the 5 ICD-10 codes mentioned in the document above
    contribute: [context]
  - model: mlx-community/medgemma-4b-it-4bit
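The behaviour of such a block can be sketched as follows. The class and its interface are hypothetical, and the text extractor is injected so the sketch stays dependency-free; a real implementation would use PyPDF2 (e.g. extracting text page by page with PdfReader):

```python
# Illustrative sketch of a PDF_read-style block: extract text from a PDF
# and contribute it to the conversational context. Names are hypothetical.

class PDFReadBlock:
    def __init__(self, filename, extractor, contribute=("context",)):
        self.filename = filename
        self.extractor = extractor      # callable: path -> extracted text
        self.contribute = contribute

    def execute(self, context):
        text = self.extractor(self.filename)
        if "context" in self.contribute:
            # Contribute the extracted text to the conversation context.
            context.append({"role": "user", "content": text})
        return text
```

In the example above, the extracted PDF text and the instruction that follows it would both be contributed to the context consumed by the model block.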

Toolio

Toolio can be used for structured output by specifying a structured_output block like so (from Toolio algebra tutor demo):

description: structured_output
text:
  - structured_output: mlx-community/Llama-3.2-3B-Instruct-4bit
    insert_schema: true
    schema_file: ToolioGit/demo/algebra_tutor.schema.json
    parameters:
      temperature: 0.6
      max_tokens: 512
    input: |
      solve 8x + 31 = 2. Your answer should be only JSON, according to this schema: #!JSON_SCHEMA!#

Beyond the approach above, its input can be specified in all the ways the input of a PDL model block can.

Alpha One Reasoning

AlphaOne (α) reasoning modulation can be used with a supported reasoning model, via Alpha One MLX, by providing an alpha_one block within a model block:

description: alpha_one_reasoning
text:
    - model: mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit-AWQ
      parameters:
        temperature: 0.6
        max_tokens: 1200
        repetition_penalty: 1.4
        top_k: 20
        top_p: 0.95
      alpha_one:
        thinking_token_length: 250
        alpha: 1.4
        wait_words: ["Wait"]
      input: |
        What question has an answer of "42?"

This algorithm scales the average thinking-phase token length (specified by the thinking_token_length parameter, with a default of 2,650) by the alpha parameter (which defaults to 1.4, per the paper). After exiting this phase, it suppresses attempts to transition back to slow thinking by replacing occurrences of 'wait' words with the "</think>" token. The specific list of these words can be provided via the wait_words parameter; otherwise, the defaults specified by alpha-one-mlx for each model type are used.
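The modulation just described can be sketched roughly as two pieces: scaling the thinking budget, and rewriting wait words once the budget is exhausted. This is a simplification of what Alpha One MLX does, with hypothetical helper names:

```python
# Rough sketch of AlphaOne-style reasoning modulation. A simplification;
# names are illustrative, not the alpha-one-mlx API.

def thinking_budget(thinking_token_length=2650, alpha=1.4):
    """Scale the average thinking-phase token length by alpha to get the
    token budget for the (slow) thinking phase."""
    return round(thinking_token_length * alpha)

def suppress_slow_thinking(token, wait_words=("Wait",), end_think="</think>"):
    """After the thinking budget is exhausted, map wait words to the
    end-of-thinking token so the model stops re-entering slow thinking."""
    return end_think if token in wait_words else token
```

With the defaults, the budget is 2,650 × 1.4 = 3,710 thinking tokens; the example above lowers it to 250 × 1.4 = 350.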

Prompt management via Wordloom

Prompt snippets can be loaded for use in a PDL file from a Word Loom library, providing a clean separation of concerns between prompt language management, prompt construction, and LLM workflow management and orchestration. This requires OgbujiPT. Below is an example that constructs the system and user prompts from a Word Loom (TOML) file:

description: wordloom_prompt_example
text:
  - text:
    - role: system
      read_from_wordloom: prompt_library.toml
      items: helpful_chatbot_system_prompt
      contribute: [context]
    - text:
      - read_from_wordloom: prompt_library.toml
        items: hello_prompt
        contribute: [context]
    - model: mlx-community/Llama-3.2-3B-Instruct-4bit

The read_from_wordloom block indicates the use of this extension mechanism, and its value is a path to a Word Loom file.

The items parameter on the block is a space-separated list of language-item names in the Word Loom. Their values are joined together with \n and returned as a single value that can be used (as in the example above) to construct the messages for conversations used by downstream PDL directives.

In the example above, the helpful_chatbot_system_prompt language item from the prompt_library.toml word loom is used as the system prompt and the hello_prompt language item from the same file is used as the user prompt:

[
  {"role": "system", "content": ".. helpful_chatbot_system_prompt language item .. "},
  {"role": "user", "content": ".. hello_prompt language item .. "}
]
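The item-joining behaviour described above can be sketched as follows. This is a hypothetical helper, not OgbujiPT's actual API; assume the loom has already been loaded into a mapping from language-item names to text:

```python
# Sketch of resolving an `items` list against a loaded Word Loom: names
# are space-separated, and their values are joined with "\n".
# Illustrative only; OgbujiPT's actual loading API differs.

def resolve_items(loom, items):
    """`loom` maps language-item names to their text; `items` is a
    space-separated list of names to look up and join."""
    return "\n".join(loom[name] for name in items.split())
```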

Caching

A PDL program can use mlx-lm's prompt caching by specifying a top-level cache parameter in the PDL document. If the value of this parameter is '*', an internal, rotating K/V cache is used for the duration of the PDL evaluation for efficiency.

Otherwise, the value is expected to be the path to a previously saved cache, created using mlx_lm.cache_prompt for example, which is treated as a cached prompt that is a prefix to any prompts specified in the program.
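For illustration, a program enabling the internal rotating cache might look like this (a minimal, hypothetical example; the model name is arbitrary):

```yaml
description: cached_example
cache: '*'    # use the internal, rotating K/V cache for this evaluation
text:
  - model: mlx-community/Llama-3.2-3B-Instruct-4bit
    parameters:
      max_tokens: 200
    input: |
      Summarize the benefits of prompt caching.
```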

Chain of Thought (CoT) prefix

If specified within a model block, the cot_prefix parameter takes a path to a file that captures CoT few-shot content in the LLM chat-conversation JSON format. This will be incorporated into the conversation structure to ensure the file's content is used as few-shot / CoT examples for the model generation.
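For instance, a file referenced by cot_prefix might look like the following (an illustrative, hypothetical few-shot exchange in the chat-conversation JSON format):

```json
[
  {"role": "user", "content": "What is 17 + 5?"},
  {"role": "assistant", "content": "First, 17 + 3 = 20. Then 20 + 2 = 22. The answer is 22."}
]
```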

Complex example

Below is an example showing a PDL file that constructs message contexts for prompts to chained LLM calls from fragments in a Word Loom library, providing a clean separation of concerns between prompt language management, prompt construction, and LLM workflow management and orchestration. The keys shown in black in the YAML file use the PDL language. Those in red are generative_redfoot extensions, shown in order of appearance: (mlx) prefix caching, CoT few-shot loading, reading from a Word Loom file, using Google's google/gemma-7b-aps-it model to perform "abstractive proposition segmentation" on LLM output, etc.:

Animated GIF of PDL chatbot.


