A Python utility for interacting with large language models (LLMs) via web automation

talktollm

A Python utility for interacting with large language models (LLMs) through a command-line interface. It leverages image recognition to automate interactions with LLM web interfaces, enabling seamless conversations and task execution.

Features

Command-Line Interaction: Provides a simple and intuitive command-line interface for interacting with LLMs.
Automated Image Recognition: Employs image recognition techniques (via optimisewait) to identify and interact with elements on the LLM interface. Includes a fallback if optimisewait is not installed.
Multi-LLM Support: Currently supports DeepSeek, Gemini, and AI Studio (selected with 'deepseek', 'gemini', or 'aistudio').
Automated Conversations: Facilitates automated conversations and task execution by simulating user interactions.
Image Support: Allows sending images (base64 encoded) to the LLM.
Robust Clipboard Handling: Includes configurable retry mechanisms (default 5 retries) for setting text/images to the clipboard and reading text from the clipboard to handle access errors and timing issues.
Dynamic Image Path Management: Copies necessary recognition images to a temporary directory, ensuring they are accessible and up-to-date.
Easy to Use: Designed for simple setup and usage.

Core Functionality

The core function is talkto(llm, prompt, imagedata=None, debug=False, tabswitch=True, read_retries=5, read_delay=0.3).

Arguments:

llm (str): The LLM name ('deepseek', 'gemini', or 'aistudio').
prompt (str): The text prompt.
imagedata (list[str] | None): Optional list of base64 encoded image strings (e.g., "data:image/png;base64,...").
debug (bool): Enable detailed console output. Defaults to False.
tabswitch (bool): Switch focus back to the previous window after closing the LLM tab. Defaults to True.
read_retries (int): Number of attempts to read the final response from the clipboard. Defaults to 5.
read_delay (float): Delay in seconds between clipboard read attempts. Defaults to 0.3.

Steps:

Validates the LLM name.
Sets up image paths for optimisewait using set_image_path.
Opens the LLM's website in a new browser tab.
Waits and clicks the message input area using optimiseWait('message', clicks=2).
If imagedata is provided:
- Iterates through images.
- Sets each image to the clipboard using set_clipboard_image (with retries).
- Pastes the image (Ctrl+V).
- Waits for potential upload (sleep(7)).
Sets the prompt text to the clipboard using set_clipboard (with retries).
Pastes the prompt (Ctrl+V).
Waits and clicks the 'run' button using optimiseWait('run').
Waits for the response generation, using optimiseWait('copy') as an indicator that the response is ready and the copy button is visible.
Waits briefly (sleep(0.5)) after optimiseWait('copy') clicks the copy button.
Closes the browser tab (Ctrl+W).
Switches focus back if tabswitch is True (Alt+Tab).
Attempts to read the LLM's response from the clipboard with retry logic (read_retries, read_delay).
Returns the retrieved text response, or an empty string if reading fails.
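
The final read step can be pictured as a small retry loop. The sketch below is not the package's own code: read_with_retries and the injected read_clipboard callable are hypothetical names, chosen so the logic runs on any platform without touching the real clipboard. It only illustrates how read_retries and read_delay might interact, and how an empty string results when every attempt fails.

```python
import time
from typing import Callable


def read_with_retries(
    read_clipboard: Callable[[], str],
    retries: int = 5,
    delay: float = 0.3,
) -> str:
    """Call read_clipboard up to `retries` times; return "" if no attempt yields text."""
    for attempt in range(1, retries + 1):
        try:
            text = read_clipboard()
            if text:
                return text
        except Exception as exc:  # e.g. the clipboard is held by another process
            print(f"Attempt {attempt} failed: {exc}")
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    return ""


# Usage with a fake reader that fails twice before succeeding.
attempts = {"count": 0}


def flaky_reader() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("clipboard busy")
    return "LLM response"


result = read_with_retries(flaky_reader, retries=5, delay=0.01)
print(result)
```

Passing the reader in as a callable keeps the retry policy separate from the Windows-specific clipboard call, which also makes the loop easy to exercise with a fake.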

Helper Functions

Clipboard Handling:

set_clipboard(text: str, retries: int = 5, delay: float = 0.2): Sets text to the clipboard using the CF_UNICODETEXT format. Retries on common access errors (WinError 5, access denied, or WinError 1418, clipboard not open).
set_clipboard_image(image_data: str, retries: int = 5, delay: float = 0.2): Sets a base64 encoded image to the clipboard (CF_DIB format). Decodes, converts to BMP, and retries on common access errors.
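
As a rough illustration of retrying only on specific Windows error codes, the sketch below wraps an arbitrary clipboard action. Everything here is an assumption for demonstration: with_clipboard_retries and FakeClipboardError are hypothetical names, and the fake exception merely mimics an error object that carries a winerror attribute. The real helpers may be structured differently.

```python
import time
from typing import Callable

RETRYABLE_WINERRORS = {5, 1418}  # the two codes named in the description above


class FakeClipboardError(Exception):
    """Hypothetical stand-in for a Windows API error carrying a winerror code."""

    def __init__(self, winerror: int):
        super().__init__(f"winerror {winerror}")
        self.winerror = winerror


def with_clipboard_retries(
    action: Callable[[], None], retries: int = 5, delay: float = 0.2
) -> bool:
    """Run action, retrying only when the error code is in RETRYABLE_WINERRORS."""
    for attempt in range(retries):
        try:
            action()
            return True
        except Exception as exc:
            # Anything other than the known access errors is re-raised at once.
            if getattr(exc, "winerror", None) not in RETRYABLE_WINERRORS:
                raise
            if attempt < retries - 1:
                time.sleep(delay)
    return False


# Usage: the fake action fails with code 5, then 1418, then succeeds.
calls = []


def fake_set_clipboard() -> None:
    calls.append(1)
    if len(calls) == 1:
        raise FakeClipboardError(5)
    if len(calls) == 2:
        raise FakeClipboardError(1418)


ok = with_clipboard_retries(fake_set_clipboard, retries=5, delay=0.01)
print(ok, len(calls))
```

Re-raising unknown errors immediately avoids masking genuine bugs behind the retry loop.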

Image Path Management:

set_image_path(llm: str, debug: bool = False): Orchestrates copying images.
copy_images_to_temp(llm: str, debug: bool = False): Copies necessary .png images for the specified llm from the package's images/<llm> directory to a temporary location (%TEMP%\talktollm_images\<llm>). Creates the temporary directory if needed and only copies if the source file is newer or the destination doesn't exist. Sets the optimisewait autopath. Includes error handling for missing package resources.
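
The "copy only if newer or missing" rule can be sketched with the standard library alone. This is an illustrative approximation, not the package's implementation: copy_if_newer is a hypothetical name, the directories are throwaway stand-ins for the package's images folder and %TEMP%, and the step that sets the optimisewait autopath is deliberately left out.

```python
import shutil
import tempfile
from pathlib import Path


def copy_if_newer(src_dir: Path, dest_dir: Path) -> list[str]:
    """Copy .png files whose destination is missing or older; return copied names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.glob("*.png")):
        dest = dest_dir / src.name
        if not dest.exists() or src.stat().st_mtime > dest.stat().st_mtime:
            shutil.copy2(src, dest)  # copy2 preserves the modification time
            copied.append(src.name)
    return copied


# Usage with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as root:
    package_images = Path(root) / "images" / "gemini"
    package_images.mkdir(parents=True)
    (package_images / "run.png").write_bytes(b"fake image bytes")
    temp_images = Path(root) / "talktollm_images" / "gemini"

    first_pass = copy_if_newer(package_images, temp_images)
    second_pass = copy_if_newer(package_images, temp_images)
    print(first_pass, second_pass)
```

Because copy2 carries the source timestamp over to the copy, a second pass over unchanged files finds nothing newer and skips them.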

Installation

pip install talktollm

Note: Image recognition relies on optimisewait, which is optional but recommended. Install it separately if needed (pip install optimisewait).

Usage

Here are some examples of how to use talktollm.

Example 1: Simple Text Prompt

Send a basic text prompt to Gemini.

import talktollm

prompt_text = "Explain quantum entanglement in simple terms."
response = talktollm.talkto('gemini', prompt_text)
print("--- Simple Gemini Response ---")
print(response)

Example 2: Text Prompt with Debugging

Send a text prompt and enable debugging output to see more details about the process.

import talktollm

prompt_text = "What are the main features of Python 3.12?"
response = talktollm.talkto('deepseek', prompt_text, debug=True)
print("--- DeepSeek Debug Response ---")
print(response)

Example 3: Preparing Image Data

Load an image file, encode it in base64, and format it correctly for the imagedata argument.

import base64

# Load your image (replace 'path/to/your/image.png' with the actual path)
try:
    with open("path/to/your/image.png", "rb") as image_file:
        # Encode to base64
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        # Format as a data URI
        image_data_uri = f"data:image/png;base64,{encoded_string}"
        print("Image prepared successfully!")
        # You can now pass [image_data_uri] to the imagedata parameter
except FileNotFoundError:
    print("Error: Image file not found. Please check the path.")
    image_data_uri = None
except Exception as e:
    print(f"Error processing image: {e}")
    image_data_uri = None

# This 'image_data_uri' variable holds the string needed for the next example
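
If several images need preparing, the encoding step can be wrapped in a small helper. The function below is a hypothetical convenience, not part of talktollm: to_data_uri is an assumed name, and it guesses the MIME type from the file name using the standard library. Whether talkto accepts formats other than PNG is not stated here, so treat non-PNG input as untested.

```python
import base64
import mimetypes


def to_data_uri(image_bytes: bytes, filename: str) -> str:
    """Build a data URI from raw image bytes, guessing the type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file name: {filename}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage with a few placeholder bytes instead of a real file.
uri = to_data_uri(b"\x89PNG", "screenshot.png")
print(uri)  # data:image/png;base64,iVBORw==
```

With a real file, the bytes would come from Path("path/to/your/image.png").read_bytes(), and the result would go into the imagedata list.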

Example 4: Text and Image Prompt

Send a text prompt along with a prepared image to Gemini. (Assumes image_data_uri was successfully created in Example 3).

import talktollm

# Assuming image_data_uri is available from the previous example
if image_data_uri:
    prompt_text = "Describe the main subject of this image."
    response = talktollm.talkto(
        'gemini',
        prompt_text,
        imagedata=[image_data_uri], # Pass the image data as a list
        debug=True
    )
    print("--- Gemini Image Response ---")
    print(response)
else:
    print("Skipping image example because image data is not available.")

Dependencies

pywin32: For Windows API access (clipboard).
pyautogui: For GUI automation (keystrokes, and potentially mouse actions when optimisewait is unavailable).
Pillow: For image processing (opening, converting for clipboard).
optimisewait (Optional but Recommended): For robust image-based waiting and clicking.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Release history

0.8.6

May 27, 2026

0.8.5

Apr 16, 2026

0.8.4

Apr 12, 2026

0.8.3

Mar 29, 2026

0.8.2

Feb 19, 2026

0.8.1

Feb 2, 2026

0.8.0

Jan 7, 2026

0.7.1

Jan 2, 2026

0.7.0

Dec 25, 2025

0.6.10

Dec 18, 2025

0.6.9

Dec 14, 2025

0.6.8

Dec 11, 2025

0.6.7

Dec 10, 2025

0.6.6

Nov 20, 2025

0.6.5

Oct 16, 2025

0.6.4

Oct 16, 2025

0.6.3

Sep 27, 2025

0.6.2

Sep 27, 2025

0.6.1

Sep 27, 2025

0.5.5

Sep 26, 2025

0.5.4

Sep 5, 2025

0.5.3

Aug 31, 2025

0.5.2

Aug 26, 2025

0.5.1

Aug 22, 2025

0.5.0

Aug 20, 2025

0.4.9

Jul 30, 2025

0.4.8

Jul 24, 2025

0.4.7

Jul 24, 2025

0.4.6

Jul 24, 2025

0.4.2

Jul 14, 2025

0.4.1

Jul 11, 2025

This version

0.4.0

Jul 10, 2025

0.3.6

Jul 6, 2025

0.3.5

Jun 21, 2025

0.3.4

Apr 17, 2025

0.3.3

Apr 17, 2025

0.3.1

Apr 5, 2025

0.3.0

Mar 28, 2025

0.2.7

Mar 15, 2025

0.2.6

Mar 3, 2025

0.2.5

Feb 23, 2025

0.2.4

Feb 23, 2025

0.2.3

Feb 21, 2025

0.2.2

Feb 9, 2025

0.2.0

Feb 9, 2025

0.1.0

Feb 7, 2025

Download files

Source Distribution

talktollm-0.4.0.tar.gz (22.7 kB)

Uploaded Jul 10, 2025 Source

Built Distribution

talktollm-0.4.0-py3-none-any.whl (20.1 kB)

Uploaded Jul 10, 2025 Python 3

File details

Details for the file talktollm-0.4.0.tar.gz.

File metadata

Download URL: talktollm-0.4.0.tar.gz
Upload date: Jul 10, 2025
Size: 22.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for talktollm-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`675cc8cef869c834657f1bff4829249ddf62403f2ec7a680f122fbfd534e1c99`
MD5	`27146e4ffda5a6f4b9a8f30722163989`
BLAKE2b-256	`3964a8b7e93e60f54f70da9c7d0239708442fa611f7ec0d6f07c9778a2d96caa`

File details

Details for the file talktollm-0.4.0-py3-none-any.whl.

File metadata

Download URL: talktollm-0.4.0-py3-none-any.whl
Upload date: Jul 10, 2025
Size: 20.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for talktollm-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f098363314734706e46dbf5734dbbcb89daa54ddf326f183b7ffc9e26c3d75ac`
MD5	`0f31366e0c7cd321e09fe3076c965e02`
BLAKE2b-256	`df583dc5461d628f09d9529666f2db89ee209b40c6d08f48af766ea7eec7c4d6`
