talktollm

A Python utility for interacting with large language models (LLMs) through a command-line interface. It leverages image recognition to automate interactions with LLM web interfaces, enabling seamless conversations and task execution.

Features

  • Command-Line Interaction: Provides a simple and intuitive command-line interface for interacting with LLMs.
  • Automated Image Recognition: Employs image recognition techniques to identify and interact with elements on the LLM interface, such as input fields and submit buttons.
  • Multi-LLM Support: Currently supports DeepSeek and Gemini, with the potential for expansion to other LLMs.
  • Automated Conversations: Facilitates automated conversations and task execution by simulating user interactions with the LLM interface.
  • Image Support: Allows sending images to the LLM, handling the image processing and clipboard operations.
  • Easy to use: Installs with pip and is driven through a single function call, talkto.

Core Functionality

The core of talktollm is the talkto(llm, prompt, imagedata=None, debug=False) function, which takes the following arguments:

  • llm: The name of the LLM to interact with (e.g., 'deepseek' or 'gemini').
  • prompt: The text prompt to send to the LLM.
  • imagedata: Optional image data to send to the LLM. This should be a list of base64-encoded strings, one per image.
  • debug: A boolean flag to enable debugging output.
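
Building the imagedata list by hand is error-prone, so a small helper can be useful. The following is a minimal sketch, not part of talktollm: the helper names to_data_uri and build_imagedata are hypothetical, and the data URI prefix is an assumption based on the form shown in the usage example.

```python
import base64


def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI string."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_imagedata(images: list[bytes], mime: str = "image/png") -> list[str]:
    """Return one encoded string per image, in the original order."""
    return [to_data_uri(raw, mime) for raw in images]


# Possible usage (needs talktollm installed and a desktop session;
# "cat.png" is a hypothetical file name):
# from pathlib import Path
# import talktollm
# imagedata = build_imagedata([Path("cat.png").read_bytes()])
# response = talktollm.talkto("gemini", "Describe this image", imagedata=imagedata)
```

The helper only prepares strings; whether talkto also accepts bare base64 without the data URI prefix is not stated here.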

The talkto function performs the following steps:

  1. Opens the LLM's website in a new browser tab.
  2. Finds the message input box using image recognition (optimisewait).
  3. If image data is provided, iterates through the images, converts each one to the correct format, and pastes it into the LLM input.
  4. Pastes the provided text prompt into the LLM input.
  5. Finds and clicks the 'run' button using image recognition.
  6. Waits for the LLM to finish processing (for Gemini, it waits for a 'done' indicator).
  7. Finds and clicks the 'copy' button using image recognition.
  8. Closes the browser tab.
  9. Retrieves the LLM's response from the clipboard.
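
The order of these steps can be sketched in a few lines. This is an illustration only, not the package's implementation: the desktop object and its methods (open_tab, is_visible, click, paste_text, close_tab, read_clipboard) are hypothetical stand-ins for the browser, image recognition, and clipboard, the image names are made up, and step 3 (pasting images) is left out for brevity.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 0.5) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


def run_conversation(desktop, llm: str, prompt: str,
                     timeout: float = 60.0, interval: float = 0.5) -> str:
    """Walk through the steps in order, failing if an element never appears."""

    def require(name: str) -> None:
        # Image recognition may need several attempts while the page loads.
        if not wait_until(lambda: desktop.is_visible(name), timeout, interval):
            raise TimeoutError(f"{name!r} did not appear within {timeout} s")

    desktop.open_tab(llm)            # step 1: open the LLM's website
    require("message")               # step 2: find the message input box
    desktop.click("message")
    desktop.paste_text(prompt)       # step 4: paste the prompt
    require("run")                   # step 5: find and click 'run'
    desktop.click("run")
    require("done")                  # step 6: wait for the 'done' indicator
    require("copy")                  # step 7: find and click 'copy'
    desktop.click("copy")
    desktop.close_tab()              # step 8: close the tab
    return desktop.read_clipboard()  # step 9: read the response
```

Polling with a deadline keeps the automation from hanging forever when the page layout changes and an element can no longer be recognised.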

Image Handling

talktollm includes functionality for handling images to be sent to the LLMs.

  • set_clipboard_image(image_data, retries=3, delay=0.2): Places an image on the clipboard. It takes base64-encoded image data, decodes it, converts it to a bitmap, and puts the result on the clipboard.
  • set_image_path(llm, debug=False): Determines where the images for the given LLM are located on the user's computer by calling copy_images_to_temp.
  • copy_images_to_temp(llm, debug=False): Copies images used for image recognition to the temporary directory.
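
The decode and retry parts of that description can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the code used by set_clipboard_image: decode_image and with_retries are hypothetical names, stripping a data URI header is an assumption, and the bitmap conversion and the actual clipboard write are omitted.

```python
import base64
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def decode_image(image_data: str) -> bytes:
    """Decode base64 image data, dropping a leading data URI header if present."""
    if image_data.startswith("data:") and "," in image_data:
        image_data = image_data.split(",", 1)[1]
    return base64.b64decode(image_data)


def with_retries(action: Callable[[], T], retries: int = 3, delay: float = 0.2) -> T:
    """Call action() up to `retries` times, sleeping `delay` seconds after a failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return action()
        except Exception as error:  # e.g. the clipboard is held by another process
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)
    raise RuntimeError(f"gave up after {retries} attempts") from last_error
```

A retry loop is a common choice for clipboard writes because opening the clipboard can fail briefly while another application holds it.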

Installation

Install talktollm from PyPI with pip:

pip install talktollm

Usage

talkto starts by trying to find the LLM interface in the top left of the primary monitor. A basic call looks like this:

```python
import talktollm

response = talktollm.talkto('gemini', 'Write a poem about cats.')
print(response)

# Example with image
# response = talktollm.talkto('gemini', 'Describe this image', imagedata=['data:image/png;base64,iVBOR...'])
```

Dependencies

talktollm depends on the following external libraries (pywin32 is available only on Windows):

  • pywin32
  • pyautogui
  • pillow
  • optimisewait

Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss the proposed modifications, so that contributions align with the project's goals and stay consistent with the rest of the code.

License

talktollm is distributed under the MIT license.

