A utility to help you locate UI elements using HTML and natural language.


talk2dom — Locate Web Elements with One Sentence


talk2dom is a focused utility that solves one of the hardest problems in browser automation and UI testing:

✅ Finding the correct UI element on a page.

🧠 Why `talk2dom`

In most automated testing or LLM-driven web navigation tasks, the real challenge is not how to click or type — it's how to locate the right element.

Think about it:

Clicking a button is easy — if you know its selector.
Typing into a field is trivial — if you've already located the right input.
But finding the correct element among hundreds of <div>, <span>, or deeply nested Shadow DOM trees? That's the hard part.

talk2dom is built to solve exactly that.

🎯 What it does

talk2dom helps you locate elements by:

Extracting clean HTML from Selenium WebDriver or any WebElement
Formatting it for LLM consumption (e.g., GPT-4 or Claude)
Returning minimal, clear selectors (like xpath: ... or css: ...)
Supporting retry logic for unstable DOM conditions
Playing nicely with Shadow DOM traversal (you handle it your way)
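The list above doesn't spell out what "clean" HTML means. As a rough illustration only (this is not talk2dom's implementation, and the helper name `clean_html` is hypothetical), here is a standard-library sketch. It drops `<script>`, `<style>` and `<noscript>` blocks, comments and doctype declarations, and collapses whitespace. All of these cost tokens without helping an LLM pick an element, while the original tag text (ids, classes, `aria-*` attributes) is kept verbatim.

```python
import re
from html import escape
from html.parser import HTMLParser


class _HTMLCleaner(HTMLParser):
    """Re-serializes HTML, dropping script/style blocks, comments, and doctypes."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a tag whose content is dropped

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            # Keep the original tag text so attributes survive exactly as written.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no separate end tag.
        if self.skip_depth == 0 and tag not in self.SKIP_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        if text.strip():  # drop whitespace-only runs between tags
            self.parts.append(escape(text, quote=False))

    # handle_comment and handle_decl are inherited no-ops, so comments and
    # <!DOCTYPE> declarations are dropped.


def clean_html(raw_html: str) -> str:
    """Return a compact copy of raw_html suitable for an LLM prompt."""
    cleaner = _HTMLCleaner()
    cleaner.feed(raw_html)
    cleaner.close()
    return "".join(cleaner.parts)
```

With Selenium, the input could be `driver.page_source` or `element.get_attribute("outerHTML")`.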

🤔 Why Selenium?

While there are many modern tools for controlling browsers (like Playwright or Puppeteer), Selenium remains the most robust and cross-platform solution, especially when dealing with:

✅ Safari (WebKit)
✅ Firefox
✅ Mobile browsers
✅ Cross-browser testing grids

These tools often have narrower support outside Chromium-based browsers; Playwright, for instance, drives its own patched WebKit build rather than Safari itself. Selenium, by contrast, has battle-tested support across all major platforms and continues to be the industry standard in enterprise and CI/CD environments.

That’s why talk2dom is designed to integrate directly with Selenium — it works where the real-world complexity lives.

📦 Installation

pip install talk2dom

🔍 Usage Example

Basic Usage

By default, talk2dom uses gpt-4o-mini to balance performance and cost. In our testing, however, gpt-4o has performed best on this task.

Make sure the OPENAI_API_KEY environment variable is set:

export OPENAI_API_KEY="..."

Sample Code

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

from talk2dom import get_locator

driver = webdriver.Chrome()
driver.get("http://www.python.org")
assert "Python" in driver.title
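by, value = get_locator(driver, "Find the Search box")  # returns a (by, value) pair for find_element
elem = driver.find_element(by, value)
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.quit()  # end the WebDriver session; close() only closes the current window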
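`get_locator()` hands back a `(by, value)` pair that goes straight into `driver.find_element(by, value)`, and this README describes the underlying answer as text like `xpath: ...` or `css: ...`. The hypothetical `parse_locator` below is only a sketch of how such text could be mapped onto the strategy strings that Selenium's `By` constants hold (`By.XPATH == "xpath"`, `By.CSS_SELECTOR == "css selector"`, `By.ID == "id"`). It is not talk2dom's code, and the set of accepted prefixes is an assumption.

```python
# Strategy strings match the values of selenium.webdriver.common.by.By
# (By.XPATH, By.CSS_SELECTOR, By.ID); the accepted prefixes are an assumption.
_STRATEGIES = {
    "xpath": "xpath",
    "css": "css selector",
    "css selector": "css selector",
    "id": "id",
}


def parse_locator(text: str) -> tuple[str, str]:
    """Turn model output such as 'css: #search' into a (by, value) pair."""
    cleaned = text.strip().strip("`").strip()  # models sometimes wrap answers in backticks
    # Split on the first colon only, so values like 'a:hover' or 'https://...' stay intact.
    prefix, sep, value = cleaned.partition(":")
    strategy = _STRATEGIES.get(prefix.strip().lower())
    value = value.strip()
    if not sep or strategy is None or not value:
        raise ValueError(f"Unrecognized locator: {text!r}")
    return strategy, value
```

Because the strategy strings equal Selenium's own `By` values, the result can be unpacked directly: `by, value = parse_locator(answer)` followed by `driver.find_element(by, value)`.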

Free Models

You can also use talk2dom with free models like llama-3.3-70b-versatile from Groq.

Make sure the GROQ_API_KEY environment variable is set to your Groq API key:

export GROQ_API_KEY="..."

Sample Code with Groq

# Use the Llama 3.3 70B model from Groq (fast and free)
by, value = get_locator(driver, "Find the search box", model="llama-3.3-70b-versatile", model_provider="groq")
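In the sample code above, a missing key would likely only surface at the `get_locator()` call, after the browser has already started. A small fail-fast check run before `webdriver.Chrome()` avoids that. This is a hypothetical helper, not part of talk2dom: the mapping and the `"openai"` provider name are assumptions, while the two variable names come from this README.

```python
import os

# Hypothetical mapping from a model_provider value to the variable holding its key.
# The variable names come from this README; the "openai" provider name is assumed.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def require_api_key(model_provider: str = "openai") -> str:
    """Return the API key for model_provider, or raise before any browser starts."""
    env_var = PROVIDER_ENV_VARS.get(model_provider)
    if env_var is None:
        raise ValueError(f"Unknown model_provider: {model_provider!r}")
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f'{env_var} is not set; run: export {env_var}="..."')
    return key
```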

Full-Page vs. Scoped Element Queries

The get_locator() function can be used to query the entire page or a specific element. You can pass either a full Selenium driver or a specific WebElement to scope the locator to part of the page.

Why and when should you use a `WebElement` instead of `driver`?

Reduce Token Usage: Passing a small subtree instead of the full page saves tokens, which reduces latency and cost.
Better Scope Accuracy: Useful when the target element exists in a deeply nested or isolated structure (e.g., modals, side panels, embedded components).

There's no need to extract HTML manually: talk2dom automatically reads the outerHTML of any WebElement you pass in.
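To put rough numbers on the token savings of a scoped query, the sketch below uses the common rule of thumb of about four characters per token. That ratio is an approximation for English text with OpenAI tokenizers, and markup often tokenizes differently, so treat the result as an order-of-magnitude estimate. The helper names are hypothetical.

```python
import math


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real counts depend on the model's tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def scoped_savings(page_html: str, subtree_html: str) -> float:
    """Approximate fraction of prompt tokens saved by sending only the subtree."""
    full = rough_token_count(page_html)
    if full == 0:
        return 0.0
    return 1 - rough_token_count(subtree_html) / full
```

With Selenium, `page_html` could be `driver.page_source` and `subtree_html` could be `modal.get_attribute("outerHTML")`. For exact counts, use the tokenizer of the model you call (for OpenAI models, the `tiktoken` package).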

Sample Code

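from selenium.webdriver.common.by import By

modal = driver.find_element(By.CLASS_NAME, "modal")
by, val = get_locator(modal, "Click the confirm button")  # query is scoped to the modal's outerHTML
element = modal.find_element(by, val)
element.click()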

✨ Philosophy

Our goal is not to control the browser — you still control your browser. Our goal is to find the right DOM element, so you can tell the browser what to do.

✅ Key Features

📍 Locator-first mindset: focus on where, not how
🔁 Retry wrapper for flaky pages
🧠 Built for LLM-agent workflows
🧩 Shadow DOM friendly (you handle traversal, we return selectors)
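The README mentions a retry wrapper for flaky pages but doesn't show its interface. As a generic illustration only (not talk2dom's wrapper; `retry` and its parameters are hypothetical), here is retry with exponential backoff, which you could wrap around `get_locator()` or `find_element()` while the DOM is still settling. The `sleep` parameter is injectable so the backoff can be tested without real waits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failed attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ... by default
    raise AssertionError("unreachable")
```

For example, `retry(lambda: driver.find_element(by, value), exceptions=(NoSuchElementException,))`, with `NoSuchElementException` imported from `selenium.common.exceptions`, retries only when the element has not appeared yet and lets other errors fail immediately.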

📄 License

Apache 2.0

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

💬 Questions or ideas?

We’d love to hear how you're using talk2dom in your AI agents or testing flows.
Feel free to open issues or discussions!

⭐️ If you find this project useful, please consider giving it a star!

Release history

0.3.5: Sep 13, 2025
0.3.4: Sep 12, 2025
0.3.3: Sep 12, 2025
0.3.2: Sep 11, 2025
0.3.1: Sep 11, 2025
0.3.0: Sep 3, 2025
0.2.9: Sep 1, 2025
0.2.8: Sep 1, 2025
0.2.7: Aug 26, 2025
0.2.6: Jul 29, 2025
0.2.5: Jul 23, 2025
0.2.3: Jul 22, 2025
0.2.2: Jul 21, 2025
0.2.1: Jul 21, 2025
0.2.0: Jun 13, 2025
0.1.9: Apr 17, 2025
0.1.8: Apr 16, 2025
0.1.7: Apr 16, 2025
0.1.6: Apr 14, 2025
0.1.5: Apr 13, 2025 (this version)
0.1.4: Apr 13, 2025
0.1.3: Apr 10, 2025
0.1.2: Apr 10, 2025
0.1.1: Apr 10, 2025

Download files


Source Distribution

talk2dom-0.1.5.tar.gz (9.5 kB)

Uploaded Apr 13, 2025 Source

Built Distribution


talk2dom-0.1.5-py3-none-any.whl (9.6 kB)

Uploaded Apr 13, 2025 Python 3

File details

Details for the file talk2dom-0.1.5.tar.gz.

File metadata

Download URL: talk2dom-0.1.5.tar.gz
Upload date: Apr 13, 2025
Size: 9.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for talk2dom-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`74deadea26e95fa93c8c4b121bbe7d7286d8fe433fd75d724c00c7b721b69fa1`
MD5	`562d824a1167e77d699fee96cda08b7e`
BLAKE2b-256	`71448be4b940b52250c058432e07948c4d6fd74bc45913350ad3bf914e0e7788`


File details

Details for the file talk2dom-0.1.5-py3-none-any.whl.

File metadata

Download URL: talk2dom-0.1.5-py3-none-any.whl
Upload date: Apr 13, 2025
Size: 9.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for talk2dom-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dfb16711bb072a95431cd4e7f5ca4240152281fe0574cc12196b1f1eb198a0bd`
MD5	`ad7b1dc1cbb4d98c49dc90f0074319f4`
BLAKE2b-256	`6b425338f6a5d8d734e6b56ad6d5652cbfa5d7c64ff4ac0424658efa38dc9048`

