Skip to main content

No project description provided

Project description

Dendrite

Dendrite is an SDK that makes it easy to interact with and scrape websites using prompts.

# Extracting data from a page is as simple as doing page.scrape(prompt)
startup_urls = await page.scrape("Extract a list of valid urls for each listed startup's YC page")

Join our discord to give feedback and report bugs! Invite link

Installation:

pip install dendrite-python-sdk && playwright install

Quick start:

Here is a very simple example of how to use Dendrite.

Install dendrite and asyncio like so: pip install asyncio dendrite-python-sdk && playwright install and create a file called main.py. Use the code below and run it with python main.py from the terminal.

main.py

from dendrite_python_sdk import DendriteBrowser
import asyncio

async def main():
    # Launch browser locally
    dendrite_browser = DendriteBrowser(openai_api_key=...) # Use your OpenAI key here

    # Go to google.com
    page = await dendrite_browser.goto("https://google.com")

    # Get the reject cookies button
    close_cookies = await page.get_interactable_element("reject cookies popup button")

    # Click the button, returns IncorrectOutcomeException if the popup wasn't closed
    await close_cookies.click(expected_outcome="That the cookies popup closed.")

    # Get the search bar
    search_bar = await page.get_interactable_element("Return the search bar")

    # Enter "hello world" into it.
    await search_bar.fill(
        "hello world", expected_outcome="The words 'hello world' to have been entered"
    )

    # Use the Playwright page to press enter.
    await page.get_playwright_page().keyboard.press("Enter")

    await asyncio.sleep(2)
    await dendrite_browser.close()


asyncio.run(main())

More Advanced Example:

With Dendrite we can do more than just finding and using elements.

We can scrape data from pages and ensure that the returned data followed a certain structure by prompting, using a JSON schema or passing in a pydantic model.

Below is an example of how he can, see more examples in dendrite_python_sdk/examples.

import os
import asyncio

from dendrite_python_sdk import DendriteBrowser
from dotenv import load_dotenv, find_dotenv
import pandas as pd

load_dotenv(find_dotenv())


async def main():
    dendrite_browser = DendriteBrowser(
        openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        dendrite_api_key=os.environ.get("DENDRITE_API_KEY", "") # Generate a dendrite API at https://dendrite.se to get better rate limits and latency.
    )

    page = await dendrite_browser.goto(
        "https://ycombinator.com/companies", scroll_through_entire_page=False
    )

    # Get elements with prompts instead of using brittle selectors
    search_bar = await page.get_interactable_element(
        "The search bar used to find startups"
    )
    await search_bar.fill("AI agent")

    # Scraping data is as easy as writing a good prompt
    await page.scroll_through_entire_page()
    startup_urls = await page.scrape(
        "Extract a list of valid urls for each listed startup's YC page"
    )

    # Let's go to every AI agent startup's page and fetch information about the founders
    startup_info = []
    for url in startup_urls:
        page = await dendrite_browser.goto(url)
        startup_details = await page.scrape(
            "Extract a dict like this: {name: str, slogan: str, founders_social_media_urls: a string where each url is separated by a linebreak}"
        )
        print("startup_details: ", startup_details)
        startup_info.append(startup_details)

    df = pd.DataFrame(startup_info)
    df.to_excel("ai_agent_founders.xlsx", index=False)


asyncio.run(main())

Class Documentation

DendriteBrowser

DendriteBrowser is an abstraction on top of playwright's browser. You can do everything you would normally do with Playwright but with a few new handy AI functions. Access

Methods

__init__(self, openai_api_key: str, id=None, playwright_options: Any = {"headless": False})

Initializes the DendriteBrowser class.

Params:

  • openai_api_key (str): Your OpenAI API Key.
  • id (Optional): The unique ID for the browser session.
  • playwright_options (dict): Options to configure Playwright browser.

get_active_page(self) -> DendritePage

Returns the active page.

goto(self, url: str, scroll_through_entire_page: Optional[bool] = True) -> DendritePage

Navigates to the specified URL.

Params:

  • url (str): URL of the webpage.
  • scroll_through_entire_page (Optional[bool]): Whether to scroll through the entire page.

google_search(self, query: str, filter_results_prompt: Optional[str] = None, load_all_results: Optional[bool] = True) -> List[SearchResult]

Performs a Google search and returns search results.

Params:

  • query (str): The search query.
  • filter_results_prompt (Optional[str]): Prompt to filter results.
  • load_all_results (Optional[bool]): Whether to load all results.

new_page(self) -> DendritePage

Opens a new page in the browser.

close(self)

Closes the browser context.

DendritePage

Abstraction layer on top of Playwright's Page class. Use get_playwright_locator to get the playwright page.

Methods

get_playwright_page(self) -> Page

Returns the Playwright page instance.

scrape(self, prompt: str, expected_return_data: Optional[str] = None, return_data_json_schema: Optional[Any] = None, pydantic_return_model: Optional[Type[BaseModel]] = None) -> Any

Scrapes data from the page based on a prompt.

Params:

  • prompt (str): The prompt to gather data.
  • expected_return_data (Optional[str]): Expected return data.
  • return_data_json_schema (Optional[Any]): JSON schema for return data.
  • pydantic_return_model (Optional[Type[BaseModel]]): Pydantic model for return data.

scroll_through_entire_page(self) -> None

Scrolls through the entire page.

get_interactable_element(self, prompt: str) -> DendriteLocator

Returns an interactable element based on a prompt.

DendriteLocator

Abstraction layer on top of Playwright's Locator class. Use get_playwright_locator to get the playwright locator.

Methods

get_playwright_locator(self) -> Locator

Returns the Playwright locator.

wait_for_page_changes(self, old_url: str, timeout: float = 2)

Waits for changes in the page. Params:

  • old_url (str): The URL before the interaction.
  • timeout (float): The time to wait for changes.

click(self, expected_outcome="", *args, **kwargs) -> InteractionResponse

Clicks the element and handles the interaction.

Params:

  • expected_outcome (str): Expected outcome after clicking.

fill(self, value: str, expected_outcome="", *args, **kwargs)

Fills the element with the provided value.

Params:

  • value (str): The value to fill.
  • expected_outcome (str): Expected outcome after filling.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dendrite_python_sdk-1.0.1.tar.gz (47.1 kB view details)

Uploaded Source

Built Distribution

dendrite_python_sdk-1.0.1-py3-none-any.whl (25.7 kB view details)

Uploaded Python 3

File details

Details for the file dendrite_python_sdk-1.0.1.tar.gz.

File metadata

  • Download URL: dendrite_python_sdk-1.0.1.tar.gz
  • Upload date:
  • Size: 47.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.2 Darwin/23.5.0

File hashes

Hashes for dendrite_python_sdk-1.0.1.tar.gz
Algorithm Hash digest
SHA256 ad2aa3844b9032906450346c3f68ed301bdf902c19462be708225627bf6ae5b9
MD5 e08edec2b681097ac74ebed79dda5fa9
BLAKE2b-256 c4c9231f69d7c1094aac39cbd6bde8f0e7c7b876270b6e5e55d288b11bfe9335

See more details on using hashes here.

File details

Details for the file dendrite_python_sdk-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for dendrite_python_sdk-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 55627294f0ab75941b048aaa8c10f1a9b23074d216703192f74e49be4b92be19
MD5 685a544601bb3eae7ac9a8f60526c58b
BLAKE2b-256 6e24336b0292990b6af1b98c0dccd9d68249fa6bd4d46e18d707415c1b3cb874

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page