An integration package connecting AgentQL and LangChain

langchain-agentql

AgentQL provides web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt. AgentQL works across multiple languages and web pages, and its queries are designed to keep working as pages change over time.

Installation

pip install -U langchain-agentql

You also need to configure the AGENTQL_API_KEY environment variable. You can acquire an API key from our Dev Portal.
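As a minimal sketch of that configuration step, the helper below checks that the key is present before any AgentQL call is made, so a missing key fails early with a clear message. The function name `require_agentql_api_key` is hypothetical and not part of langchain-agentql; only the `AGENTQL_API_KEY` variable name comes from this README.

```python
import os


def require_agentql_api_key(environ=os.environ) -> str:
    """Return AGENTQL_API_KEY from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    key = environ.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "AGENTQL_API_KEY is not set; export it before using langchain-agentql."
        )
    return key
```

Calling `require_agentql_api_key()` once at startup is usually enough. The `environ` parameter exists only so the check can be exercised with a plain dictionary.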

Document Loader

AgentQLLoader is a document loader that uses an AgentQL query to extract structured data from a web page.

from langchain_agentql.document_loaders import AgentQLLoader

loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
            date
            author
        }
    }
    """,
    is_scroll_to_bottom_enabled=True
)
docs = loader.load()

You can learn more about how to use AgentQLLoader in this Jupyter notebook.
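The query passed to the loader above follows a simple shape: a named list (`posts[]`) wrapping the fields to extract. The sketch below builds the one-line form of such a query from a list of field names, which can be convenient when the fields are chosen at runtime. `build_list_query` is a hypothetical helper, not part of langchain-agentql, and it covers only this single-list shape.

```python
from typing import Sequence


def build_list_query(container: str, fields: Sequence[str]) -> str:
    """Build a one-line AgentQL query for a list of items with the given fields.

    Hypothetical helper for illustration; handles only one flat list.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return f"{{ {container}[] {{ {' '.join(fields)} }} }}"


query = build_list_query("posts", ["title", "url", "date", "author"])
print(query)  # → { posts[] { title url date author } }
```

The resulting string matches the one-line query form used with the tools in this README and can be passed as the `query` argument.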

Tools/Toolkits

AgentQL provides the following three tools:

  • ExtractWebDataTool: Extracts structured data as JSON from a web page given a URL using either an AgentQL query or a Natural Language description of the data.

  • ExtractWebDataBrowserTool: Extracts structured data as JSON from the active web page in a browser using either an AgentQL query or a Natural Language description. This tool must be used with a Playwright browser.

  • GetWebElementBrowserTool: Finds a web element on the active web page in a browser using a Natural Language description and returns its CSS selector for further interaction. This tool must be used with a Playwright browser.

We also provide AgentQLBrowserToolkit, a toolkit that bundles the two browser tools, ExtractWebDataBrowserTool and GetWebElementBrowserTool.

You can learn more about how to use AgentQL tools in this Jupyter notebook.

Extract data using REST API

from langchain_agentql.tools import ExtractWebDataTool

extract_web_data_tool = ExtractWebDataTool()
extract_web_data_tool.invoke({
    'url': 'https://www.agentql.com/blog', 
    'query': '{ posts[] { title url date author } }', 
})
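Because the tool takes either an AgentQL query or a natural language description, it can help to validate the input before invoking it. The sketch below builds the input dictionary and rejects calls that supply both or neither. `build_extract_input` is a hypothetical helper. It assumes the REST tool accepts a `prompt` key in the same way the browser tools do, and that exactly one of the two should be given.

```python
from typing import Optional


def build_extract_input(
    url: str,
    query: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Build the input dict for an extraction call.

    Hypothetical helper; assumes exactly one of 'query' or 'prompt' is wanted.
    """
    if (query is None) == (prompt is None):
        raise ValueError("provide exactly one of 'query' or 'prompt'")
    payload = {"url": url}
    if query is not None:
        payload["query"] = query
    else:
        payload["prompt"] = prompt
    return payload
```

The returned dictionary has the same shape as the argument passed to `invoke` above, for example `extract_web_data_tool.invoke(build_extract_input(url, query=...))`.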

Work with data and web elements using browser

Setup

To use ExtractWebDataBrowserTool and GetWebElementBrowserTool, you need a Playwright browser instance. If you do not have an active instance, you can create one with the create_async_playwright_browser or create_sync_playwright_browser utility functions:

from langchain_agentql.utils import create_async_playwright_browser
async_browser = await create_async_playwright_browser()

You can also use an existing browser instance via Chrome DevTools Protocol (CDP) connection URL:

from playwright.async_api import async_playwright

p = await async_playwright().start()
async_browser = await p.chromium.connect_over_cdp("CDP_CONNECTION_URL")

Extract data from the active browser page

from langchain_agentql.tools import ExtractWebDataBrowserTool

extract_web_data_browser_tool = ExtractWebDataBrowserTool(async_browser=async_browser)
json_data = await extract_web_data_browser_tool.ainvoke({'prompt': 'The blog posts with title, url, date of post and author'})

Find a web element on the active browser page

from langchain_agentql.tools import GetWebElementBrowserTool

get_web_element_browser_tool = GetWebElementBrowserTool(async_browser=async_browser)
selector = await get_web_element_browser_tool.ainvoke({'prompt': 'The next page navigation button'})

Agentic Usage

A more extensive example of agentic usage with these tools is documented in this Jupyter notebook.

Run Tests

To run the integration tests, first configure LLM credentials by setting the OPENAI_API_KEY environment variable. Then run the tests with the following command:

make integration_tests
