An integration package connecting AgentQL and LangChain

langchain-agentql

AgentQL provides web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt. The same query can be used across multiple languages and web pages, and it keeps working as those pages change over time.

Installation

pip install -U langchain-agentql

You also need to configure the AGENTQL_API_KEY environment variable. You can acquire an API key from our Dev Portal.
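Before calling any of the tools below, it can help to fail early when the key is missing. The following is a minimal sketch that uses only the standard library; the helper name `require_agentql_api_key` is hypothetical and not part of langchain-agentql.

```python
import os
from typing import Mapping


def require_agentql_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the AgentQL API key, or raise a clear error if it is not set.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "AGENTQL_API_KEY is not set. Export it in your shell or set it "
            "in os.environ before creating AgentQL loaders or tools."
        )
    return key
```

Calling `require_agentql_api_key()` at the top of a script surfaces a missing key immediately instead of at the first request.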

Document Loader

AgentQLLoader is a document loader that uses an AgentQL query to extract structured data from a web page.

from langchain_agentql.document_loaders import AgentQLLoader

loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
            date
            author
        }
    }
    """,
    is_scroll_to_bottom_enabled=True
)
docs = loader.load()

You can learn more about how to use AgentQLLoader in this Jupyter notebook.

Tools/Toolkits

AgentQL provides the following three tools:

  • ExtractWebDataTool: Extracts structured data as JSON from a web page given a URL using either an AgentQL query or a Natural Language description of the data.

  • ExtractWebDataBrowserTool: Extracts structured data as JSON from the active web page in a browser using either an AgentQL query or a Natural Language description. This tool must be used with a Playwright browser.

  • GetWebElementBrowserTool: Finds a web element on the active web page in a browser using a Natural Language description and returns its CSS selector for further interaction. This tool must be used with a Playwright browser.

We also provide AgentQLBrowserToolkit, a toolkit that bundles the two browser tools, ExtractWebDataBrowserTool and GetWebElementBrowserTool.

You can learn more about how to use AgentQL tools in this Jupyter notebook.

Extract data using REST API

from langchain_agentql.tools import ExtractWebDataTool

extract_web_data_tool = ExtractWebDataTool()
extract_web_data_tool.invoke({
    'url': 'https://www.agentql.com/blog', 
    'query': '{ posts[] { title url date author } }', 
})
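The query passed above has the same shape as the one in the loader example: a named list followed by the fields wanted for each item. If you build such queries from code, a small helper can assemble the string. This is only a sketch: `build_list_query` is a hypothetical name, and restricting names to plain identifiers is an assumption of this helper, not a rule stated by AgentQL.

```python
from typing import Iterable


def build_list_query(list_name: str, fields: Iterable[str]) -> str:
    """Build a single-line query of the form '{ name[] { field ... } }'.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    field_names = list(fields)
    if not field_names:
        raise ValueError("at least one field is required")
    # Assumption of this sketch: only plain identifier-like names are accepted.
    for name in [list_name, *field_names]:
        if not name.isidentifier():
            raise ValueError(f"not a plain field name: {name!r}")
    return f"{{ {list_name}[] {{ {' '.join(field_names)} }} }}"


query = build_list_query("posts", ["title", "url", "date", "author"])
# query == "{ posts[] { title url date author } }"
```

The resulting string can be passed as the 'query' value in the invoke call.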

Work with data and web elements using browser

Setup

To use ExtractWebDataBrowserTool and GetWebElementBrowserTool, you need a Playwright browser instance. If you do not have an active instance, you can create one with the create_async_playwright_browser or create_sync_playwright_browser utility functions:

from langchain_agentql.utils import create_async_playwright_browser
async_browser = await create_async_playwright_browser()

You can also use an existing browser instance via Chrome DevTools Protocol (CDP) connection URL:

from playwright.async_api import async_playwright

p = await async_playwright().start()
async_browser = await p.chromium.connect_over_cdp("CDP_CONNECTION_URL")
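The setup snippets use a top-level await, which works in a Jupyter notebook but not in a plain Python script. In a script, one option is to wrap the work in a coroutine and run it with asyncio.run. The sketch below shows that pattern; `run_with_browser`, `create_browser`, and `work` are hypothetical names, and the only thing assumed about the browser object is that it may expose an async `close()` method, as Playwright's async browser does.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_with_browser(
    create_browser: Callable[[], Awaitable[Any]],
    work: Callable[[Any], Awaitable[Any]],
) -> Any:
    """Create a browser, run `work` with it, and close the browser afterwards.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    browser = await create_browser()
    try:
        return await work(browser)
    finally:
        # Close the browser if it exposes an async close() method.
        close = getattr(browser, "close", None)
        if close is not None:
            await close()
```

In a script this could be called as `asyncio.run(run_with_browser(create_async_playwright_browser, my_task))`, where `my_task` is your own coroutine function that receives the browser and calls the tools.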

Extract data from the active browser page

from langchain_agentql.tools import ExtractWebDataBrowserTool

extract_web_data_browser_tool = ExtractWebDataBrowserTool(async_browser=async_browser)
json_data = await extract_web_data_browser_tool.ainvoke({'prompt': 'The blog posts with title, url, date of post and author'})

Find a web element on the active browser page

from langchain_agentql.tools import GetWebElementBrowserTool

get_web_element_browser_tool = GetWebElementBrowserTool(async_browser=async_browser)
selector = await get_web_element_browser_tool.ainvoke({'prompt': 'The next page navigation button'})
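Since the tool returns a CSS selector, the next step is usually to act on that element through Playwright. The following is a minimal sketch, assuming the returned value is a plain selector string and that you already hold the Playwright page the tool inspected; `click_selector` and `timeout_ms` are hypothetical names. It relies on Playwright's `page.click(selector, timeout=...)`, where the timeout is given in milliseconds.

```python
import asyncio
from typing import Any


async def click_selector(page: Any, selector: str, timeout_ms: float = 5000) -> None:
    """Click the element matching `selector` on a Playwright async page.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    if not selector:
        raise ValueError("empty selector; the element may not have been found")
    # page.click waits for the element to be actionable, up to the timeout.
    await page.click(selector, timeout=timeout_ms)
```

With the variables from the example above, `await click_selector(page, selector)` would click the next page navigation button.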

Agentic Usage

A more extensive example of agentic usage is documented in this Jupyter notebook.

Run Tests

To run the integration tests, first configure LLM credentials by setting the OPENAI_API_KEY environment variable. Then run the tests with the following command:

make integration_tests
