An integration package connecting AgentQL and LangChain
langchain-agentql
AgentQL provides web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt. AgentQL can be used across multiple languages and web pages, and does not break as pages change over time.
Installation
pip install -U langchain-agentql
You also need to configure the AGENTQL_API_KEY environment variable. You can acquire an API key from our Dev Portal.
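Because the package relies on this variable, it can help to check for it before doing any other work. The following is a minimal sketch using only the standard library; `require_env` is a hypothetical helper and is not part of langchain-agentql.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if unset or blank.

    Hypothetical helper for illustration; not part of langchain-agentql.
    `env` defaults to os.environ and can be replaced with a dict for testing.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before using langchain-agentql")
    return value
```

Calling `require_env("AGENTQL_API_KEY")` at startup surfaces a missing key as a clear error rather than a failure deeper in a loader or tool call.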
Document Loader
AgentQLLoader is a document loader that uses an AgentQL query to extract structured data from a web page.
from langchain_agentql.document_loaders import AgentQLLoader

loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
            date
            author
        }
    }
    """,
    is_scroll_to_bottom_enabled=True
)
docs = loader.load()
You can learn more about how to use AgentQLLoader in this Jupyter notebook.
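The query in the loader example has a simple shape: a list name followed by `[]`, then a braced set of field names. As an illustration only, the sketch below assembles such a query string from Python values. `build_list_query` is a hypothetical helper, not part of AgentQL or langchain-agentql, and it covers only this flat list-of-fields shape.

```python
def build_list_query(list_name, fields):
    """Build a query string of the form `{ name[] { field1 field2 } }`.

    Hypothetical helper for illustration; handles only a single flat list.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return "{ " + list_name + "[] { " + " ".join(fields) + " } }"


query = build_list_query("posts", ["title", "url", "date", "author"])
print(query)  # { posts[] { title url date author } }
```

The resulting string can be passed as the `query` argument in place of the hand-written one.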
Tools/Toolkits
AgentQL provides the following three tools:
- ExtractWebDataTool: Extracts structured data as JSON from a web page given a URL using either an AgentQL query or a Natural Language description of the data.
- ExtractWebDataBrowserTool: Extracts structured data as JSON from the active web page in a browser using either an AgentQL query or a Natural Language description. This tool must be used with a Playwright browser.
- GetWebElementBrowserTool: Finds a web element on the active web page in a browser using a Natural Language description and returns its CSS selector for further interaction. This tool must be used with a Playwright browser.
We also provide an AgentQLBrowserToolkit toolkit that bundles both browser tools, ExtractWebDataBrowserTool and GetWebElementBrowserTool.
You can learn more about how to use AgentQL tools in this Jupyter notebook.
Extract data using REST API
from langchain_agentql.tools import ExtractWebDataTool

extract_web_data_tool = ExtractWebDataTool()
extract_web_data_tool.invoke({
    'url': 'https://www.agentql.com/blog',
    'query': '{ posts[] { title url date author } }',
})
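The tool description says the data can be specified with either an AgentQL query or a Natural Language description. The sketch below builds the input dictionary and rejects calls that supply both or neither. The `url` and `query` keys come from the example above; the `prompt` key is borrowed from the browser tool example in this README and is assumed to apply here as well. `build_extract_input` and its exactly-one check are hypothetical, not the library's own validation.

```python
def build_extract_input(url, query=None, prompt=None):
    """Build an input dict carrying `url` plus exactly one of `query` or `prompt`.

    Hypothetical helper; the exactly-one rule is an assumption for illustration.
    """
    if (query is None) == (prompt is None):
        raise ValueError("provide exactly one of `query` or `prompt`")
    payload = {"url": url}
    if query is not None:
        payload["query"] = query
    else:
        payload["prompt"] = prompt
    return payload
```

The returned dictionary is what would be handed to `invoke`, for example `extract_web_data_tool.invoke(build_extract_input(url, query=...))`.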
Work with data and web elements using browser
Setup
To use the ExtractWebDataBrowserTool and GetWebElementBrowserTool, you need a Playwright browser instance. If you do not have an active instance, you can create one using the create_async_playwright_browser or create_sync_playwright_browser utility functions:
from langchain_agentql.utils import create_async_playwright_browser
async_browser = await create_async_playwright_browser()
You can also use an existing browser instance via a Chrome DevTools Protocol (CDP) connection URL:
from playwright.async_api import async_playwright

p = await async_playwright().start()
async_browser = await p.chromium.connect_over_cdp("CDP_CONNECTION_URL")
Extract data from the active browser page
from langchain_agentql.tools import ExtractWebDataBrowserTool
extract_web_data_browser_tool = ExtractWebDataBrowserTool(async_browser=async_browser)
json_data = await extract_web_data_browser_tool.ainvoke({'prompt': 'The blog posts with title, url, date of post and author'})
Find a web element on the active browser page
from langchain_agentql.tools import GetWebElementBrowserTool
get_web_element_browser_tool = GetWebElementBrowserTool(async_browser=async_browser)
selector = await get_web_element_browser_tool.ainvoke({'prompt': 'The next page navigation button'})
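The returned selector is meant for further interaction with the page. A minimal sketch of that next step follows, assuming you already hold a Playwright `Page` object for the active page (how you obtain it depends on your setup and is not shown). `click_by_selector` is a hypothetical helper; `page.click` is Playwright's own method, and its `timeout` is in milliseconds.

```python
async def click_by_selector(page, selector, timeout_ms=5000):
    """Click the element matched by a CSS selector on a Playwright page.

    Hypothetical helper for illustration; `page` is assumed to be a
    Playwright async Page for the active browser page.
    """
    if not selector:
        raise ValueError("selector must be a non-empty string")
    # Playwright waits for the element to be actionable, up to the timeout.
    await page.click(selector, timeout=timeout_ms)
```

With the selector from the example above, the call would be `await click_by_selector(page, selector)`.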
Agentic Usage
A more extensive example of agentic usage is documented in this Jupyter notebook.
Run Tests
To run the integration tests, first configure LLM credentials by setting the OPENAI_API_KEY environment variable. Then run the tests with the following command:
make integration_tests