An integration package connecting Apify and LangChain
Project description
LangChain Apify: an integration package that connects Apify Actors and cloud infrastructure with LangChain's AI tooling. Maintained by Apify.
Build web scraping and automation workflows in Python by connecting Apify Actors with LangChain. This package gives you programmatic access to Apify's infrastructure - run scraping tasks, handle datasets, and use the API directly through LangChain's tools.
Installation
pip install langchain-apify
Prerequisites
Configure credentials by setting the following environment variable:
- APIFY_API_TOKEN: your Apify API token
Register for a free Apify account and learn how to get your API token in the Apify documentation.
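For example, in a POSIX shell you can export the token before starting Python, instead of setting it via os.environ as in the examples below (the value here is a placeholder):

```shell
# Make the Apify API token available to any process started from this shell
export APIFY_API_TOKEN="YOUR_APIFY_API_TOKEN"
```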
Tools
The ApifyActorsTool class provides access to Apify Actors: cloud-based web scraping and automation programs that you can run without managing any infrastructure. For more detailed information, see the Apify Actors documentation.
ApifyActorsTool is useful when you need to run an Apify Actor as a tool in LangChain. You can invoke the tool manually or use it as part of an agent workflow.
Example usage of ApifyActorsTool with the RAG Web Browser Actor, which searches the web for information:
import os

from langchain_apify import ApifyActorsTool

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

browser = ApifyActorsTool("apify/rag-web-browser")

search_results = browser.invoke(input={
    "run_input": {"query": "what is Apify Actor?", "maxResults": 3}
})

# Use the tool with an agent
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o-mini")
tools = [browser]
agent = create_react_agent(model, tools)

for chunk in agent.stream(
    {"messages": [("human", "search for what is Apify?")]},
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()
Document loaders
The ApifyDatasetLoader class provides access to Apify datasets as document loaders. Datasets store results from web scraping, crawling, or data processing.
ApifyDatasetLoader is useful when you need to process data from an Apify Actor run. If you are extracting webpage content, you would typically use this loader after running an Actor manually from the Apify Console, where you can access the results stored in the dataset.
Example usage of ApifyDatasetLoader with a custom dataset mapping function that loads webpage content and source URLs as a list of Document objects:
import os

from langchain_apify import ApifyDatasetLoader
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

# Example dataset structure:
# [
#     {
#         "text": "Example text from the website.",
#         "url": "http://example.com"
#     },
#     ...
# ]

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda dataset_item: Document(
        page_content=dataset_item["text"],
        metadata={"source": dataset_item["url"]},
    ),
)
documents = loader.load()
Wrappers
The ApifyWrapper class wraps the Apify API to easily convert Apify datasets into documents. It is useful when you need to run an Apify Actor programmatically and process the results in LangChain. Available methods include:
- call_actor: runs an Apify Actor and returns an ApifyDatasetLoader for the results.
- acall_actor: asynchronous version of call_actor.
- call_actor_task: runs a saved Actor task and returns an ApifyDatasetLoader for the results. Actor tasks allow you to create and reuse multiple configurations of a single Actor for different use cases.
- acall_actor_task: asynchronous version of call_actor_task.
For more information, see the Apify LangChain integration documentation.
Example usage of call_actor, running the Website Content Crawler Actor, which extracts content from webpages. The wrapper returns the results as a list of Document objects containing the page content and source URL:
import os

from langchain_apify import ApifyWrapper
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

apify = ApifyWrapper()

loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={
        "startUrls": [{"url": "https://python.langchain.com/docs/get_started/introduction"}],
        "maxCrawlPages": 10,
        "crawlerType": "cheerio",
    },
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "",
        metadata={"source": item["url"]},
    ),
)
documents = loader.load()