An integration package connecting Apify and LangChain

Project description

LangChain Apify: Full-stack web scraping and data extraction platform enhanced with AI capabilities. Maintained by Apify.

This package allows you to use Apify, a platform for web scraping and data extraction, with LangChain. It provides tools for interacting with Apify Actors, datasets, and the Apify API.

Installation

pip install langchain-apify

Prerequisites

You should configure credentials by setting the following environment variables:

  • APIFY_API_TOKEN - Apify API token

Register for a free Apify account and learn how to get your API token in the Apify documentation.
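
For example, you can set the token directly in Python before using the package (the value below is a placeholder, as in the examples that follow):

import os

# Placeholder token; replace it with the API token from your Apify account
os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"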

Tools

ApifyActorsTool class provides access to Apify Actors, which are cloud-based web scraping and automation programs that you can run without managing any infrastructure. For more detailed information, see the Apify Actors documentation.

ApifyActorsTool is useful when you need to run an Apify Actor as a tool in LangChain. You can use the tool to interact with the Actor manually or as part of an agent workflow.

Example usage of ApifyActorsTool with the RAG Web Browser Actor, which searches for information on the web:

import os

from langchain_apify import ApifyActorsTool

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

# Wrap the RAG Web Browser Actor as a LangChain tool
browser = ApifyActorsTool("apify/rag-web-browser")

# Invoke the tool directly with the Actor's run input
search_results = browser.invoke(input={
    "run_input": {"query": "what is Apify Actor?", "maxResults": 3}
})
print(search_results)

# use the tool with an agent
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o-mini")
tools = [browser]
agent = create_react_agent(model, tools)

for chunk in agent.stream(
    {"messages": [("human", "search for what is Apify?")]},
    stream_mode="values"
):
    chunk["messages"][-1].pretty_print()

Document loaders

ApifyDatasetLoader class provides access to Apify datasets as document loaders. Datasets are storage solutions that store results from web scraping, crawling, or data processing.

ApifyDatasetLoader is useful when you need to process data from an Apify Actor run. If you are extracting webpage content, you would typically use this loader after running an Apify Actor manually from the Apify console, where you can access the results stored in the dataset.

Example usage of ApifyDatasetLoader with a custom dataset mapping function that maps each dataset item to a Document object containing the page content and source URL:

import os

from langchain_apify import ApifyDatasetLoader
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

# Example dataset structure
# [
#     {
#         "text": "Example text from the website.",
#         "url": "http://example.com"
#     },
#     ...
# ]

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda dataset_item: Document(
        page_content=dataset_item["text"],
        metadata={"source": dataset_item["url"]}
    ),
)
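
As with other LangChain document loaders, calling load() returns the mapped Document objects; assuming the dataset (identified here by a placeholder ID) contains at least one item, you can inspect the first one:

documents = loader.load()
print(documents[0].page_content)
print(documents[0].metadata["source"])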

Wrappers

ApifyWrapper class wraps the Apify API to easily convert Apify datasets into documents. It is useful when you need to run an Apify Actor programmatically and process the results in LangChain. Available methods include:

  • call_actor: Runs an Apify Actor and returns an ApifyDatasetLoader for the results.
  • acall_actor: Asynchronous version of call_actor.
  • call_actor_task: Runs a saved Actor task and returns an ApifyDatasetLoader for the results. Actor tasks allow you to create and reuse multiple configurations of a single Actor for different use cases.
  • acall_actor_task: Asynchronous version of call_actor_task.

For more information, see the Apify LangChain integration documentation.

Example usage of call_actor: run the Website Content Crawler Actor, which extracts content from webpages, and return the results as a list of Document objects containing the page content and source URL:

import os
from langchain_apify import ApifyWrapper
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

apify = ApifyWrapper()

loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={
        "startUrls": [{"url": "https://python.langchain.com/docs/get_started/introduction"}],
        "maxCrawlPages": 10,
        "crawlerType": "cheerio"
    },
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "",
        metadata={"source": item["url"]}
    ),
)
documents = loader.load()
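
The asynchronous variants follow the same pattern. Below is a minimal sketch using acall_actor with the same Actor and run input as above; it assumes acall_actor accepts the same arguments as call_actor, as described in the method list:

import asyncio
import os

from langchain_apify import ApifyWrapper
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

async def main() -> None:
    apify = ApifyWrapper()
    # Run the Actor asynchronously and await the resulting dataset loader
    loader = await apify.acall_actor(
        actor_id="apify/website-content-crawler",
        run_input={
            "startUrls": [{"url": "https://python.langchain.com/docs/get_started/introduction"}],
            "maxCrawlPages": 10,
            "crawlerType": "cheerio"
        },
        dataset_mapping_function=lambda item: Document(
            page_content=item["text"] or "",
            metadata={"source": item["url"]}
        ),
    )
    documents = loader.load()
    print(f"Loaded {len(documents)} documents")

asyncio.run(main())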
