Apify SDK for Python

The official Python SDK for building Apify Actors.

apify is the official SDK for building Apify Actors in Python. It handles the Actor lifecycle, storage access, platform events, Apify Proxy, pay-per-event charging, and more.

If you only need to consume the Apify API from Python (running Actors, reading datasets, managing storages) rather than building Actors, use the Apify API client for Python instead. It comes bundled with this SDK.

Installation

The Apify SDK for Python requires Python 3.11 or higher. It is published on PyPI as the apify package and can be installed with pip:

pip install apify

or with uv:

uv add apify

To use the Scrapy integration, install the scrapy extra:

pip install 'apify[scrapy]'
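Before writing any Actor code, you may want to confirm that an environment meets the requirements above. The sketch below uses only the standard library to check the interpreter version and whether the apify distribution is installed. The function name check_environment is hypothetical and not part of the SDK; the 3.11 minimum is taken from the requirement stated above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum Python version stated in the installation instructions.
MIN_PYTHON = (3, 11)


def check_environment(dist_name: str = 'apify') -> dict[str, object]:
    """Report whether the interpreter and a distribution look usable.

    Hypothetical helper for illustration; it is not part of the Apify SDK.
    """
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        installed: str | None = version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        installed = None
    return {
        'python': '.'.join(map(str, sys.version_info[:3])),
        'python_ok': python_ok,
        'package_version': installed,
    }


if __name__ == '__main__':
    report = check_environment()
    print(report)
    if not report['python_ok']:
        print(f'Python {".".join(map(str, MIN_PYTHON))}+ is required.')
    if report['package_version'] is None:
        print('apify is not installed; run: pip install apify')
```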

Quick start

An Actor is a Python program that runs inside the async with Actor: context. The context initializes the Actor when it starts and tears it down when it finishes. Here's a minimal Actor that reads its input and stores a result:

import asyncio

from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input()
        Actor.log.info('Actor input: %s', actor_input)
        await Actor.set_value('OUTPUT', 'Hello, world!')


if __name__ == '__main__':
    asyncio.run(main())
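The async with Actor: statement relies on Python's asynchronous context manager protocol: __aenter__ runs when the block starts, and __aexit__ runs when the block ends, whether it finishes normally or raises. The toy class below is a hypothetical stand-in, not the SDK's Actor class, that simply records each phase so you can see the order in which setup and teardown happen.

```python
import asyncio


class ToyActor:
    """Hypothetical stand-in that mimics an init/exit lifecycle.

    It is NOT apify.Actor; it only records which phase ran.
    """

    def __init__(self) -> None:
        self.events: list[str] = []

    async def __aenter__(self) -> 'ToyActor':
        # Runs when the `async with` block is entered.
        self.events.append('init')
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Runs when the block is left, normally or via an exception.
        if exc is None:
            self.events.append('exit')
        else:
            self.events.append(f'fail: {exc}')
        # Returning False lets any exception propagate to the caller.
        return False


async def run_body(actor: ToyActor, should_fail: bool) -> None:
    async with actor:
        actor.events.append('body')
        if should_fail:
            raise RuntimeError('boom')


if __name__ == '__main__':
    toy = ToyActor()
    asyncio.run(run_body(toy, should_fail=False))
    print(toy.events)  # ['init', 'body', 'exit']
```

Because __aexit__ returns False, an exception raised in the body still reaches the caller after teardown has been recorded.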

The quickest way to scaffold a full Actor project, with the .actor configuration, input schema, and Dockerfile already in place, is the Apify CLI:

  1. Install the CLI:

    npm install -g apify-cli
    
  2. Create a new Actor from the Python "getting started" template:

    apify create my-actor --template python-start
    
  3. Run it locally:

    cd my-actor
    apify run
    

To create, run, and deploy your first Actor step by step, see the Quick start guide.
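When you run an Actor locally, Actor.get_input() reads the INPUT record from the default key-value store. With the default local file-system storage, that record is typically a JSON file at storage/key_value_stores/default/INPUT.json relative to the project directory; treat that exact path as an assumption and verify it against your SDK and Crawlee versions. The hypothetical helper write_local_input below writes such a file so you can try different inputs before running the Actor.

```python
import json
from pathlib import Path

# Assumed default location of the local INPUT record; verify for your setup.
DEFAULT_INPUT_PATH = Path('storage') / 'key_value_stores' / 'default' / 'INPUT.json'


def write_local_input(actor_input: dict, project_dir: Path = Path('.')) -> Path:
    """Write actor_input as the local INPUT record (hypothetical helper)."""
    target = project_dir / DEFAULT_INPUT_PATH
    # Create the store directory if the Actor has never run locally.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(actor_input, indent=2), encoding='utf-8')
    return target
```

For example, calling write_local_input({'start_urls': [{'url': 'https://apify.com'}]}) from the project root creates the file and returns its path.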

What are Actors?

Actors are serverless programs that can do almost anything, from simple scripts and web scrapers to complex automation workflows, AI agents, or even always-on services that expose HTTP endpoints.

They can run either locally or on the Apify platform, where you can scale their execution, monitor runs, schedule tasks, integrate them with other services, or even publish and monetize them. If you're new to Apify, learn more about the platform in the Apify documentation.

For more context, read the Actor whitepaper.

Features

  • Run the full Actor lifecycle, covering init, exit, failures, status messages, and reboots, inside async with Actor: (Actor lifecycle).
  • Read Actor input validated against your input schema with Actor.get_input() (Actor input).
  • Read and write datasets, key-value stores, and request queues, locally or on the platform (Working with storages).
  • React to platform events such as system info, migration, and abort (Actor events).
  • Route requests through Apify Proxy with group selection, country targeting, and rotation (Proxy management).
  • Start, call, abort, and metamorph other Actors and tasks, and attach webhooks to run events (Interacting with other Actors, Webhooks).
  • Monetize your Actor with pay-per-event charging (Pay-per-event).
  • Reach the full Apify API through a preconfigured ApifyClient (Accessing the Apify API).

What you can build

Almost any Python project can become an Actor, including projects for:

  • Web scraping and crawling: The SDK is fully compatible with Crawlee, which makes Apify a natural place to deploy and scale your crawlers (see the Crawlee guide). It also works with other popular scraping libraries, such as Scrapy, Scrapling, or Crawl4AI.
  • Browser automation: Drive a real browser with Playwright or Selenium, or with higher-level tools such as Browser Use.
  • Web servers and APIs: Run a web server inside an Actor to serve HTTP requests, for example to expose your scraper as a live API.
  • AI agents: Host agents built with your framework of choice. Ready-made Actor templates cover PydanticAI, CrewAI, LangGraph, LlamaIndex, and Smolagents.
  • MCP servers: Deploy a Python MCP server as an Actor and make its tools available to any MCP client. See the MCP server and MCP proxy templates.

Whatever you build, the Apify SDK doesn't lock you into a particular framework. Bring the libraries you already use, and let Apify run your project in the cloud.

Usage examples

The examples below show two common setups, but the same async with Actor: pattern works with any stack. For more, see the guides.

HTTPX with BeautifulSoup

Scrape pages with HTTPX and BeautifulSoup, using the Actor's request queue to track URLs:

import asyncio

from bs4 import BeautifulSoup
from httpx import AsyncClient

from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get('start_urls', [{'url': 'https://apify.com'}])

        # Enqueue the start URLs into the default request queue.
        request_queue = await Actor.open_request_queue()
        for start_url in start_urls:
            await request_queue.add_request(start_url['url'])

        # Reuse one HTTP client for all requests.
        async with AsyncClient() as client:
            # Process the queue until it's empty.
            while request := await request_queue.fetch_next_request():
                Actor.log.info(f'Scraping {request.url} ...')
                try:
                    response = await client.get(request.url)
                    soup = BeautifulSoup(response.content, 'html.parser')

                    # Push the extracted data to the default dataset.
                    await Actor.push_data({
                        'url': request.url,
                        'title': soup.title.string if soup.title else None,
                    })
                except Exception:
                    Actor.log.exception(f'Cannot scrape {request.url}')
                finally:
                    # Mark the request as handled so it isn't processed again.
                    await request_queue.mark_request_as_handled(request)


if __name__ == '__main__':
    asyncio.run(main())

Crawlee with Playwright

Scrape pages with Crawlee's PlaywrightCrawler, which handles queueing, concurrency, and the browser for you:

import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        start_urls = [url['url'] for url in actor_input.get('start_urls', [{'url': 'https://apify.com'}])]

        crawler = PlaywrightCrawler(max_requests_per_crawl=50, headless=True)

        @crawler.router.default_handler
        async def handler(context: PlaywrightCrawlingContext) -> None:
            Actor.log.info(f'Scraping {context.request.url} ...')
            await context.push_data({
                'url': context.request.url,
                'title': await context.page.title(),
            })
            # Follow links found on the page.
            await context.enqueue_links()

        await crawler.run(start_urls)


if __name__ == '__main__':
    asyncio.run(main())
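PlaywrightCrawler takes over the queueing that the HTTPX example does by hand. As a rough mental model of what an option like max_requests_per_crawl limits, the toy breadth-first crawl below keeps a queue, enqueues each URL at most once, and stops after a fixed number of requests. It runs against an in-memory link graph instead of the network. The names toy_crawl and get_links are hypothetical; this is a simplified illustration, not Crawlee's implementation, which also handles concurrency, retries, and persistence.

```python
from collections import deque
from collections.abc import Callable, Iterable


def toy_crawl(
    start_urls: Iterable[str],
    get_links: Callable[[str], list[str]],
    max_requests: int = 50,
) -> list[str]:
    """Breadth-first crawl with URL deduplication and a request cap.

    Hypothetical illustration; get_links stands in for fetching a page
    and extracting its links (roughly what enqueue_links() does in Crawlee).
    """
    queue: deque[str] = deque()
    seen: set[str] = set()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited: list[str] = []
    while queue and len(visited) < max_requests:
        url = queue.popleft()
        visited.append(url)  # "Process" the page.
        for link in get_links(url):
            # Enqueue each URL at most once, like a deduplicating request queue.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == '__main__':
    site = {
        'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
        'https://example.com/a': ['https://example.com/', 'https://example.com/c'],
        'https://example.com/b': ['https://example.com/c'],
        'https://example.com/c': [],
    }
    print(toy_crawl(['https://example.com/'], lambda u: site.get(u, []), max_requests=3))
    # ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```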

Documentation

The full SDK documentation lives at docs.apify.com/sdk/python. For the Apify platform itself, see the Apify documentation.

  • Overview: What the SDK is, what Actors are, and how the pieces fit together.
  • Quick start: Create, run, and deploy your first Python Actor.
  • Concepts: Actor lifecycle, input, storages, events, proxy management, interacting with other Actors, webhooks, accessing the Apify API, logging, configuration, and pay-per-event.
  • Guides: Integrations with BeautifulSoup, Parsel, Playwright, Selenium, Crawlee, Scrapy, Scrapling, Crawl4AI, and Browser Use, plus running a web server and using uv.
  • Upgrading: Migrating between major versions.
  • API reference: Generated reference for every class and method.
  • Changelog: Release history and breaking changes.

Related projects

Support and community

Contributing

Bug reports, fixes, and improvements are welcome! See CONTRIBUTING.md for the development setup, coding standards, testing, and release process. The project uses uv for project management and Poe the Poet as a task runner; the typical loop is:

uv run poe install-dev   # install dev dependencies and git hooks
uv run poe check-code    # lint, type-check, and unit tests

License

Released under the Apache License 2.0.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

apify-3.4.2b24.tar.gz (111.3 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

apify-3.4.2b24-py3-none-any.whl (117.2 kB)

Uploaded Python 3

File details

Details for the file apify-3.4.2b24.tar.gz.

File metadata

  • Download URL: apify-3.4.2b24.tar.gz
  • Upload date:
  • Size: 111.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for apify-3.4.2b24.tar.gz
  • SHA256: 2f7ce44dbb286a44d6716a655eb850442f3424c4c51e11660727ff7e067dbde0
  • MD5: b0b539b000dffb53cb6f37c608a51d8a
  • BLAKE2b-256: 54cfa1c7a55353e8bb598a4784b32e2a166b4e8ee17f548ce09a1b2c67287f70


Provenance

The following attestation bundles were made for apify-3.4.2b24.tar.gz:

Publisher: manual_release_beta.yaml on apify/apify-sdk-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file apify-3.4.2b24-py3-none-any.whl.

File metadata

  • Download URL: apify-3.4.2b24-py3-none-any.whl
  • Upload date:
  • Size: 117.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for apify-3.4.2b24-py3-none-any.whl
  • SHA256: 07f3bc7b51757bb610d160c14d7b15202b3d7236749b826b8ff38ba333def738
  • MD5: 59c05766c90fd8d69dd3e455a5a8fde3
  • BLAKE2b-256: b9ba4f55ea0c78550aa1c82ea1ce8bb33e99540e173e4395419ffde4fcf35e06


Provenance

The following attestation bundles were made for apify-3.4.2b24-py3-none-any.whl:

Publisher: manual_release_beta.yaml on apify/apify-sdk-python

