Official Python SDK for the Olyptik API

Olyptik Python SDK

The Olyptik Python SDK provides a simple and intuitive interface for web crawling and content extraction. It supports both synchronous and asynchronous programming patterns with full type hints.

Installation

Install the SDK using pip:

pip install olyptik

Configuration

Initialize the SDK with your API key, which you can find on the settings page. You can pass the key directly or supply it through environment variables.

from olyptik import Olyptik

# Initialize with API key
client = Olyptik(api_key="your_api_key_here")
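The snippet above passes the key directly. For the environment-variable route, one possible approach is sketched below. The variable name OLYPTIK_API_KEY and the helper load_api_key are assumptions made for illustration; the text above does not say which variable name, if any, the SDK reads on its own.

```python
import os


def load_api_key(env_var: str = "OLYPTIK_API_KEY") -> str:
    """Return the API key stored in `env_var` (a hypothetical variable name).

    Raises RuntimeError if the variable is unset or blank, so a missing key
    is caught before any request is made.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Olyptik API key")
    return api_key
```

The returned value can then be passed to the constructor, for example Olyptik(api_key=load_api_key()).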

Synchronous Usage

Start a crawl

Crawl with minimal settings:

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Full example:

# Start a crawl
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "maxDepth": 2,
    "engineType": "auto",
    "includeLinks": True,
    "timeout": 60,
    "useSitemap": False,
    "entireWebsite": False,
    "excludeNonMainTags": True,
    "useStaticIps": False
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Query crawls

from olyptik import CrawlStatus

result = client.query_crawls({
    "startUrls": ["https://example.com"],
    "status": [CrawlStatus.SUCCEEDED],
    "page": 0,
})

print("Crawls: ", result.results)
print("Page: ", result.page)
print("Total pages: ", result.totalPages)
print("Count of items per page: ", result.limit)
print("Total matched crawls: ", result.totalResults)
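Because query results are paginated, collecting every matching crawl means requesting one page after another. The helper below is a minimal sketch of that loop. The name iter_crawls is hypothetical, and it assumes pages are zero-indexed (as "page": 0 suggests) and that totalPages is the total number of pages available.

```python
from typing import Any, Dict, Iterator


def iter_crawls(client: Any, filters: Dict[str, Any]) -> Iterator[Any]:
    """Yield every crawl matching `filters`, following pagination.

    `client` is any object with a `query_crawls(payload)` method whose return
    value has `results` and `totalPages` attributes, as in the example above.
    Pages are assumed to be zero-indexed, matching `"page": 0`.
    """
    page = 0
    while True:
        response = client.query_crawls({**filters, "page": page})
        yield from response.results
        page += 1
        # Stop once the next page index is past the last available page.
        if page >= response.totalPages:
            break
```

Writing it as a generator keeps memory use flat and lets the caller stop early without fetching the remaining pages.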

Get crawl results

Retrieve the results of a crawl by its ID. Results are paginated; pass the page number and the number of results per page (limit).

limit = 50
page = 0
results = client.get_crawl_results(crawl.id, page, limit)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")
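To read all results rather than a single page, the call above can be repeated with an increasing page number. The sketch below shows one way to do that. The helper name iter_crawl_results is hypothetical, and the stopping rule (a page with fewer than limit items is the last one) is an assumption, since the shape of the paginated response beyond results is not described here.

```python
from typing import Any, Iterator


def iter_crawl_results(client: Any, crawl_id: str, limit: int = 50) -> Iterator[Any]:
    """Yield every result of a crawl, one page at a time.

    Assumption: a page holding fewer than `limit` items is the last one.
    """
    page = 0
    while True:
        response = client.get_crawl_results(crawl_id, page, limit)
        items = list(response.results)
        yield from items
        if len(items) < limit:
            break
        page += 1
```

When the total number of results is an exact multiple of limit, this makes one extra request that returns an empty page before stopping.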

Abort a crawl

aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")

Asynchronous Usage

For I/O-bound workloads, use the async client, AsyncOlyptik. The examples below mirror the synchronous ones, with each client call awaited.

Start a crawl

Crawl with minimal settings:

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Full example:

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # Start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50,
            "maxDepth": 2,
            "engineType": "auto",
            "includeLinks": True,
            "timeout": 60,
            "useSitemap": False,
            "entireWebsite": False,
            "excludeNonMainTags": True,
            "useStaticIps": False
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Query crawls

import asyncio
from olyptik import AsyncOlyptik, CrawlStatus

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        result = await client.query_crawls({
            "startUrls": ["https://example.com"],
            "status": [CrawlStatus.SUCCEEDED],
            "page": 0,
        })
        
        print("Crawls: ", result.results)
        print("Page: ", result.page)
        print("Total pages: ", result.totalPages)
        print("Count of items per page: ", result.limit)
        print("Total matched crawls: ", result.totalResults)

asyncio.run(main())

Get crawl results

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl results
        limit = 50
        page = 0
        results = await client.get_crawl_results(crawl.id, page, limit)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())

Abort a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())
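The async client pays off mainly when several requests are in flight at once. The sketch below starts one crawl per URL concurrently with asyncio.gather. The helper name start_crawls is hypothetical, and whether your account allows several simultaneous crawls is an assumption to check against your plan's limits.

```python
import asyncio
from typing import Any, Dict, List


async def start_crawls(client: Any, start_urls: List[str], max_results: int = 50) -> Dict[str, Any]:
    """Start one crawl per URL concurrently and map each URL to its crawl.

    `client` is expected to behave like the AsyncOlyptik client shown above,
    i.e. expose an awaitable `run_crawl(payload)`.
    """
    payloads = [{"startUrl": url, "maxResults": max_results} for url in start_urls]
    # gather() runs the requests concurrently and preserves input order.
    crawls = await asyncio.gather(*(client.run_crawl(p) for p in payloads))
    return dict(zip(start_urls, crawls))
```

Inside an async with AsyncOlyptik(...) as client block, this would be called as started = await start_crawls(client, ["https://example.com", "https://example.org"]).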

Configuration Options

StartCrawlPayload

The following crawl configuration options are available. You must provide at least one of maxResults, useSitemap, or entireWebsite.

Property	Type	Required	Default	Description
startUrl	string	✅	-	The URL to start crawling from
maxResults	number	❌	-	Maximum number of results to collect (1-5,000)
useSitemap	boolean	❌	false	Whether to use sitemap.xml to crawl the website
entireWebsite	boolean	❌	false	Whether to use sitemap.xml and all found links to crawl the website
maxDepth	number	❌	10	Maximum depth of pages to crawl (1-100)
includeLinks	boolean	❌	true	Whether to include links in the crawl results' markdown
excludeNonMainTags	boolean	❌	true	Whether to exclude non-main HTML tags (header, footer, aside, etc.) from the crawl results
timeout	number	❌	60	Timeout duration in minutes
engineType	string	❌	"auto"	The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps	boolean	❌	false	Whether to use static IPs for the crawl
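The constraints in the table can be checked on the client side before a request is sent. The function below is a minimal sketch of such a check; build_crawl_payload is a hypothetical helper, not part of the SDK, and it assumes that a false or missing useSitemap or entireWebsite does not count as "provided".

```python
from typing import Any, Dict


def build_crawl_payload(start_url: str, **options: Any) -> Dict[str, Any]:
    """Build a crawl payload and check it against the documented constraints.

    This is client-side validation only; the API may apply further rules.
    """
    payload = {"startUrl": start_url, **options}

    # At least one of these must be provided (falsy values do not count here).
    if not any(payload.get(key) for key in ("maxResults", "useSitemap", "entireWebsite")):
        raise ValueError("Provide at least one of maxResults, useSitemap, or entireWebsite")

    max_results = payload.get("maxResults")
    if max_results is not None and not 1 <= max_results <= 5000:
        raise ValueError("maxResults must be between 1 and 5,000")

    max_depth = payload.get("maxDepth")
    if max_depth is not None and not 1 <= max_depth <= 100:
        raise ValueError("maxDepth must be between 1 and 100")

    return payload
```

The result is a plain dictionary, so it can be passed straight to run_crawl, for example client.run_crawl(build_crawl_payload("https://example.com", maxResults=50, maxDepth=2)).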

Engine Types

Choose the appropriate engine for your crawling needs:

from olyptik import EngineType

# Available engine types
EngineType.AUTO        # Automatically choose the best engine
EngineType.PLAYWRIGHT  # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO     # Use Cheerio for faster, static content crawling

Crawl Status

Monitor your crawl status using the CrawlStatus enum:

from olyptik import CrawlStatus

# Possible status values
CrawlStatus.RUNNING    # Crawl is currently running
CrawlStatus.SUCCEEDED  # Crawl completed successfully
CrawlStatus.FAILED     # Crawl failed due to an error
CrawlStatus.TIMED_OUT  # Crawl exceeded timeout limit
CrawlStatus.ABORTED    # Crawl was manually aborted
CrawlStatus.ERROR      # Crawl encountered an error
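Monitoring usually means polling until the crawl leaves the RUNNING state. The sketch below shows that loop under stated assumptions: every status other than RUNNING is treated as final, statuses are matched by member name, and fetch_status is a hypothetical callable you supply, because how the current status is retrieved is not covered in this section.

```python
import time
from typing import Any, Callable

# Statuses after which a crawl no longer changes. Treating everything except
# RUNNING as terminal is an assumption based on the descriptions above.
TERMINAL_STATUS_NAMES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED", "ERROR"}


def is_terminal(status: Any) -> bool:
    """Return True if `status` (an enum member or a plain string) is final."""
    name = getattr(status, "name", str(status))
    return name.upper() in TERMINAL_STATUS_NAMES


def wait_until_done(fetch_status: Callable[[], Any], interval: float = 5.0,
                    max_wait: float = 3600.0) -> Any:
    """Poll `fetch_status` until it returns a terminal status, then return it.

    `fetch_status` is a hypothetical callable supplied by the caller.
    """
    deadline = time.monotonic() + max_wait
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Crawl did not reach a terminal status in time")
        time.sleep(interval)
```

Keeping the status lookup behind a callable means the loop does not depend on any particular SDK method and can be exercised with a stub.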

Error Handling

The SDK raises exceptions in various scenarios; for example, ApiError is raised when the API returns an error response. Wrap your calls in try/except blocks:

from olyptik import Olyptik, ApiError

client = Olyptik(api_key="your_api_key_here")

try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    # API returned an error response
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
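Some failures are transient, so a retry with backoff can be layered on top of the try/except pattern above. The helper below is a sketch under assumptions: call_with_retries is a hypothetical name, and the set of retryable status codes (429 and common 5xx codes) is a conventional choice, not something the API is documented to return.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                      retry_statuses: Iterable[int] = (429, 500, 502, 503, 504)) -> T:
    """Run `call`, retrying with exponential backoff on retryable errors.

    An exception is retried only if it carries a `status_code` attribute (as
    ApiError does in the example above) whose value is in `retry_statuses`.
    """
    retryable = set(retry_statuses)
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            status_code = getattr(exc, "status_code", None)
            is_last_attempt = attempt == attempts - 1
            if status_code not in retryable or is_last_attempt:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call would be wrapped as call_with_retries(lambda: client.run_crawl({"startUrl": "https://example.com", "maxResults": 10})). Errors without a retryable status code propagate immediately.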

Data Models

CrawlResult

Each crawl result contains:

from dataclasses import dataclass

@dataclass
class CrawlResult:
    crawlId: str     # Unique identifier for the crawl
    teamId: str      # Team identifier
    url: str         # The crawled URL
    title: str       # Page title
    markdown: str    # Extracted content in markdown format
    depthOfUrl: int  # How deep this URL was in the crawl
    createdAt: str   # When the result was created
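As an illustration of working with these fields, the sketch below writes each result's markdown to its own file. The helper save_markdown and the file-naming scheme are assumptions made for this example; only the url, title, and markdown attributes come from the model above.

```python
import re
from pathlib import Path
from typing import Any, Iterable, List


def save_markdown(results: Iterable[Any], out_dir: str) -> List[Path]:
    """Write each result's markdown to `out_dir`, one file per crawled URL."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, result in enumerate(results):
        # Turn the URL into a filesystem-safe slug; the index keeps names unique.
        slug = re.sub(r"[^A-Za-z0-9]+", "-", result.url).strip("-")[:80]
        path = target / f"{index:04d}-{slug}.md"
        path.write_text(f"# {result.title}\n\n{result.markdown}\n", encoding="utf-8")
        written.append(path)
    return written
```

It accepts any iterable of result objects, so a single page such as results.results can be passed directly.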

Crawl

Crawl metadata includes:

from dataclasses import dataclass
from typing import List, Optional

from olyptik import CrawlStatus, EngineType

@dataclass
class Crawl:
    id: str                     # Unique crawl identifier
    status: CrawlStatus         # Current status
    startUrls: List[str]        # Starting URLs
    includeLinks: bool          # Whether links are included
    maxDepth: int               # Maximum crawl depth
    maxResults: int             # Maximum number of results
    teamId: str                 # Team identifier
    createdAt: str              # Creation timestamp
    completedAt: Optional[str]  # Completion timestamp
    durationInSeconds: int      # Total duration
    numberOfResults: int        # Number of results found
    useSitemap: bool            # Whether the sitemap was used
    entireWebsite: bool         # Whether both the sitemap and all found links were used
    excludeNonMainTags: bool    # Whether non-main HTML tags were excluded
    timeout: int                # Timeout setting
    useStaticIps: bool          # Whether static IPs were used
    engineType: EngineType      # Engine type used
