Official Python SDK for Olyptik API

Olyptik Python SDK

The Olyptik Python SDK provides a simple and intuitive interface for web crawling and content extraction. It supports both synchronous and asynchronous programming patterns with full type hints.

Installation

Install the SDK using pip:

pip install olyptik

Configuration

First, initialize the SDK with your API key, which is available on the settings page. You can either pass the key directly or supply it through environment variables.

from olyptik import Olyptik

# Initialize with API key
client = Olyptik(api_key="your_api_key_here")
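The environment-variable route is not shown above, so here is a minimal sketch of one way to do it. The variable name OLYPTIK_API_KEY and the helper load_api_key are assumptions made for illustration; this README does not state which variable name, if any, the SDK reads on its own.

```python
import os

# Hypothetical variable name; the README does not say which name the SDK reads.
API_KEY_ENV_VAR = "OLYPTIK_API_KEY"


def load_api_key(env=None, var_name=API_KEY_ENV_VAR):
    """Return the API key stored in the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Olyptik API key"
        )
    return key


# Usage (requires the olyptik package):
#   from olyptik import Olyptik
#   client = Olyptik(api_key=load_api_key())
```

Failing early with a clear message avoids sending a request with an empty key and debugging an authentication error later.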

Synchronous Usage

Start a crawl

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

# Start a crawl with every optional setting spelled out
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "maxDepth": 2,
    "engineType": "auto",
    "includeLinks": True,
    "timeout": 60,
    "useSitemap": False,
    "useStaticIps": False
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Get crawl results

results = client.get_crawl_results(crawl.id)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")

Abort a crawl

aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")

Asynchronous Usage

For better performance with I/O operations, use the async client:

Start a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # Start a crawl with every optional setting spelled out
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50,
            "maxDepth": 2,
            "engineType": "auto",
            "includeLinks": True,
            "timeout": 60,
            "useSitemap": False,
            "useStaticIps": False
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Get crawl results

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl results
        results = await client.get_crawl_results(crawl.id)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())

Abort a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())

Configuration Options

StartCrawlPayload

The run_crawl payload accepts the following options:

Property	Type	Required	Default	Description
startUrl	string	✅	-	The URL to start crawling from
maxResults	number	✅	-	Maximum number of results to collect (1-10,000)
maxDepth	number	❌	10	Maximum depth of pages to crawl (1-100)
includeLinks	boolean	❌	true	Whether to include links in the crawl results' markdown
useSitemap	boolean	❌	false	Whether to use sitemap.xml to crawl the website
timeout	number	❌	60	Timeout duration in minutes
engineType	string	❌	"auto"	The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps	boolean	❌	false	Whether to use static IPs for the crawl
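The ranges in the table can be checked locally before a request is sent. The sketch below is not part of the SDK: validate_payload is a hypothetical helper, it covers only the constraints listed in the table, and it handles only the plain-string form of engineType.

```python
VALID_ENGINES = {"auto", "cheerio", "playwright"}


def validate_payload(payload):
    """Return a list of problems found in a run_crawl payload (empty if none)."""
    problems = []
    start_url = payload.get("startUrl")
    if not isinstance(start_url, str) or not start_url:
        problems.append("startUrl is required")
    max_results = payload.get("maxResults")
    if not isinstance(max_results, int) or not 1 <= max_results <= 10_000:
        problems.append("maxResults is required and must be between 1 and 10,000")
    # maxDepth is optional; the table lists 10 as its default.
    max_depth = payload.get("maxDepth", 10)
    if not isinstance(max_depth, int) or not 1 <= max_depth <= 100:
        problems.append("maxDepth must be between 1 and 100")
    if payload.get("engineType", "auto") not in VALID_ENGINES:
        problems.append("engineType must be one of: auto, cheerio, playwright")
    return problems


print(validate_payload({"startUrl": "https://example.com", "maxResults": 50}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a payload at once.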

Engine Types

Choose the appropriate engine for your crawling needs:

from olyptik import EngineType

# Available engine types
EngineType.AUTO        # Automatically choose the best engine
EngineType.PLAYWRIGHT  # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO     # Use Cheerio for faster, static content crawling

Crawl Status

Monitor your crawl status using the CrawlStatus enum:

from olyptik import CrawlStatus

# Possible status values
CrawlStatus.RUNNING    # Crawl is currently running
CrawlStatus.SUCCEEDED  # Crawl completed successfully
CrawlStatus.FAILED     # Crawl failed due to an error
CrawlStatus.TIMED_OUT  # Crawl exceeded timeout limit
CrawlStatus.ABORTED    # Crawl was manually aborted
CrawlStatus.ERROR      # Crawl encountered an error
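A common use of these values is to wait until a crawl leaves the RUNNING state. This README does not show an SDK call for re-fetching a crawl's status, so the sketch below takes that step as a caller-supplied function (fetch_status, an assumption). The Status enum is a stand-in so the example runs without the SDK; its member names mirror the list above, but its values are made up.

```python
import time
from enum import Enum


class Status(Enum):
    # Stand-in for olyptik.CrawlStatus so the sketch runs without the SDK;
    # the member names mirror the list above, the values are made up.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    ABORTED = "aborted"
    ERROR = "error"


def wait_until_finished(fetch_status, interval_seconds=5.0, max_checks=120,
                        sleep=time.sleep):
    """Call fetch_status() until it returns a status other than RUNNING."""
    for _ in range(max_checks):
        status = fetch_status()
        if status.name != "RUNNING":
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"crawl still running after {max_checks} checks")
```

Bounding the number of checks keeps a stuck crawl from blocking the caller indefinitely.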

Error Handling

The SDK exposes two exception types: ApiError, which carries a message and a status code, and OlyptikError for other SDK errors:

from olyptik import Olyptik, OlyptikError, ApiError

client = Olyptik(api_key="your_api_key_here")

try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
except OlyptikError as e:
    print(f"SDK Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Data Models

CrawlResult

Each crawl result contains:

@dataclass
class CrawlResult:
    crawlId: str          # Unique identifier for the crawl
    brandId: str          # Brand identifier
    url: str              # The crawled URL
    title: str            # Page title
    markdown: str         # Extracted content in markdown format
    depthOfUrl: int       # How deep this URL was in the crawl
    createdAt: str        # When the result was created
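Because results are dataclasses, they can be exported with the standard library alone. The sketch below writes results as JSON Lines; ResultStandIn is a stand-in with a subset of the fields above so the example runs without the SDK, and it assumes the real CrawlResult works with dataclasses.asdict, as the @dataclass declaration suggests.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultStandIn:
    # Stand-in with a subset of the CrawlResult fields listed above.
    url: str
    title: str
    markdown: str
    depthOfUrl: int


def results_to_jsonl(results):
    """Serialize dataclass results to JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in results)


sample = [
    ResultStandIn("https://example.com", "Home", "# Hi", 0),
    ResultStandIn("https://example.com/a", "A", "text", 1),
]
print(results_to_jsonl(sample))
```

With the real SDK, the same function would be called on results.results from get_crawl_results.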

Crawl

Crawl metadata includes:

@dataclass
class Crawl:
    id: str                    # Unique crawl identifier
    status: CrawlStatus        # Current status
    startUrls: List[str]       # Starting URLs
    includeLinks: bool         # Whether links are included
    maxDepth: int              # Maximum crawl depth
    maxResults: int            # Maximum number of results
    brandId: str               # Brand identifier
    createdAt: str             # Creation timestamp
    completedAt: Optional[str] # Completion timestamp
    durationInSeconds: int     # Total duration
    numberOfResults: int       # Number of results found
    useSitemap: bool          # Whether sitemap was used
    timeout: int              # Timeout setting

Best Practices

1. Use Async for Better Performance

# ✅ Good: Use async for I/O intensive operations
async with AsyncOlyptik(api_key="your_api_key") as client:
    crawl = await client.run_crawl(payload)
    results = await client.get_crawl_results(crawl.id)

# ❌ Avoid: Blocking operations in async context
client = Olyptik(api_key="your_api_key")  # In async function
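One place the async client pays off is starting several crawls at once. The sketch below uses asyncio.gather for this; start_many is a hypothetical helper, and FakeClient stands in for AsyncOlyptik so the example runs without network access. Only run_crawl, as shown earlier, is assumed to exist on the real client.

```python
import asyncio


async def start_many(client, start_urls, max_results=50):
    """Start one crawl per URL concurrently and return the crawls in input order."""
    tasks = [
        client.run_crawl({"startUrl": url, "maxResults": max_results})
        for url in start_urls
    ]
    return await asyncio.gather(*tasks)


class FakeClient:
    # Stand-in for AsyncOlyptik so the sketch runs without network access.
    async def run_crawl(self, payload):
        await asyncio.sleep(0)
        return {"id": f"crawl-for-{payload['startUrl']}"}


crawls = asyncio.run(
    start_many(FakeClient(), ["https://a.example", "https://b.example"])
)
print([c["id"] for c in crawls])
```

With the real client, start_many would be awaited inside the async with AsyncOlyptik(...) block.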

2. Choose the Right Engine

from olyptik import EngineType

# ✅ Good: Choose engine based on site type
# For JavaScript-heavy sites
crawl = client.run_crawl({
    "startUrl": "https://spa-app.com",
    "maxResults": 50,  # required field
    "engineType": EngineType.PLAYWRIGHT
})

# For static content sites
crawl = client.run_crawl({
    "startUrl": "https://blog.example.com",
    "maxResults": 50,  # required field
    "engineType": EngineType.CHEERIO
})

Troubleshooting

Common Issues

Import Error: Make sure you have installed the package correctly:

pip install --upgrade olyptik

Authentication Error: Verify your API key is correct and has sufficient permissions.

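Timeout Issues: Increase the timeout value for large crawls (the options table lists the unit as minutes and the default as 60):

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,  # required field
    "timeout": 300  # raised from the default of 60
})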

Rate Limiting: The SDK automatically handles retries, but you can implement additional backoff:

import time
from olyptik import ApiError

try:
    crawl = client.run_crawl(payload)
except ApiError as e:
    if e.status_code == 429:
        time.sleep(60)  # Wait 1 minute
        crawl = client.run_crawl(payload)
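The fixed one-minute wait above retries only once. A sketch of exponential backoff with a bounded number of attempts follows; call_with_backoff is a hypothetical helper, not part of the SDK, and it treats any exception exposing status_code == 429 as a rate-limit error, which matches how ApiError is used above.

```python
import time


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a 429 error wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            is_rate_limit = getattr(exc, "status_code", None) == 429
            # Re-raise anything that is not a rate limit, and give up
            # after the final attempt.
            if not is_rate_limit or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage (requires the olyptik package and a configured client):
#   crawl = call_with_backoff(lambda: client.run_crawl(payload))
```

With the defaults, the waits between attempts are 1, 2, and 4 seconds; raise base_delay if the API needs longer pauses.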

Support

📧 Email: support@olyptik.io
📚 API Reference: API Documentation

Release history

0.1.6: Nov 10, 2025
0.1.5: Oct 13, 2025
0.1.4: Sep 29, 2025
0.1.3: Sep 29, 2025
0.1.2: Sep 6, 2025
0.1.1: Aug 19, 2025 (this version)
0.1.0: Aug 12, 2025
