Official Python SDK for Olyptik API

Olyptik Python SDK

The Olyptik Python SDK provides a simple and intuitive interface for web crawling and content extraction. It supports both synchronous and asynchronous programming patterns with full type hints.

Installation

Install the SDK using pip:

pip install olyptik

Configuration

First, initialize the SDK with your API key, which you can get from the settings page. You can either pass the key directly or use environment variables.

from olyptik import Olyptik

# Initialize with API key
client = Olyptik(api_key="your_api_key_here")
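A minimal sketch of the environment-variable route. The variable name `OLYPTIK_API_KEY` and the helper `load_api_key` are assumptions made for illustration; this README does not say which variable, if any, the SDK reads on its own.

```python
import os


def load_api_key(env_var: str = "OLYPTIK_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The default variable name is hypothetical, not one documented by the SDK.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key
```

The returned value can then be passed to the client, for example `Olyptik(api_key=load_api_key())`, which keeps the key out of source control.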

Synchronous Usage

Start a crawl

Minimal settings crawl:

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Full example:

# Start a crawl
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "maxDepth": 2,
    "engineType": "auto",
    "includeLinks": True,
    "timeout": 60,
    "useSitemap": False,
    "entireWebsite": False,
    "excludeNonMainTags": True,
    "deduplicateContent": True,
    "extraction": "",
    "useStaticIps": False
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Query crawls

from olyptik import CrawlStatus

result = client.query_crawls({
    "startUrls": ["https://example.com"],
    "status": [CrawlStatus.SUCCEEDED],
    "page": 0,
})

print("Crawls: ", result.results)
print("Page: ", result.page)
print("Total pages: ", result.totalPages)
print("Count of items per page: ", result.limit)
print("Total matched crawls: ", result.totalResults)

Get crawl results

Retrieve the results of your crawl using the crawl ID. The results are paginated, and you can specify the page number and limit per page.

limit = 50
page = 0
results = client.get_crawl_results(crawl.id, page, limit)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")

Abort a crawl

aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")

Get crawl logs

Retrieve logs for a specific crawl to monitor its progress and debug issues:

page = 1
limit = 1200
logs = client.get_crawl_logs(crawl.id, page, limit)
for log in logs.results:
    print(f"[{log.level}] {log.message}: {log.description}")
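When debugging, a per-level count is often more useful than the raw log stream. The sketch below relies only on the `level` attribute shown above; `count_logs_by_level` is a hypothetical helper, and it accepts either an enum member or a plain string because this README does not show how `level` is represented at runtime.

```python
from collections import Counter
from typing import Any, Iterable


def count_logs_by_level(logs: Iterable[Any]) -> Counter:
    """Count log entries per level, keyed by the lowercase level name."""
    counts: Counter = Counter()
    for log in logs:
        level = log.level
        # Use the enum member's name when there is one, else the string itself.
        name = getattr(level, "name", None) or str(level)
        counts[name.lower()] += 1
    return counts
```

For example, `count_logs_by_level(logs.results)["error"]` gives the number of error entries on the fetched page.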

Scrape multiple URLs

Scrape up to 30 URLs at once without following links:

scrape_response = client.scrape({
    "urls": ["https://example.com", "https://example.com/about"],
    "includeLinks": True,
    "excludeNonMainTags": True,
    "deduplicateContent": True,
    "extraction": "",
    "timeout": 5,
    "engineType": "auto",
    "useStaticIps": False
})

for result in scrape_response.results:
    if result.isSuccess:
        print(f"URL: {result.url}")
        print(f"Title: {result.title}")
        print(f"Links found: {len(result.links)}")
    else:
        print(f"Failed to scrape {result.url}: {result.errorMessage}")
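Since a single scrape call accepts at most 30 URLs, a longer list has to be split into batches first. The following is a small sketch of that step; `batch_urls` and `MAX_URLS_PER_SCRAPE` are illustrative names, not SDK members.

```python
from typing import Iterator, List, Sequence

MAX_URLS_PER_SCRAPE = 30  # limit stated in this README


def batch_urls(urls: Sequence[str], size: int = MAX_URLS_PER_SCRAPE) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded batch can be passed as the `"urls"` value of a separate `client.scrape` call.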

Asynchronous Usage

For better performance with I/O-bound workloads, use the async client, AsyncOlyptik:

Start a crawl

Minimal settings crawl:

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Full example:

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # Start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50,
            "maxDepth": 2,
            "engineType": "auto",
            "includeLinks": True,
            "timeout": 60,
            "useSitemap": False,
            "entireWebsite": False,
            "deduplicateContent": True,
            "excludeNonMainTags": True,
            "extraction": "",
            "useStaticIps": False
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())
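The async client pays off when several requests are in flight at once. The sketch below starts multiple crawls concurrently while capping the number of simultaneous requests. `start_crawls` and `max_concurrency` are hypothetical names; the only SDK call assumed is the awaitable `run_crawl` shown above, and any rate limits the API enforces are not covered here.

```python
import asyncio
from typing import Any, Dict, List


async def start_crawls(
    client: Any, payloads: List[Dict[str, Any]], max_concurrency: int = 3
) -> List[Any]:
    """Start one crawl per payload, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def start_one(payload: Dict[str, Any]) -> Any:
        async with semaphore:
            return await client.run_crawl(payload)

    # gather returns results in the same order as `payloads`.
    return await asyncio.gather(*(start_one(p) for p in payloads))
```

Inside an `async with AsyncOlyptik(...) as client:` block, this would be called as `crawls = await start_crawls(client, payloads)`.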

Query crawls

import asyncio
from olyptik import AsyncOlyptik, CrawlStatus

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        result = await client.query_crawls({
            "startUrls": ["https://example.com"],
            "status": [CrawlStatus.SUCCEEDED],
            "page": 0,
        })
        
        print("Crawls: ", result.results)
        print("Page: ", result.page)
        print("Total pages: ", result.totalPages)
        print("Count of items per page: ", result.limit)
        print("Total matched crawls: ", result.totalResults)

asyncio.run(main())

Get crawl results

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl results
        limit = 50
        page = 0
        results = await client.get_crawl_results(crawl.id, page, limit)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())

Abort a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())

Get crawl logs

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl logs
        page = 1
        limit = 1200
        logs = await client.get_crawl_logs(crawl.id, page, limit)
        for log in logs.results:
            print(f"[{log.level}] {log.message}: {log.description}")

asyncio.run(main())

Scrape multiple URLs

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        scrape_response = await client.scrape({
            "urls": ["https://example.com", "https://example.com/about"],
            "includeLinks": True,
            "excludeNonMainTags": True,
            "deduplicateContent": True,
            "extraction": "",
            "timeout": 5,
            "engineType": "auto",
            "useStaticIps": False
        })
        
        for result in scrape_response.results:
            if result.isSuccess:
                print(f"URL: {result.url}")
                print(f"Title: {result.title}")
                print(f"Links found: {len(result.links)}")
            else:
                print(f"Failed to scrape {result.url}: {result.errorMessage}")

asyncio.run(main())

Configuration Options

StartCrawlPayload

The following crawl configuration options are available.

You must provide at least one of the following: maxResults, useSitemap, or entireWebsite.

Property	Type	Required	Default	Description
startUrl	string	✅	-	The URL to start crawling from
maxResults	number	❌	-	Maximum number of results to collect (1-5,000)
useSitemap	boolean	❌	false	Whether to use sitemap.xml to crawl the website
entireWebsite	boolean	❌	false	Whether to use sitemap.xml and all found links to crawl the website
maxDepth	number	❌	10	Maximum depth of pages to crawl (1-100)
includeLinks	boolean	❌	true	Whether to include links in the crawl results' markdown
excludeNonMainTags	boolean	❌	true	Whether to exclude non-main HTML tags (header, footer, aside, etc.) from the crawl results
deduplicateContent	boolean	❌	true	Remove duplicate content from markdown that appears on multiple pages
extraction	string	❌	""	Instructions defining how the AI should extract specific content from the crawl results
timeout	number	❌	60	Timeout duration in minutes
engineType	string	❌	"auto"	The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps	boolean	❌	false	Whether to use static IPs for the crawl
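The constraints in the table can be checked locally before a request is sent. This is a sketch of such a check, covering only the rules stated above (required `startUrl`, the "at least one of" rule, and the documented ranges); `check_crawl_payload` is a hypothetical helper, and the API's own validation remains authoritative.

```python
from typing import Any, Dict, List


def check_crawl_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a crawl payload (empty if none)."""
    problems: List[str] = []
    if not payload.get("startUrl"):
        problems.append("startUrl is required")

    max_results = payload.get("maxResults")
    has_scope = (
        max_results is not None
        or bool(payload.get("useSitemap"))
        or bool(payload.get("entireWebsite"))
    )
    if not has_scope:
        problems.append("provide at least one of maxResults, useSitemap, entireWebsite")

    if max_results is not None and not 1 <= max_results <= 5000:
        problems.append("maxResults must be between 1 and 5,000")

    max_depth = payload.get("maxDepth")
    if max_depth is not None and not 1 <= max_depth <= 100:
        problems.append("maxDepth must be between 1 and 100")
    return problems
```

An empty list means the payload passed these checks; otherwise the messages can be shown to the caller before `run_crawl` is attempted.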

StartScrapePayload

The following scrape configuration options are available:

Property	Type	Required	Default	Description
urls	string[]	✅	-	Array of URLs to scrape (max 30 URLs)
includeLinks	boolean	❌	true	Whether to include links in the scrape results' markdown
excludeNonMainTags	boolean	❌	true	Whether to exclude non-main HTML tags (header, footer, aside, etc.) from the scrape results
deduplicateContent	boolean	❌	true	Remove duplicate content from markdown that appears in multiple scraped pages
extraction	string	❌	""	Instructions defining how the AI should extract specific content from the scrape results
timeout	number	❌	5	Timeout duration in minutes
engineType	string	❌	"auto"	The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps	boolean	❌	false	Whether to use static IPs for the scrape

Engine Types

Choose the appropriate engine for your crawling needs:

from olyptik import EngineType

# Available engine types
EngineType.AUTO        # Automatically choose the best engine
EngineType.PLAYWRIGHT  # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO     # Use Cheerio for faster, static content crawling

Crawl Status

Monitor your crawl status using the CrawlStatus enum:

from olyptik import CrawlStatus

# Possible status values
CrawlStatus.RUNNING    # Crawl is currently running
CrawlStatus.SUCCEEDED  # Crawl completed successfully
CrawlStatus.FAILED     # Crawl failed due to an error
CrawlStatus.TIMED_OUT  # Crawl exceeded timeout limit
CrawlStatus.ABORTED    # Crawl was manually aborted
CrawlStatus.ERROR      # Crawl encountered an error
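When monitoring a crawl, the usual question is whether it is still running. The sketch below treats every status other than RUNNING as final, which is a reading of the list above rather than documented SDK behaviour. `is_finished` is a hypothetical helper that compares status names, so it works with an enum member or a plain string.

```python
from typing import Any

# Names taken from the status list above; RUNNING is the only non-final one.
FINAL_STATUS_NAMES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED", "ERROR"}


def is_finished(status: Any) -> bool:
    """Return True when `status` names a state the crawl will not leave."""
    # Prefer the enum member's name; fall back to the string form.
    name = getattr(status, "name", None) or str(status)
    return name.upper() in FINAL_STATUS_NAMES
```

For instance, `is_finished(crawl.status)` can decide whether it is worth fetching results yet.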

Crawl Log Level

Monitor log levels using the CrawlLogLevel enum:

from olyptik import CrawlLogLevel

# Possible log levels
CrawlLogLevel.INFO     # Informational messages
CrawlLogLevel.DEBUG    # Debug messages
CrawlLogLevel.WARN     # Warning messages
CrawlLogLevel.ERROR    # Error messages

Error Handling

The SDK raises exceptions in various scenarios, such as when the API returns an error response. Wrap your calls in try/except blocks:

from olyptik import Olyptik, ApiError

client = Olyptik(api_key="your_api_key_here")

try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    # API returned an error response
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
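Transient failures are often handled by retrying with exponential backoff. The helper below is a generic sketch, not an SDK feature: `call_with_retries` and its parameters are hypothetical, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With the client above, this could be used as `call_with_retries(lambda: client.run_crawl(payload), retry_on=(ApiError,))`. Retrying every `ApiError` is a simplification; errors caused by an invalid request will fail the same way each time, so checking `status_code` before retrying may be preferable.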

Data Models

CrawlResult

Each crawl result contains:

@dataclass
class CrawlResult:
    crawlId: str          # Unique identifier for the crawl
    teamId: str           # Team identifier
    url: str              # The crawled URL
    title: str            # Page title
    markdown: str         # Extracted content in markdown format
    depthOfUrl: int       # How deep this URL was in the crawl
    createdAt: str        # When the result was created

Crawl

Crawl metadata includes:

@dataclass
class Crawl:
    id: str                    # Unique crawl identifier
    status: CrawlStatus        # Current status
    startUrls: List[str]       # Starting URLs
    includeLinks: bool         # Whether links are included
    maxDepth: int              # Maximum crawl depth
    maxResults: int            # Maximum number of results
    teamId: str                # Team identifier
    createdAt: str             # Creation timestamp
    completedAt: Optional[str] # Completion timestamp
    durationInSeconds: int     # Total duration
    totalPages: int            # Number of results found
    useSitemap: bool           # Whether sitemap was used
    entireWebsite: Optional[bool] # Whether to use both sitemap and all found links
    deduplicateContent: bool   # Whether duplicate content appearing on multiple pages is removed from the markdown
    extraction: Optional[str]  # Extraction instructions for the AI, if any
    excludeNonMainTags: bool   # Whether non-main HTML tags were excluded
    timeout: int               # Timeout setting
    useStaticIps: bool         # Whether static IPs were used
    engineType: EngineType     # Engine type used

CrawlLog

Each crawl log entry contains:

@dataclass
class CrawlLog:
    id: str                      # Unique log identifier
    message: str                 # Log message
    level: CrawlLogLevel         # Log level (info, debug, warn, error)
    description: str             # Detailed description
    crawlId: str                 # Crawl identifier
    teamId: Optional[str]        # Team identifier
    data: Optional[Dict[str, Any]] # Additional log data
    createdAt: Optional[str]     # Creation timestamp

ScrapeResponse

The response from a scrape operation:

@dataclass
class ScrapeResponse:
    id: str                    # Unique scrape identifier
    teamId: str                # Team identifier
    projectId: str             # Project identifier
    results: List[UrlResult]   # Array of scrape results
    timeout: int               # Timeout in minutes
    origin: str                # Origin of the scrape ("api" or "web")
    createdAt: str             # Creation timestamp
    updatedAt: str             # Last update timestamp

UrlResult

Each URL scrape result contains:

@dataclass
class UrlResult:
    url: str                            # The URL that was scraped
    isSuccess: bool                     # Whether the scrape was successful
    title: str                          # Page title
    markdown: str                       # Extracted content in markdown format
    links: List[str]                    # Links found on the page
    duplicatesRemovedCount: Optional[int]  # Number of duplicate content blocks removed
    errorCode: Optional[int]            # Error code if the scrape failed
    errorMessage: Optional[str]         # Error message if the scrape failed
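Given these fields, a scrape response can be split into usable pages and failures in one pass. The sketch below uses only the `UrlResult` attributes listed above; `split_scrape_results` is a hypothetical helper, and the "unknown error" fallback is an assumption for results that failed without a message.

```python
from typing import Any, Dict, Iterable, Tuple


def split_scrape_results(results: Iterable[Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (markdown by URL for successes, error message by URL for failures)."""
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for result in results:
        if result.isSuccess:
            pages[result.url] = result.markdown
        else:
            # errorMessage is Optional, so fall back to a placeholder.
            errors[result.url] = result.errorMessage or "unknown error"
    return pages, errors
```

For example, `pages, errors = split_scrape_results(scrape_response.results)` separates the content to process from the URLs worth retrying.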

Release history

0.1.6 (this version): Nov 10, 2025
0.1.5: Oct 13, 2025
0.1.4: Sep 29, 2025
0.1.3: Sep 29, 2025
0.1.2: Sep 6, 2025
0.1.1: Aug 19, 2025
0.1.0: Aug 12, 2025

Download files

Source Distribution

olyptik-0.1.6.tar.gz (12.6 kB)

Uploaded Nov 10, 2025 Source

Built Distribution

olyptik-0.1.6-py3-none-any.whl (9.8 kB)

Uploaded Nov 10, 2025 Python 3

File details

Details for the file olyptik-0.1.6.tar.gz.

File metadata

Download URL: olyptik-0.1.6.tar.gz
Upload date: Nov 10, 2025
Size: 12.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for olyptik-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`f8d00182d4d18b4fed24ba3723b76cc82776a0da211c56d3a3a1a2b5bbe1a80b`
MD5	`289021f1a701ef4c7857d113c67930f4`
BLAKE2b-256	`7cc0d5f3a394e000ddd4afa38fcd3f915a1586352b1526d2b05f7abdeda643b9`

File details

Details for the file olyptik-0.1.6-py3-none-any.whl.

File metadata

Download URL: olyptik-0.1.6-py3-none-any.whl
Upload date: Nov 10, 2025
Size: 9.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for olyptik-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`817f59692aeabd08a200a6908c11a2b1b1b3070bff9bf93a6a41fdec848a9b40`
MD5	`04cc3f3ed24fdf1bb863ee0951705e1f`
BLAKE2b-256	`cf2582b8a4375464d2960f9c6bb27249d518f0fcd36d69a45be6ba73d4dadc90`
