Official Python SDK for Olostep Web Scraping API

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

Olostep SDK

A lightweight Python SDK for interacting with the Olostep scraping, crawling, and batching API.

🚀 Installation

Install from PyPI:

pip install olostep-sdk

🧰 Features

  • Scrape single URLs with different parsers
  • Batch process multiple items
  • Crawl starting from a URL
  • Retrieve and parse content in multiple formats (JSON, Markdown, etc.)

🔑 Getting Started

First, initialize the SDK with your API token:

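# Import the client, the scrape service, and the parser/format enums
from olostep_sdk import OlostepClient
from olostep_sdk.services.scrape import ScrapeService
from olostep_sdk.enums import OlostepParser, Format

# Create the client once, then pass it to each service you use
client = OlostepClient(api_token="your-api-token")
scraper = ScrapeService(client)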
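
Hard-coding the token is fine for a quick trial, but it is usually safer to read it from the environment. The following is a minimal sketch of that idea; the helper name `load_api_token` and the variable name `OLOSTEP_API_TOKEN` are assumptions made for illustration and are not part of the SDK.

```python
import os


def load_api_token(env_var: str = "OLOSTEP_API_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing.

    Both the helper and the default variable name are hypothetical;
    the SDK itself only needs the token string.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Olostep API token"
        )
    return token


# Usage with the SDK (requires olostep-sdk to be installed):
# from olostep_sdk import OlostepClient
# client = OlostepClient(api_token=load_api_token())
```

Failing early with a clear message avoids sending requests with an empty token.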

🔍 Scrape a URL

# Scrape a single URL with the chosen parser and print the result
result = scraper.scrape(
    url="https://example.com",
    parser=OlostepParser.GOOGLE_SEARCH
)
print(result)

📦 Start a Batch

from olostep_sdk.services.batch import BatchService

batch = BatchService(client)

# Start a batch with one item per URL
batch_id = batch.start_batch([
    {"url": "https://example1.com"},
    {"url": "https://example2.com"}
])

# Wait for the batch to finish, then fetch its items
batch.wait_until_complete(batch_id)
items = batch.get_items(batch_id)
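
When the URLs come from a file or another script, the item list can be built programmatically. The sketch below assumes only the `{"url": ...}` item shape shown above; `build_batch_items` is a hypothetical helper, not an SDK function.

```python
def build_batch_items(urls):
    """Turn URLs into [{"url": ...}] items, skipping blanks and duplicates.

    Hypothetical helper: it only produces the item shape used in the
    batch example and does not call the SDK.
    """
    seen = set()
    items = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            items.append({"url": url})
    return items


# Usage with the SDK (requires olostep-sdk and an initialized BatchService):
# batch_id = batch.start_batch(build_batch_items(open("urls.txt")))
```

Order is preserved, so the resulting items line up with the input list after duplicates are dropped.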

🌐 Crawl a Website

from olostep_sdk.services.crawl import CrawlService

crawler = CrawlService(client)

# Start the crawl from a seed URL
crawl_id = crawler.start_crawl("https://example.com")

# Wait for the crawl to finish, then fetch the crawled items
crawler.wait_until_complete(crawl_id)
results = crawler.get_items(crawl_id)
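
The README does not say what shape `get_items` returns, so the following is only a sketch of one way to persist results for later inspection. `save_items` is a hypothetical helper; it assumes the items are JSON-like (lists and dicts) and falls back to `str()` for anything `json` cannot serialize.

```python
import json
from pathlib import Path


def save_items(items, path):
    """Write items to path as pretty-printed JSON and return the Path.

    Hypothetical helper: values that json cannot serialize are
    converted with str() instead of raising.
    """
    out = Path(path)
    text = json.dumps(items, indent=2, default=str, ensure_ascii=False)
    out.write_text(text, encoding="utf-8")
    return out


# Usage with the SDK (requires olostep-sdk and a finished crawl):
# save_items(results, "crawl_results.json")
```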

📄 Formats and Parsers

from olostep_sdk.enums import Format, OlostepParser

# Output formats
Format.MARKDOWN
Format.JSON

# Parsers
OlostepParser.GOOGLE_SEARCH
OlostepParser.BASIC

🧪 Running Tests

python -m unittest discover -s tests
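
With `-s tests`, `unittest` looks in the `tests` directory for files matching its default pattern `test*.py`. As an illustration of what such a file can look like, here is a minimal sketch; the file name `tests/test_items.py` and the `build_items` helper are hypothetical and do not describe the project's actual test suite.

```python
# tests/test_items.py (hypothetical file name)
import unittest


def build_items(urls):
    """Hypothetical helper under test: wrap each URL in a {"url": ...} dict."""
    return [{"url": u} for u in urls]


class BuildItemsTest(unittest.TestCase):
    def test_wraps_each_url(self):
        self.assertEqual(
            build_items(["https://example.com"]),
            [{"url": "https://example.com"}],
        )

    def test_empty_input(self):
        self.assertEqual(build_items([]), [])
```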

📬 License

This project is licensed under the MIT License.


Download files

Source Distribution

olostep_sdk-0.1.2.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

olostep_sdk-0.1.2-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file olostep_sdk-0.1.2.tar.gz.

File metadata

  • Download URL: olostep_sdk-0.1.2.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for olostep_sdk-0.1.2.tar.gz

Algorithm    Hash digest
SHA256       9007280bced8c792114609f072907f311f38afd4db7912cc049b8e5805472efe
MD5          c294daf70d4f3cb572fd6fb23731d4d7
BLAKE2b-256  87a45d63051848baeb95bd3be0dc990325953da7617a7255c14b7c0d0aab7734

File details

Details for the file olostep_sdk-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: olostep_sdk-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for olostep_sdk-0.1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       d8ab6178a9babd071e0321a03492c05de1ca2627f3e0c71b03718ebfdfe7d1c8
MD5          e001f78963d8517843cddddd89b5e793
BLAKE2b-256  7d80e4d1349b7706554e293b107f6db0d1a07434973bdfff26b9dd93e2148724
