SDK for interacting with WebsiteCrawler.org

Project description

website_crawler_sdk

A Python SDK for interacting with WebsiteCrawler.org, designed to simplify crawling tasks via API. Submit URLs, monitor crawling status, and retrieve structured data with ease.

To use the API, get an API key from WebsiteCrawler.org.
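One way to keep the key out of source code is to read it from an environment variable. The sketch below is only an illustration: the variable name `WEBSITECRAWLER_API_KEY` and the helper functions are assumptions, not part of the SDK. `WebsiteCrawlerConfig` and `WebsiteCrawlerClient` are the classes used in the demo.

```python
import os

# Hypothetical variable name chosen for this example; the SDK does not define one.
API_KEY_ENV_VAR = "WEBSITECRAWLER_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key from the given mapping, or raise if it is missing."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} to your WebsiteCrawler.org API key")
    return key


def build_client(env=os.environ):
    """Create a client from the environment (requires website_crawler_sdk)."""
    # Imported lazily so load_api_key() can be used without the SDK installed.
    from website_crawler_sdk import WebsiteCrawlerConfig, WebsiteCrawlerClient

    return WebsiteCrawlerClient(WebsiteCrawlerConfig(load_api_key(env)))
```

With the variable exported in the shell, `build_client()` can stand in for the two construction lines in the demo.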

🔧 Features

Trigger crawl jobs remotely
Monitor crawling status in real time
Access current URLs being crawled
Fetch crawl output as raw JSON
Respect API wait times dynamically

📦 Installation

Install the package from PyPI:

pip install website_crawler_sdk
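To confirm which release ended up in the active environment, the standard library's `importlib.metadata` can be queried. This is a small sketch; the helper name `installed_version` is made up for the example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="website_crawler_sdk"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version() or "website_crawler_sdk is not installed")
```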

Demo

import time
from website_crawler_sdk import WebsiteCrawlerConfig, WebsiteCrawlerClient

# Replace with your actual API key, target URL, and limit
YOUR_API_KEY = "YOUR_API_KEY"  # Your API key goes here
URL = "URL"  # Enter a non-redirecting URL/domain, including https:// or http://
LIMIT = 10  # Crawl limit (integer); 10 is a placeholder, change as needed

def main():
    cfg = WebsiteCrawlerConfig(YOUR_API_KEY)
    client = WebsiteCrawlerClient(cfg)

    # Submit the URL and crawl limit to WebsiteCrawler.org
    client.submit_url_to_website_crawler(URL, LIMIT)

    while True:
        # Data can be retrieved once task_status is truthy
        task_status = client.get_task_status()
        print(f"{task_status} << task status")
        time.sleep(2)  # Wait 2 seconds between polls

        if task_status:
            status = client.get_crawl_status()  # Current crawl status
            currenturl = client.get_current_url()  # URL currently being crawled
            data = client.get_crawl_data()  # Structured data, available once crawling has completed

            print("Crawl status::")
            if status:
                print(status)

            if status == "Crawling":  # "Crawling" is one of the status values
                print(f"Current URL:: {currenturl}")

            if status == "Completed!":  # "Completed!" (with the exclamation mark) is another status value
                print("Task has been completed... closing the loop")
                if data:
                    print(f"JSON Data:: {data}")
                    time.sleep(20)  # Give extra time for large JSON response
                    break

    print("Job over")

if __name__ == "__main__":
    main()
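The demo loop runs until the crawl finishes, however long that takes. A bounded variant could look like the sketch below. It relies only on the client methods and the "Completed!" status string shown in the demo; the function name `wait_for_crawl`, the default interval and timeout, and the injectable `sleep` and `clock` parameters are assumptions made for illustration.

```python
import time


def wait_for_crawl(client, poll_interval=2.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the crawl completes and data is available, or time out.

    `client` is any object exposing get_task_status(), get_crawl_status(),
    and get_crawl_data(), as WebsiteCrawlerClient does in the demo.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if client.get_task_status():
            status = client.get_crawl_status()
            if status == "Completed!":
                data = client.get_crawl_data()
                if data:
                    return data
        sleep(poll_interval)
    raise TimeoutError(f"Crawl did not complete within {timeout} seconds")
```

After `client.submit_url_to_website_crawler(URL, LIMIT)`, calling `data = wait_for_crawl(client)` would replace the `while True` loop. Passing `sleep` and `clock` as parameters keeps the function testable without real delays.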

Project details

Release history

0.1.5 (Sep 28, 2025)
0.1.4 (Sep 28, 2025)
0.1.3 (Jul 21, 2025)
0.1.2 (Jul 21, 2025)
0.1.1 (Jul 10, 2025)
0.1.0 (Jul 9, 2025), this version

Download files

Download the file for your platform.

Source Distribution

website_crawler_sdk-0.1.0.tar.gz (3.6 kB)

Uploaded Jul 9, 2025 Source

Built Distribution

website_crawler_sdk-0.1.0-py3-none-any.whl (4.3 kB)

Uploaded Jul 9, 2025 Python 3

File details

Details for the file website_crawler_sdk-0.1.0.tar.gz.

File metadata

Download URL: website_crawler_sdk-0.1.0.tar.gz
Upload date: Jul 9, 2025
Size: 3.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for website_crawler_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a79085bed988b0f692c9dfd3853ab643fcb0472a2604fabb167e6e3193038b0a`
MD5	`2ccd8349b4c38f263969a3d8fe8eecac`
BLAKE2b-256	`34765cbd27e69b76fdeb50ee0bf81ae5d4667db4dcefc1515d18f898cb36dee7`

See more details on using hashes here.
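A downloaded file can be checked against the SHA256 digest listed for it using Python's `hashlib`. This is a minimal sketch; the helper names `sha256_of_file` and `matches_sha256` are made up for the example, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_sha256("website_crawler_sdk-0.1.0.tar.gz", "a79085bed988b0f692c9dfd3853ab643fcb0472a2604fabb167e6e3193038b0a")`. The same function works for the wheel with its own digest.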

File details

Details for the file website_crawler_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: website_crawler_sdk-0.1.0-py3-none-any.whl
Upload date: Jul 9, 2025
Size: 4.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for website_crawler_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2ecf3f24a14516fc7d62556439a3d4170330eddf7b6b76e8661802cdf417e97f`
MD5	`3c603fa77dec8d20b2a3f02fd2f3ef4c`
BLAKE2b-256	`ace939701ce3dd6092e0c166909a98835b9a4798ec4fa81123a29039d0decf10`
