Python SDK for the SpiderWebAI API

Project description

SpiderWebAI Python SDK

The SpiderWebAI Python SDK is a toolkit for straightforward website scraping and crawling at scale, with utilities for extracting links and taking screenshots, so you can collect data formatted for large language models (LLMs). It offers a user-friendly interface for seamless integration with the SpiderWebAI API.

Installation

To install the SpiderWebAI Python SDK, you can use pip:

pip install spiderwebai-py

Usage

  1. Get an API key from spiderwebai.xyz
  2. Set the API key as an environment variable named SPIDER_API_KEY or pass it as a parameter to the SpiderWebAIApp class.

Here's an example of how to use the SDK:

from spiderwebai import SpiderWebAIApp

# Initialize the SpiderWebAIApp with your API key
app = SpiderWebAIApp(api_key='your_api_key')

# Scrape a single URL
url = 'https://spiderwebai.xyz'
scraped_data = app.scrape_url(url)

# Crawl a website
crawler_params = {
    'limit': 1,             # maximum number of pages to crawl
    'proxy_enabled': True,  # route requests through a proxy
    'store_data': False,    # do not persist results on the platform
    'metadata': False,      # omit page metadata from the response
    'request': 'http'       # plain HTTP requests, no browser rendering
}
crawl_result = app.crawl_url(url, params=crawler_params)
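
If you set the SPIDER_API_KEY environment variable described above, you can construct the client without passing a key. A minimal sketch:

import os

from spiderwebai import SpiderWebAIApp

# The key is picked up from the SPIDER_API_KEY environment variable
os.environ['SPIDER_API_KEY'] = 'your_api_key'
app = SpiderWebAIApp()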

Scraping a URL

To scrape data from a single URL:

url = 'https://example.com'
scraped_data = app.scrape_url(url)
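
The shape of the returned data isn't specified above; as a minimal sketch, assuming the response is JSON-serializable, you can persist it for later use:

import json

# Assumes scraped_data is JSON-serializable; adjust if your response differs
with open('scraped.json', 'w') as f:
    json.dump(scraped_data, f, indent=2)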

Crawling a Website

To automate crawling a website:

url = 'https://example.com'
crawl_params = {
    'limit': 200,            # crawl up to 200 pages
    'request': 'smart_mode'  # let the API choose between plain HTTP and browser rendering
}
crawl_result = app.crawl_url(url, params=crawl_params)

Retrieving Links from a URL

Extract all links from a specified URL:

url = 'https://example.com'
links = app.links(url)

Taking a Screenshot of a URL

Capture a screenshot of a given URL:

url = 'https://example.com'
screenshot = app.screenshot(url)
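
The screenshot response format isn't documented above; as a hedged sketch, assuming the API returns base64-encoded image data, you could write it to disk like this:

import base64

# Assumption: screenshot holds base64-encoded image data; drop the decode
# step if your response already contains raw bytes
with open('screenshot.png', 'wb') as f:
    f.write(base64.b64decode(screenshot))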

Extracting Contact Information

Extract contact details from a specified URL:

url = 'https://example.com'
contacts = app.extract_contacts(url)

Labeling Data from a URL

Label the data extracted from a particular URL:

url = 'https://example.com'
labeled_data = app.label(url)

Checking Available Credits

You can check the remaining credits on your account:

credits = app.get_credits()
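
A common pattern is to verify you have enough credits before launching a large crawl. This hypothetical guard assumes get_credits() returns a dict with a 'credits' field, which isn't a documented contract:

# Hypothetical: the exact shape of the get_credits() response may differ
credits = app.get_credits()
if credits.get('credits', 0) < 200:
    raise RuntimeError('Not enough credits for this crawl')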

Streaming

If you need to stream the response, pass True as the third parameter:

url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'store_data': False,
    'metadata': False,
    'request': 'http'
}

links = app.links(url, crawler_params, True)
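
What the streamed call yields depends on the SDK version; as a minimal sketch, assuming it returns an iterable of response chunks, you can process results as they arrive instead of buffering the whole response:

# Assumption: with streaming enabled, the call yields chunks incrementally
for chunk in app.links(url, crawler_params, True):
    print(chunk)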

Content-Type

The following Content-Type headers are supported via the fourth parameter:

  1. application/json
  2. text/csv
  3. application/xml
  4. application/jsonl

url = 'https://example.com'

crawler_params = {
    'limit': 1,
    'proxy_enabled': True,
    'store_data': False,
    'metadata': False,
    'request': 'http'
}

# stream JSON Lines back to the client
crawl_result = app.crawl_url(url, crawler_params, True, "application/jsonl")
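
JSON Lines works well with streaming because each line is a standalone JSON document. As a sketch, assuming the response arrives as newline-delimited text, you can parse it record by record:

import json

# Assumption: the jsonl response is newline-delimited JSON text
for line in crawl_result.splitlines():
    if line.strip():
        record = json.loads(line)
        print(record)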

Error Handling

The SDK handles errors returned by the SpiderWebAI API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.
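
Since errors surface as exceptions, wrap calls in a try/except block. A minimal sketch (the SDK's exact exception classes aren't documented here, so this catches the base Exception):

try:
    scraped_data = app.scrape_url('https://example.com')
except Exception as err:
    # The exception message describes the API error
    print(f'Request failed: {err}')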

Contributing

Contributions to the SpiderWebAI Python SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

The SpiderWebAI Python SDK is open-source and released under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spiderwebai-py-0.1.4.tar.gz (4.2 kB)


File details

Details for the file spiderwebai-py-0.1.4.tar.gz.

File metadata

  • Download URL: spiderwebai-py-0.1.4.tar.gz
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for spiderwebai-py-0.1.4.tar.gz

  • SHA256: 340fdc88888d64590ea68e39f350b323ec26dd76fcb1c902c2dcf6767b13394c
  • MD5: 2fc6bda3f8e8e4ecb86d5a9f42ee98c3
  • BLAKE2b-256: cc8feb18850beda1d9a2f6a02b029da3d4772baed961673eddd4f97e44ce6747
