
Crawlab AI SDK

This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.

Installation

pip install crawlab-ai

Prerequisites

An API token is required to use this SDK. You can get the API token from the Crawlab official website.
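How the token is supplied to the SDK is not documented here. As a minimal sketch, assuming the SDK reads the token from an environment variable named CRAWLAB_AI_API_TOKEN (a hypothetical name; check the Crawlab documentation for the actual configuration mechanism), you could set it before calling the SDK:

import os

# Hypothetical: assumes the SDK picks up the API token from this environment
# variable; verify the actual variable name or parameter in the Crawlab docs.
os.environ["CRAWLAB_AI_API_TOKEN"] = "<your-api-token>"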

Usage

Get data from a list page

from crawlab_ai import read_list

# Define the target list page URL
url = "https://example.com"

# Get the data without specifying fields
df = read_list(url=url)
print(df)

# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)

# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
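
Since read_list returns a DataFrame by default (presumably a pandas DataFrame, given the as_dataframe flag), the result can be post-processed with standard pandas tooling. A minimal sketch under that assumption:

from crawlab_ai import read_list

# Assumes the default return type is a pandas DataFrame.
df = read_list(url="https://example.com", fields=["title", "content"])
df.to_csv("results.csv", index=False)  # persist the extracted rows to CSV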

Usage with Scrapy

Create a Scrapy spider by extending ScrapyListSpider:

from crawlab_ai import ScrapyListSpider


class MySpider(ScrapyListSpider):
    name = "my_spider"  # spider name used by the scrapy crawl command
    start_urls = ["https://example.com"]  # list pages to scrape
    fields = ["title", "content"]  # fields for the AI extractor to return

Then run the spider:

scrapy crawl my_spider
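
As with any Scrapy spider, the extracted items can be exported with Scrapy's standard feed options, for example writing them to a JSON file (Scrapy 2.1+):

scrapy crawl my_spider -O results.json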

