Crawlab AI SDK
This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.
Installation
pip install crawlab-ai
Prerequisites
An API token is required to use this SDK. You can obtain one from the Crawlab official website.
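A common pattern is to keep the token out of source code by reading it from an environment variable. This is a minimal sketch of that pattern; the variable name CRAWLAB_AI_API_TOKEN is a hypothetical choice for illustration, not a documented part of the SDK:

```python
import os

# Hypothetical: expose the API token via an environment variable so it is
# not hard-coded in scripts. The name CRAWLAB_AI_API_TOKEN is an assumption,
# not a documented SDK setting.
os.environ["CRAWLAB_AI_API_TOKEN"] = "your-api-token"

# Read it back wherever the token is needed.
token = os.environ.get("CRAWLAB_AI_API_TOKEN")
print(token)
```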
Usage
Get data from a list page
from crawlab_ai import read_list
# Define the target URL
url = "https://example.com"
# Get the data without specifying fields
df = read_list(url=url)
print(df)
# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)
# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
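Because read_list returns a pandas DataFrame by default, standard pandas operations apply directly to the result. A minimal sketch, using a hand-built list of dictionaries in place of a live read_list call (the record shape shown here is an assumption for illustration):

```python
import pandas as pd

# Stand-in for the records read_list(as_dataframe=False) might return;
# the field names are assumptions for illustration only.
records = [
    {"title": "Post A", "content": "First article"},
    {"title": "Post B", "content": "Second article"},
]

df = pd.DataFrame(records)

# Typical post-processing: filter rows, then export everything to CSV.
filtered = df[df["title"] == "Post A"]
df.to_csv("results.csv", index=False)
print(filtered)
```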
Usage with Scrapy
Create a Scrapy spider by extending ScrapyListSpider:
from crawlab_ai import ScrapyListSpider
class MySpider(ScrapyListSpider):
name = "my_spider"
start_urls = ["https://example.com"]
fields = ["title", "content"]
Then run the spider:
scrapy crawl my_spider
Download files
Download the file for your platform.
Source distribution: crawlab-ai-0.0.9.tar.gz (9.3 kB)
Built distribution: crawlab_ai-0.0.9-py3-none-any.whl (14.2 kB)
Hashes for crawlab_ai-0.0.9-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d61d176387b392e92842ac0432142402e2a3ff9e3540bb689487cfb30cd827af
MD5 | 817b3e45765e2bf5d542389173bc5e01
BLAKE2b-256 | 1b3a556b8066a8e3603632d0a39779a0b14aa3ba408a84e2c685332ac898d9de