Crawlab AI SDK
This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.
Installation
pip install crawlab-ai
Pre-requisites
An API token is required to use this SDK. You can obtain one from the official Crawlab website.
Usage
Get data from a list page
from crawlab_ai import read_list
# Define the target URL
url = "https://example.com"
# Get the data without specifying fields
df = read_list(url=url)
print(df)
# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)
print(df)
# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
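By default, read_list returns a pandas DataFrame, so standard pandas operations apply to the result. A minimal sketch of typical post-processing; the DataFrame below is hand-built stand-in data standing in for real scraper output, since running read_list requires an API token:

```python
import pandas as pd

# Stand-in for what read_list(url, fields=["title", "content"]) might return.
df = pd.DataFrame([
    {"title": "First post", "content": "Hello"},
    {"title": "Second post", "content": "World"},
])

# Typical cleanup: drop rows with a missing title, then export to CSV.
df = df.dropna(subset=["title"])
df.to_csv("results.csv", index=False)
print(df.shape)  # (2, 2)
```

The same pattern works on the list-of-dicts form (as_dataframe=False): wrap it in pd.DataFrame(data) first.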
Usage with Scrapy
Create a Scrapy spider by extending ScrapyListSpider:
from crawlab_ai import ScrapyListSpider
class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]
Then run the spider:
scrapy crawl my_spider