spiderbot

A spider bot (crawler) written in Python, using Selenium and ChromeDriver. Published as the module spiderbot.

How to use?

Install

pip install spiderbot

Install the Chrome browser and ChromeDriver, and put the chromedriver binary in a directory on your PATH.
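To confirm that the chromedriver binary is actually discoverable, you can check the PATH from Python. This is just a quick sanity check, not part of spiderbot itself:

```python
import shutil

# shutil.which returns the full path of the binary if it is on PATH, else None
path = shutil.which("chromedriver")
if path:
    print(f"chromedriver found at {path}")
else:
    print("chromedriver not found on PATH")
```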

Config

Create config_private.py, using config_private_sample.py as a template, and update the values of the XPATHS and DB_NAME settings.

Alternatively, pass the xpaths and db_name arguments when creating a SpiderBot instance.
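As a sketch, config_private.py might look like the following. The XPATHS keys and expressions shown here are illustrative placeholders, not documented names — copy the actual keys from config_private_sample.py:

```python
# config_private.py -- private settings, kept out of version control.
# NOTE: the XPATHS entries below are illustrative placeholders; use
# config_private_sample.py for the real key names and expressions.
XPATHS = {
    "profile": "//div[@class='profile']",
    "post_urls": "//a[contains(@href, '/post/')]",
}

# SQLite database file used by spiderbot
DB_NAME = "spiderbot.db"
```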

Run it

Initialize the database by passing init=True when creating a SpiderBot instance. On success, spiderbot.db is created.

from spiderbot import SpiderBot

bot = SpiderBot(skip_driver=True, init=True)

Then add users to crawl. You can add more users at any time.

from spiderbot import SpiderBot

urls = ["https://example.com/user_a_homepage", "https://example.com/user_b_homepage"]

bot = SpiderBot()
bot.add_users(working_status=True, *urls)

Finally, do the main job:

from spiderbot import SpiderBot

bot = SpiderBot()
bot.get_profiles()
bot.get_new_posturls()
bot.get_history_posturls(1, 9)
bot.get_posts()
bot.quit()

See the project repository for more examples.

Code Format

isort .
black .
pylint spiderbot > pylint_spiderbot.log
