
A spider bot (crawler) written in Python, using Selenium and ChromeDriver.

Project description

spiderbot

A spider bot, published as the module spiderbot.

How to use?

Install

pip install spiderbot

Install the Chrome browser and ChromeDriver, and put the chromedriver binary in a directory on your PATH.
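
Before running the bot, it can help to confirm that the driver binary is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name find_chromedriver is an illustration and not part of spiderbot.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the binary found on PATH, or None if it is missing."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        print(f"chromedriver found at {path}")
```

If the check reports that the binary is missing, move chromedriver into a directory already on PATH or add its directory to PATH.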

Config

Create config_private.py, using config_private_sample.py as an example, and update the values of XPATHS and DB_NAME.

Alternatively, pass the xpaths and db_name arguments when creating an instance of SpiderBot.
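
As a rough illustration, a config_private.py might look like the sketch below. This assumes XPATHS is a dictionary mapping field names to XPath expressions and that DB_NAME is the database file name. Both the structure and every key and expression shown are hypothetical placeholders; the real layout is whatever config_private_sample.py defines.

```python
# config_private.py (sketch only; all keys and expressions are hypothetical)

# Assumed shape: field name -> XPath expression used to locate that field.
XPATHS = {
    "user_name": "//h1[@class='user-name']",  # hypothetical placeholder
    "post_links": "//a[@class='post-link']",  # hypothetical placeholder
}

# Assumed to be the database file name; "spiderbot.db" is the file the
# project description says is created on initialization.
DB_NAME = "spiderbot.db"
```

The same two values could instead be supplied through the xpaths and db_name arguments of SpiderBot, as described above.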

Run it

Initialize the database by passing init=True when creating an instance of SpiderBot. If this succeeds, spiderbot.db is created.

from spiderbot import SpiderBot

bot = SpiderBot(skip_driver=True, init=True)
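
To confirm that initialization worked, one option is to inspect the created file. The sketch below assumes spiderbot.db is an SQLite database, which the project description does not state explicitly; list_tables is a hypothetical helper, not part of spiderbot.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="spiderbot.db"):
    """Return the table names in the database file, or None if the file is missing.

    Assumes the file is an SQLite database (an assumption, not confirmed).
    """
    path = Path(db_path)
    if not path.is_file():
        # Return early: sqlite3.connect() would otherwise create an empty file.
        return None
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(list_tables())
```

A return value of None means the file was not created; an empty list means the file exists but contains no tables.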

Then add users to the crawler. You can add more users at any time.

from spiderbot import SpiderBot

urls = ["https://example.com/user_a_homepage", "https://example.com/user_b_homepage"]

bot = SpiderBot()
bot.add_users(*urls, working_status=True)
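
When the list of user URLs grows, it may be convenient to keep it in a text file and clean it before handing it to add_users. The following is a minimal sketch; clean_urls and the sample file contents are hypothetical and not part of spiderbot.

```python
def clean_urls(lines):
    """Strip whitespace, skip blank lines and '#' comments, drop duplicates (order kept)."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical contents of a users.txt file, inlined here to keep the sketch self-contained.
raw = """
# one homepage URL per line
https://example.com/user_a_homepage
https://example.com/user_b_homepage
https://example.com/user_a_homepage
"""

urls = clean_urls(raw.splitlines())
print(urls)
```

The resulting urls list can then be unpacked into bot.add_users in the same way as in the example above.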

Finally, run the main job:

from spiderbot import SpiderBot

bot = SpiderBot()
bot.get_profiles()
bot.get_new_posturls()
bot.get_history_posturls(1, 9)
bot.get_posts()
bot.quit()
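
If one of the steps raises an exception, bot.quit() in the script above is never reached. One possible way to guard against that is sketched below. run_crawl is a hypothetical wrapper, not part of spiderbot; it only calls the methods shown above, and the default arguments (1, 9) are copied from that example without assuming what they mean.

```python
def run_crawl(bot, history_args=(1, 9)):
    """Run the crawl steps in order and always call bot.quit() afterwards."""
    try:
        bot.get_profiles()
        bot.get_new_posturls()
        # Arguments are forwarded unchanged to get_history_posturls.
        bot.get_history_posturls(*history_args)
        bot.get_posts()
    finally:
        # Runs whether or not a step above raised an exception.
        bot.quit()
```

With spiderbot installed, this could be called as run_crawl(SpiderBot()).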

more examples

Code Format

isort .
black .
pylint spiderbot > pylint_spiderbot.log

Download files

Source Distribution

spiderbot-0.3.2.tar.gz (21.4 kB view details)

Uploaded Source

Built Distribution

spiderbot-0.3.2-py3-none-any.whl (24.1 kB view details)

Uploaded Python 3

File details

Details for the file spiderbot-0.3.2.tar.gz.

File metadata

  • Download URL: spiderbot-0.3.2.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for spiderbot-0.3.2.tar.gz
Algorithm Hash digest
SHA256 439ff51b3d5dc4e8c8393fe61dbf35f2d7d0fb25d3ed0f5d8a89383485790a8a
MD5 9353c458c6aaff1a61c749c5e6188ae8
BLAKE2b-256 ec2f2fc4fda90bcb09c9cf5126e66d068038fb05298bc74d1d17bfb45e26dd89

File details

Details for the file spiderbot-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: spiderbot-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 24.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for spiderbot-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0c9af26177afb26ae7aa0b97232b5b2660ade8c62881622a856edc608dfe304d
MD5 13dd609d4ca6cf8f3e4162c7ee01a326
BLAKE2b-256 5b8a491be4a00d1c1898d2ebac6d661557d8224892226c8eae3f2f9d601d1b6a
