A spider bot (crawler) written in Python, using Selenium and ChromeDriver.

Project description

spiderbot

A spider bot, published as the module spiderbot.

How to use?

Install

pip install spiderbot

Install the Chrome browser and chromedriver, and put the chromedriver binary in a directory on your PATH.
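
Before starting the bot, it can help to confirm that the driver binary is actually discoverable. The following is a minimal sketch using only the Python standard library; the helper name find_chromedriver is hypothetical and is not part of spiderbot.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver binary found on PATH, or None."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        print(f"chromedriver found at {path}")
```

If the check reports that nothing was found, move the binary into a directory that is already on PATH, or add its directory to PATH, and run the check again.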

Config

Create config_private.py, using config_private_sample.py as an example, and update the values of XPATHS and DB_NAME.

Alternatively, pass the xpaths and db_name arguments when creating a SpiderBot instance.

Run it

Initialize the database by passing init=True when creating a SpiderBot instance. If this succeeds, spiderbot.db is created.

from spiderbot import SpiderBot

bot = SpiderBot(skip_driver=True, init=True)
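
To confirm that initialization worked, you can check for the database file. This is a minimal sketch, assuming the file is written to the current working directory under the name spiderbot.db (adjust the path if you configured a different DB_NAME); the helper name database_exists is hypothetical.

```python
from pathlib import Path


def database_exists(db_path="spiderbot.db"):
    """Return True if a regular file exists at db_path."""
    return Path(db_path).is_file()


if __name__ == "__main__":
    if database_exists():
        print("Database file found.")
    else:
        print("Database file not found; check init=True and the DB_NAME setting.")
```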

Then add the users to crawl. You can add more users at any time.

from spiderbot import SpiderBot

urls = ["https://example.com/user_a_homepage", "https://example.com/user_b_homepage"]

bot = SpiderBot()
bot.add_users(*urls, working_status=True)

Finally, run the main job:

from spiderbot import SpiderBot

bot = SpiderBot()
bot.get_profiles()
bot.get_new_posturls()
bot.get_history_posturls(1, 9)
bot.get_posts()
bot.quit()
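
If any step in the sequence above raises an exception, bot.quit() is never reached and the browser may be left running. One way to guard against that is a try/finally wrapper, sketched here under the assumption that the methods behave as in the example above. The function name run_crawl and its parameters are hypothetical; pass in a SpiderBot instance created as shown earlier.

```python
def run_crawl(bot, history_args=(1, 9)):
    """Run the crawl steps in order and always call bot.quit() afterwards.

    `bot` is expected to be a SpiderBot instance; `history_args` are the
    values forwarded to get_history_posturls (1 and 9 in the example above).
    """
    try:
        bot.get_profiles()
        bot.get_new_posturls()
        bot.get_history_posturls(*history_args)
        bot.get_posts()
    finally:
        # Runs even if one of the steps above raised an exception.
        bot.quit()
```

The exception still propagates to the caller, so failures remain visible; only the cleanup is guaranteed.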

more examples

Code Format

isort .
black .
pylint spiderbot > pylint_spiderbot.log

Download files

Source Distribution

spiderbot-0.3.3.tar.gz (21.4 kB view details)

Uploaded Source

Built Distribution

spiderbot-0.3.3-py3-none-any.whl (24.1 kB view details)

Uploaded Python 3

File details

Details for the file spiderbot-0.3.3.tar.gz.

File metadata

  • Download URL: spiderbot-0.3.3.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for spiderbot-0.3.3.tar.gz
Algorithm Hash digest
SHA256 1f51554ee800eacd5758032fe423985623fcb5079349cf2f577ad554fed6865a
MD5 d717b5155265e04ce82fbce19c1a4e6c
BLAKE2b-256 40a6384a982c9d5dd43582c4b30c26ab2ee1c46fea26da8d9daab23b53e361d8

File details

Details for the file spiderbot-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: spiderbot-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 24.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for spiderbot-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3d349d55961f037b92417963725f535bc89a609472acc9c4fe1093e230139d81
MD5 ea121b43b0ac7b9f80b8bda71e505251
BLAKE2b-256 32b4bddc175e70321524be2569c9e54146d3ca10b83916fac71158de64fe9992
