A spider bot (crawler) written in Python, using Selenium and ChromeDriver.
spiderbot
A spider bot, published as the module spiderbot.
How to use?
Install
pip install spiderbot
Install the Chrome browser and ChromeDriver, and put the chromedriver binary in a directory on your PATH.
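A quick way to confirm the PATH step worked is to look the binary up from Python. This is a minimal sketch using only the standard library; the helper name find_chromedriver is hypothetical and is not part of spiderbot.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver binary if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path:
        print(f"chromedriver found at {path}")
    else:
        print("chromedriver not found on PATH")
```

If the lookup returns None, the driver is either not installed or its directory is not on PATH for the current shell.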
Config
Create config_private.py, using config_private_sample.py as an example, and update the values of XPATHS and DB_NAME.
Alternatively, pass the xpaths and db_name arguments when creating an instance of SpiderBot.
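The two configuration routes can be sketched as follows. Only the names XPATHS, DB_NAME, xpaths and db_name come from this README. The shape of XPATHS (a mapping from a label to an XPath expression), its keys, and the ".db" suffix on the database name are assumptions for illustration; config_private_sample.py is the authoritative reference.

```python
# Hypothetical config_private.py; the real keys live in config_private_sample.py.
DB_NAME = "spiderbot.db"  # assumed to match the file name created on init

# Assumed shape: label -> XPath expression. Keys and expressions are illustrative.
XPATHS = {
    "profile_name": "//h1[@class='profile-name']",
    "post_links": "//a[contains(@href, '/post/')]",
}

# The same values can be handed to the constructor instead of the config file.
constructor_kwargs = {"xpaths": XPATHS, "db_name": DB_NAME}
# bot = SpiderBot(**constructor_kwargs)  # needs spiderbot, Chrome and chromedriver
```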
Run it
Initialize the database by passing init=True when creating an instance of SpiderBot. On success, spiderbot.db is created.
from spiderbot import SpiderBot
bot = SpiderBot(skip_driver=True, init=True)
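To check that initialization produced a usable database, one option is to list its tables. This sketch assumes spiderbot.db is a SQLite file, which the README does not state; the helper name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="spiderbot.db"):
    """Return the table names in the database, or None if the file is missing."""
    path = Path(db_path)
    if not path.is_file():
        # Return early: sqlite3.connect would otherwise create an empty file.
        return None
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

A return value of None means the file was never created; an empty list means the file exists but holds no tables.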
Then add users to the crawler. You can add more users at any time.
from spiderbot import SpiderBot
urls = ["https://example.com/user_a_homepage", "https://example.com/user_b_homepage"]
bot = SpiderBot()
bot.add_users(working_status=True, *urls)
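Because the URL list is often assembled by hand, it can help to tidy it before passing it to add_users. The helper below is a hypothetical sketch, not part of spiderbot: it keeps only well-formed http(s) URLs and drops duplicates while preserving order. Whether add_users performs similar checks itself is not stated in this README.

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Keep well-formed http(s) URLs, dropping duplicates and preserving order."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        # Skip anything without an http(s) scheme or without a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result
```

The cleaned list can then be unpacked in the same way as in the call above, for example bot.add_users(working_status=True, *clean_urls(urls)).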
Finally, run the main job:
from spiderbot import SpiderBot
bot = SpiderBot()
bot.get_profiles()
bot.get_new_posturls()
bot.get_history_posturls(1, 9)
bot.get_posts()
bot.quit()
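If any step in the sequence above raises, bot.quit() is never reached and the browser may be left running. One way to guard against that is a try/finally wrapper. The function run_crawl is a hypothetical helper, not part of spiderbot; it accepts any object exposing the five methods used above, and passes the two arguments to get_history_posturls through unchanged because their meaning is not documented here.

```python
def run_crawl(bot, history_args=(1, 9)):
    """Run the crawl steps in order and always call bot.quit() afterwards."""
    try:
        bot.get_profiles()
        bot.get_new_posturls()
        bot.get_history_posturls(*history_args)
        bot.get_posts()
    finally:
        # Runs on success and on error, so the driver is always released.
        bot.quit()
```

With spiderbot installed, usage would look like run_crawl(SpiderBot()); any exception still propagates to the caller after quit() has run.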
Code Format
isort .
black .
pylint spiderbot > pylint_spiderbot.log
Download files
File details
Details for the file spiderbot-0.3.2.tar.gz.
File metadata
- Download URL: spiderbot-0.3.2.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 439ff51b3d5dc4e8c8393fe61dbf35f2d7d0fb25d3ed0f5d8a89383485790a8a |
| MD5 | 9353c458c6aaff1a61c749c5e6188ae8 |
| BLAKE2b-256 | ec2f2fc4fda90bcb09c9cf5126e66d068038fb05298bc74d1d17bfb45e26dd89 |
File details
Details for the file spiderbot-0.3.2-py3-none-any.whl.
File metadata
- Download URL: spiderbot-0.3.2-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c9af26177afb26ae7aa0b97232b5b2660ade8c62881622a856edc608dfe304d |
| MD5 | 13dd609d4ca6cf8f3e4162c7ee01a326 |
| BLAKE2b-256 | 5b8a491be4a00d1c1898d2ebac6d661557d8224892226c8eae3f2f9d601d1b6a |