A web miner built on Selenium, with simpler operations
Project description
Introduction
This is a project built for SuperWebMiner, and it is also a homework assignment for one of my classes. You can use this basic web-mining framework for web-mining tasks such as downloading large numbers of pictures. The goal of this project is to enable everyone to start his/her own super mining engine, and at the same time it pushes me closer to building a comprehensive AI system. Suggestions on this project are very welcome, so that together we can make it better and stronger!
Copyright
- Author: Airscker
- Last release: 2022-4
- Latest version: R22.2.0.0
- Open-source project. Copyright (C) Airscker, airscker@gmail.com, Mozilla Public License Version 2.0
Basic steps
Here are all the steps and references for building your first engine.
Preparations
- For Python: before importing the code into your project, download the whole zip file and extract it, enter the folder 'Codes', open cmd there, and run the command below:
pip install -r requirements.txt
- For the browser: install Chrome (this project currently supports only Chrome). Next, find your Chrome version number in Settings (About Chrome), then download the ChromeDriver release that matches that version from the ChromeDriver download page. Move chromedriver.exe into the Scripts folder of your Python installation, such as C:\Python\Python39\Scripts
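Before going further, it can help to confirm that both preparation steps worked. The following is a minimal sketch, assuming that requirements.txt installs the `selenium` package (the project is built on Selenium) and that Python's Scripts folder is on PATH, which is usually true when Python was installed with "Add to PATH". The function name `check_environment` is hypothetical and not part of SuperMiner.

```python
import importlib.util
import shutil


def check_environment():
    """Hypothetical preflight check for the preparation steps.

    Assumes requirements.txt installs selenium and that the Scripts
    folder holding chromedriver.exe is on PATH.
    """
    return {
        # True if this interpreter can import the selenium package
        "selenium": importlib.util.find_spec("selenium") is not None,
        # Full path of the ChromeDriver executable if found on PATH, else None
        # (on Windows, shutil.which also tries extensions such as .exe)
        "chromedriver": shutil.which("chromedriver"),
    }


if __name__ == "__main__":
    status = check_environment()
    print("selenium installed:", status["selenium"])
    print("chromedriver found at:", status["chromedriver"] or "not on PATH")
```

If `chromedriver` is reported as not on PATH, either add the Scripts folder to PATH or move the driver into a folder that already is.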
Import
Once all downloads have finished, copy the file 'SuperMiner.py' into the root of your project, then open your project and type:
import SuperMiner as SP
Start your first engine
Here are the basic steps to download 'Hello world' images:
- Initialize your engine
Hello_engine=SP.SuperMiner(url='https://cn.bing.com/images/search?q=Hello+world')
- Start miner engine
Hello_engine.MineEngine()
- Scroll the page to get more images
SP.Basic_Actions(engine=Hello_engine,Obj_index=-2,Send_keys=False,rollpage=True)
- Get the attributes of the images
Attr=Hello_engine.Attributes('src',Hello_engine.Objects(Class='mimg'))
- Download Images
Hello_engine.Download(Attr,data_type='img')
- Close engine
Hello_engine.engine.quit()
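The scrolling step (`rollpage=True`) and the `Attributes('src', Objects(Class='mimg'))` lookup can be approximated with a plain Selenium call plus Python's standard `html.parser`. This is a minimal sketch under stated assumptions, not SuperMiner's implementation: `scroll_to_bottom` and `image_sources` are hypothetical helpers, the scroll uses the usual `window.scrollTo` JavaScript through Selenium's `execute_script`, and `mimg` is assumed to be a class name on the result thumbnails, as the example above suggests.

```python
import time
from html.parser import HTMLParser


def scroll_to_bottom(driver, times=3, pause=1.0):
    """Hypothetical helper: scroll a Selenium-style driver to the page bottom.

    Image search pages load more results as you scroll, so each pass may
    reveal new thumbnails. `driver` only needs an execute_script() method.
    """
    for _ in range(times):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch


class _ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Attributes without a value are reported as None, hence the `or ""`
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("src"):
            self.sources.append(attr_map["src"])


def image_sources(html, css_class="mimg"):
    """Hypothetical helper: return the src of each matching <img> in an HTML string."""
    collector = _ImgSrcCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.sources
```

With a real Selenium driver you would call `scroll_to_bottom(driver)` and then `image_sources(driver.page_source)`. Parsing the page source once avoids a separate WebDriver round trip for every element.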
The downloaded images should now appear in the 'downloads' folder. Because network quality varies, some images may fail to download or be corrupted; this is expected and not a problem.
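Tolerating broken images usually means downloading each URL independently and recording failures instead of aborting the whole run. A minimal standard-library sketch of that idea follows; `download_all` and its file-naming scheme are hypothetical and are not SuperMiner's `Download` method.

```python
import http.client
import os
import urllib.parse
import urllib.request


def download_all(urls, folder="downloads", timeout=10):
    """Hypothetical helper: save each URL into `folder`.

    Returns (saved_paths, failed_urls). A failure on one URL is recorded
    and skipped so the remaining URLs still download.
    """
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    for i, url in enumerate(urls):
        # Hypothetical naming: index plus the extension in the URL path, if any
        ext = os.path.splitext(urllib.parse.urlparse(url).path)[1]
        path = os.path.join(folder, f"img_{i}{ext}")
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
            with open(path, "wb") as f:
                f.write(data)
            saved.append(path)
        except (OSError, ValueError, http.client.HTTPException):
            # OSError covers URLError, HTTPError and timeouts; ValueError covers
            # malformed URLs; HTTPException covers truncated responses
            failed.append(url)
    return saved, failed
```

Returning the failed URLs makes it easy to retry them later or simply report how many images were lost.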
For more details, please see the documentation.
2022-3-14
We go until we go wrong, then we keep on until we are right
For dream
Download files
File details
Details for the file SuperMiner-22.3.1.1.tar.gz.
File metadata
- Download URL: SuperMiner-22.3.1.1.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 77e6f111ac9939c05eff4c6365c6d864b7fe27ef521548ac9573847078d2a435
MD5 | d385ce78fda6201716e0825ada696176
BLAKE2b-256 | d628c894c4c811c43a2fc413e54dcd611577823c2271456fc4c03fd185306fb4
File details
Details for the file SuperMiner-22.3.1.1-py3-none-any.whl.
File metadata
- Download URL: SuperMiner-22.3.1.1-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 259036efd2c345e0eaa8d45538c99f5f0ae8a44de2a032592f5e74737772d365
MD5 | 742c41b180963865064280c45eb3108a
BLAKE2b-256 | 1670db6a2c715cafa15f6fc68c25490da9af22f96a8d84d6f9f79852699220e4