A web miner built on Selenium, with simpler operations
Project description
Introduction
This project was built for SuperWebMiner and also served as homework for one of my classes. This basic web-mining framework can be used for tasks such as downloading large numbers of pictures. The goal of the project is to let everyone start their own super mining engine, and working on it also brings me a step closer to building a comprehensive AI system. Suggestions on the project are very welcome, so that together we can make it better and stronger!
Copyright
- Author: Airscker
- Last Edited Time: 2022-7
- Latest Edition: 22.3.1.2
- Open source project. Copyright (C) Airscker, airscker@gmail.com, Mozilla Public License Version 2.0
Basic steps for coding in an IDE
Here are all the steps and references you need to build your first engine
Preparations
- For Python
Before importing the package into your project, install it with pip:
pip install SuperMiner
- For Browser
- First, install the Chrome browser (this project currently supports only Chrome).
- Second, find your Chrome version number in Settings (tab 'About Chrome'), for example 100.0.4896.88.
- Then download the Chrome driver that matches your version number here.
- Move the downloaded driver executable into the Scripts directory of your Python installation, such as: C:\Python\Python39\Scripts\
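The version-matching step above can be checked with a few lines of Python. This is a minimal sketch, not part of SuperMiner: the helper names are hypothetical, the executable name `chromedriver` is an assumption, and comparing only the major version is treated as a sufficient first check.

```python
import shutil


def major_version(version: str) -> int:
    """Return the leading component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_compatible(browser_version: str, driver_version: str) -> bool:
    """Assumption: browser and driver are compatible if their major versions match."""
    return major_version(browser_version) == major_version(driver_version)


def find_driver(name: str = "chromedriver"):
    """Return the full path of the driver if it is on PATH, otherwise None.

    The default name is an assumption; pass the name of the file you
    actually placed in the Scripts directory.
    """
    return shutil.which(name)


if __name__ == "__main__":
    # Version string copied from the example in the steps above.
    print(major_version("100.0.4896.88"))
    print(find_driver() or "driver executable not found on PATH")
```

If `find_driver()` returns `None`, the Scripts directory is probably not on your PATH, or the executable has a different name.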
Import
Wait until the installation has finished, then open your project and type:
from SuperMiner import SMiner as SM
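If the import fails, it can help to confirm that the package is visible to the interpreter you are running. The sketch below uses only the standard library; `is_installed` is a hypothetical helper, not part of SuperMiner.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("SuperMiner"):
    # pip may have installed the package into a different Python environment.
    print("SuperMiner was not found; run 'pip install SuperMiner' for this interpreter.")
```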
Start your first engine
Here are the basic steps to download images for the search term "Hello world"
- Initialize your engine
Hello_engine=SM.SuperMiner(url='https://cn.bing.com/images/search?q=Hello+world')
- Start miner engine
Hello_engine.MineEngine()
- Scroll the page to get more images
SM.Basic_Actions(engine=Hello_engine.engine,Obj_index=-2,send_keys=False,rollpage=True)
- Get the attributes of the images
Attr=Hello_engine.Attributes('src',Hello_engine.Objects(Class='mimg'))
- Download Images
Hello_engine.Download(Attr,data_type='img')
- Close engine
Hello_engine.engine.quit()
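The steps above can be gathered into one function so that the browser is closed even when a step fails. This is a sketch under assumptions: `download_images` and its parameters are hypothetical names, the SuperMiner calls are copied from the steps above, and the return type of `Attributes` is not known here, so it is passed through unchanged.

```python
def download_images(sm_module, url, css_class="mimg", attribute="src"):
    """Run the mining steps in order and always close the browser afterwards.

    sm_module is the imported SMiner module (passed in as an argument so
    the function does not depend on how it was imported).
    """
    miner = sm_module.SuperMiner(url=url)
    miner.MineEngine()  # start the engine; miner.engine is used from here on
    try:
        # Scroll the page to load more images.
        sm_module.Basic_Actions(engine=miner.engine, Obj_index=-2,
                                send_keys=False, rollpage=True)
        # Collect the chosen attribute from every element with the given class.
        attrs = miner.Attributes(attribute, miner.Objects(Class=css_class))
        miner.Download(attrs, data_type='img')
        return attrs
    finally:
        # Runs on success and on error, so no browser window is left open.
        miner.engine.quit()


# Example call (requires SuperMiner, Chrome and the driver to be installed):
# from SuperMiner import SMiner as SM
# download_images(SM, 'https://cn.bing.com/images/search?q=Hello+world')
```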
You can now find the downloaded images in the 'downloads' folder. If the network connection is poor, some images may be corrupted; this is expected and not a problem.
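To spot files that are clearly not valid images, you can compare the first bytes of each file with known image signatures. This is a rough sketch using only the standard library. The helper names are hypothetical, the signature list covers only PNG, JPEG and GIF, and a file that was cut off partway through will still pass because only the header is checked.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
)


def looks_like_image(data: bytes) -> bool:
    """Return True if the data starts with one of the known signatures."""
    return data.startswith(IMAGE_SIGNATURES)


def find_suspect_files(folder: str = "downloads") -> list:
    """List files in the folder whose header matches no known signature."""
    suspects = []
    root = Path(folder)
    if not root.is_dir():
        return suspects
    for path in sorted(root.iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                header = handle.read(16)
            if not looks_like_image(header):
                suspects.append(path)
    return suspects


if __name__ == "__main__":
    for path in find_suspect_files():
        print("possibly corrupted:", path)
```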
For more details, please see Document. Command support has been available since edition 22.2.0.0 (R22.2.0.0); for details, please see Command Support.
2022-3-14
We go until we go wrong, then we keep on until we are right
For dream
Download files
File details
Details for the file SuperMiner-22.3.1.2.tar.gz.
File metadata
- Download URL: SuperMiner-22.3.1.2.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7e5feb7ed0c2d6c7ee5298cb3f31a674614559d14580f83c161ed7e05501ba16
MD5 | fe98c8634a204b3cc08ddee99a4db83d
BLAKE2b-256 | e1110fc1078f16a9227354009351fa07b1875987d09b638939a19b9491bd4ecc
File details
Details for the file SuperMiner-22.3.1.2-py3-none-any.whl.
File metadata
- Download URL: SuperMiner-22.3.1.2-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | b619c024d7642a14701da83e7bd27a62d18e47c8347c10f123f78c276e48f3b0
MD5 | 8b185dc364ec766174ba577c579bb3a4
BLAKE2b-256 | 58ac5ba1d228e35678b6d64e258bc1b380aaf64328eeb92a1fb7831ae1eb1ece