A web miner built on Selenium, with simpler operations

Project description

Introduction

This project was built for SuperWebMiner, which is also a homework assignment for my class. You can use this basic web-mining framework for tasks such as downloading a large number of pictures. The goal of the project is to let everyone start their own super mining engine; at the same time, it pushes me closer to building a comprehensive AI system. Suggestions on this project are very welcome, so that together we can make it better and stronger!

Copyright

  • Author: Airscker
  • Last Released Time: 2022-4
  • Latest Edition: R22.2.0.0
  • Open source project. Copyright (C) Airscker, airscker@gmail.com, Mozilla Public License Version 2.0

Basic steps

Here are all the steps and references for building your first engine.

Preparations

  • For Python: Before importing the code into your project, download the whole zip file and extract it. Enter the 'Codes' folder, open cmd there, then type the command below and press Enter:
pip install -r requirements.txt
  • For Browser: Install the Chrome browser (this project currently supports only Chrome). Then find your Chrome version number in Settings and download the Chrome driver that matches that version here. Move the driver executable into the Scripts directory of your Python installation, for example: C:\Python\Python39\Scripts
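Before starting an engine, it can help to confirm that the driver executable is actually discoverable. The following is a minimal sketch and not part of SuperMiner: it assumes the driver file is named chromedriver (chromedriver.exe on Windows), as in the official ChromeDriver downloads, and find_chromedriver is a hypothetical helper name.

```python
import shutil
import sys
from pathlib import Path


def find_chromedriver(extra_dirs=()):
    """Return the path to a chromedriver executable, or None if none is found."""
    # Look on PATH first; the Python Scripts directory is usually on it.
    found = shutil.which("chromedriver")
    if found:
        return found
    # Fall back to explicitly listed directories (assumed locations).
    for directory in extra_dirs:
        for name in ("chromedriver.exe", "chromedriver"):
            candidate = Path(directory) / name
            if candidate.is_file():
                return str(candidate)
    return None


if __name__ == "__main__":
    # Assumption: on Windows, Scripts sits next to the Python executable.
    scripts_dir = Path(sys.executable).parent / "Scripts"
    path = find_chromedriver(extra_dirs=[scripts_dir])
    print(path or "chromedriver not found; re-check the browser preparation step")
```

If this prints a path, Selenium should be able to locate the driver without further configuration.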

Import

Wait until all download threads have finished, then copy the file 'SuperMiner.py' into the root of your project. Open your project and type:

import SuperMiner as SP

Start your first engine

Here are the basic steps to download "Hello world" images:

  • Initialize your engine
Hello_engine=SP.SuperMiner(url='https://cn.bing.com/images/search?q=Hello+world')
  • Start miner engine
Hello_engine.MineEngine()
  • Scroll the page to get more images
SP.Basic_Actions(engine=Hello_engine,Obj_index=-2,Send_keys=False,rollpage=True)
  • Get the attributes of the images
Attr=Hello_engine.Attributes('src',Hello_engine.Objects(Class='mimg'))
  • Download Images
Hello_engine.Download(Attr,data_type='img')
  • Close engine
Hello_engine.engine.quit()
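To illustrate what a download step like the one above has to do, here is a minimal standard-library sketch. It is not SuperMiner's implementation: save_one and save_all are hypothetical names, the '.jpg' suffix is an assumption, and the sketch only assumes that each collected src value is either an http(s) URL or an inline data: URI.

```python
import concurrent.futures
import http.client
import urllib.request
from pathlib import Path


def save_one(src, target):
    """Fetch one src value and write its bytes to target; return True on success."""
    try:
        # urlopen handles http(s) URLs as well as inline data: URIs.
        with urllib.request.urlopen(src, timeout=30) as response:
            Path(target).write_bytes(response.read())
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # Network errors, malformed URLs and broken responses are skipped.
        return False


def save_all(srcs, folder="downloads", suffix=".jpg", workers=8):
    """Download every src into folder using a thread pool; return the saved paths."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)
    targets = [out / f"image_{i:04d}{suffix}" for i in range(len(srcs))]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(save_one, srcs, targets))
    return [target for target, ok in zip(targets, results) if ok]
```

Failed items are skipped rather than raised, so one bad link does not stop the rest of the batch.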

You can now find the downloaded images in the 'downloads' folder. Because the network may be unreliable, some images may be corrupted; this is expected and not a problem.
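If you want to spot obviously broken files after a run, the sketch below checks the leading bytes of each file against a few well-known image signatures. It is an illustration only, not part of SuperMiner: the function names are hypothetical, only JPEG, PNG and GIF are covered, and a file that was truncated after its header will still pass.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def detect_image_type(path):
    """Return the format name, or None if the file does not start like an image."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_broken_images(folder="downloads"):
    """List files in folder that are empty or lack a known image signature."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and detect_image_type(path) is None
    )
```

Calling find_broken_images('downloads') after a run returns the paths that are worth deleting or downloading again.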

For more details, please see the documentation.

2022-3-14

We go until we go wrong, then we keep on until we are right

For dream

Download files

Source Distribution

SuperMiner-22.3.1.1.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

SuperMiner-22.3.1.1-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file SuperMiner-22.3.1.1.tar.gz.

File metadata

  • Download URL: SuperMiner-22.3.1.1.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for SuperMiner-22.3.1.1.tar.gz
Algorithm Hash digest
SHA256 77e6f111ac9939c05eff4c6365c6d864b7fe27ef521548ac9573847078d2a435
MD5 d385ce78fda6201716e0825ada696176
BLAKE2b-256 d628c894c4c811c43a2fc413e54dcd611577823c2271456fc4c03fd185306fb4

See more details on using hashes here.

File details

Details for the file SuperMiner-22.3.1.1-py3-none-any.whl.

File hashes

Hashes for SuperMiner-22.3.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 259036efd2c345e0eaa8d45538c99f5f0ae8a44de2a032592f5e74737772d365
MD5 742c41b180963865064280c45eb3108a
BLAKE2b-256 1670db6a2c715cafa15f6fc68c25490da9af22f96a8d84d6f9f79852699220e4
