A web miner built on Selenium, with simpler operations

Project description

Introduction

This project was built for the SuperWebMiner and also serves as homework for my class. The basic web-mining framework can be used for tasks such as downloading large numbers of pictures. The goal is to let everyone start their own super mining engine; at the same time, the project brings me closer to a comprehensive AI system. Suggestions are very welcome, so that together we can make it better and stronger!

Copyright

  • Author: Airscker
  • Last Edited Time: 2022-7
  • Latest Edition: 22.3.1.2
  • Open source project. Copyright (C) Airscker, airscker@gmail.com, Mozilla Public License Version 2.0

Basic steps for coding in an IDE

This section gives the steps and references for building your first engine.

Preparations

  • For Python

    Before importing the package into your project, install it with pip:

pip install SuperMiner
  • For Browser
    • First, install the Chrome browser (this project currently supports only Chrome).
    • Second, find your Chrome version number in Settings (tab 'About Chrome'), for example 100.0.4896.88.
    • Then download the Chrome driver that matches your version number from the ChromeDriver download page.
    • Finally, move webdriver.exe into the Scripts directory of your Python installation, for example: C:\Python\Python39\Scripts\
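
The version matching and driver placement described above can be checked with a short script. This is only a sketch and not part of SuperMiner: the helper names `major_version`, `versions_match` and `find_on_path` are hypothetical, the executable names searched for are assumptions, and it assumes that agreeing on the major version number is enough for Chrome and its driver to work together.

```python
import shutil


def major_version(version):
    """Return the leading component of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """Assumption: Chrome and its driver are compatible when majors agree."""
    return major_version(chrome_version) == major_version(driver_version)


def find_on_path(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        location = shutil.which(name)
        if location is not None:
            return location
    return None


# Executable names are assumptions; adjust them to the file you downloaded.
driver_path = find_on_path(["chromedriver", "webdriver"])
print("driver found at:", driver_path)

# The second version string is an illustrative value, not taken from the text.
print("versions match:", versions_match("100.0.4896.88", "100.0.4896.60"))
```

If `find_on_path` returns `None`, the driver is probably not in a directory listed in PATH, such as the Scripts directory mentioned above.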

Import

Wait until the installation has finished, then open your project and type:

from SuperMiner import SMiner as SM
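
If the import fails, it can help to confirm that the package is visible to the interpreter you are running. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of SuperMiner.

```python
import importlib.util


def is_installed(package_name):
    """Return True when a top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("SuperMiner"):
    print("SuperMiner is available; the import should succeed.")
else:
    print("SuperMiner was not found; run 'pip install SuperMiner' first.")
```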

Start your first engine

The following basic steps download images for the search term "Hello world":

  • Initialize your engine
Hello_engine=SM.SuperMiner(url='https://cn.bing.com/images/search?q=Hello+world')
  • Start miner engine
Hello_engine.MineEngine()
  • Scroll the page to get more images
SM.Basic_Actions(engine=Hello_engine.engine,Obj_index=-2,send_keys=False,rollpage=True)
  • Get the attributes of the images
Attr=Hello_engine.Attributes('src',Hello_engine.Objects(Class='mimg'))
  • Download Images
Hello_engine.Download(Attr,data_type='img')
  • Close engine
Hello_engine.engine.quit()
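
The steps above can be collected into one function. This is a sketch under stated assumptions: the function name `mine_images` and its parameters are hypothetical, the SuperMiner calls and their arguments are copied unchanged from the steps above, and the `try`/`finally` wrapper is an addition so that the browser is closed even when a step fails. The `SM` module is passed in as an argument so that the function itself has no import-time dependency on a browser.

```python
def mine_images(SM, url, css_class="mimg", attribute="src"):
    """Run the steps shown above and return the collected attribute values.

    `SM` is expected to be the module imported with
    `from SuperMiner import SMiner as SM`.
    """
    miner = SM.SuperMiner(url=url)   # initialize the engine
    miner.MineEngine()               # start the miner engine
    try:
        # Scroll the page to load more images (arguments as in the steps above).
        SM.Basic_Actions(engine=miner.engine, Obj_index=-2,
                         send_keys=False, rollpage=True)
        # Collect the chosen attribute from every element with the given class.
        attrs = miner.Attributes(attribute, miner.Objects(Class=css_class))
        # Download the collected image URLs.
        miner.Download(attrs, data_type='img')
        return attrs
    finally:
        # Always close the browser, even if one of the steps raised.
        miner.engine.quit()
```

After `from SuperMiner import SMiner as SM`, calling `mine_images(SM, 'https://cn.bing.com/images/search?q=Hello+world')` should perform the same sequence as the individual steps.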

You can now find the downloaded images in the 'downloads' folder. If the network connection is unstable, some images may be corrupted; this is expected and not a problem.
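
To spot corrupted downloads, one option is to check each file's start and end markers. This is a rough heuristic sketch, not a feature of SuperMiner: the helper names `looks_complete` and `find_suspect_files` are hypothetical, only PNG, JPEG and GIF are recognised, and files in any other format (for example WebP) are reported as suspect simply because they are not recognised.

```python
from pathlib import Path

PNG_HEAD = b"\x89PNG\r\n\x1a\n"   # PNG file signature
PNG_TAIL = b"IEND\xaeB`\x82"      # IEND chunk type followed by its CRC
JPEG_HEAD = b"\xff\xd8\xff"       # start-of-image marker
JPEG_TAIL = b"\xff\xd9"           # end-of-image marker


def looks_complete(data):
    """Heuristic check of start and end markers for PNG, JPEG and GIF data."""
    if data.startswith(PNG_HEAD):
        return data.endswith(PNG_TAIL)
    if data.startswith(JPEG_HEAD):
        return data.endswith(JPEG_TAIL)
    if data.startswith((b"GIF87a", b"GIF89a")):
        return data.endswith(b";")
    return False  # unrecognised format


def find_suspect_files(folder):
    """Return the files in `folder` whose bytes fail the marker check."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.is_file() and not looks_complete(p.read_bytes()))


for path in find_suspect_files("downloads"):
    print("possibly corrupted:", path)
```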

For more details, see Document. Command support has been available since edition 22.2.0.0 (R22.2.0.0); for details, see Command Support.

2022-3-14

We go until we go wrong, then we keep on until we are right

For dream

Download files

Source Distribution

SuperMiner-22.3.1.2.tar.gz (9.8 kB, source)

Built Distribution

SuperMiner-22.3.1.2-py3-none-any.whl (9.2 kB, Python 3)

File details

Details for the file SuperMiner-22.3.1.2.tar.gz.

File metadata

  • Download URL: SuperMiner-22.3.1.2.tar.gz
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for SuperMiner-22.3.1.2.tar.gz
Algorithm    Hash digest
SHA256       7e5feb7ed0c2d6c7ee5298cb3f31a674614559d14580f83c161ed7e05501ba16
MD5          fe98c8634a204b3cc08ddee99a4db83d
BLAKE2b-256  e1110fc1078f16a9227354009351fa07b1875987d09b638939a19b9491bd4ecc
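
A downloaded archive can be compared against the SHA256 digest listed above. The following is a minimal sketch using the standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path passed in is assumed to point at the archive you downloaded.

```python
import hashlib

# SHA256 digest listed above for SuperMiner-22.3.1.2.tar.gz
EXPECTED_SHA256 = "7e5feb7ed0c2d6c7ee5298cb3f31a674614559d14580f83c161ed7e05501ba16"


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("SuperMiner-22.3.1.2.tar.gz")` should return `True` for an intact copy of the source distribution.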

File details

Details for the file SuperMiner-22.3.1.2-py3-none-any.whl.

File hashes

Hashes for SuperMiner-22.3.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       b619c024d7642a14701da83e7bd27a62d18e47c8347c10f123f78c276e48f3b0
MD5          8b185dc364ec766174ba577c579bb3a4
BLAKE2b-256  58ac5ba1d228e35678b6d64e258bc1b380aaf64328eeb92a1fb7831ae1eb1ece
