
A lightweight Python module which automates web scraping and parsing of HTML

Project description

pyautoscraper

Author: Jeet Chugh

pyautoscraper is a lightweight module which automates web scraping and gathering HTML elements within Python 3.

Features:

  • Find elements by searching for tags, attributes, classes, IDs, and more
  • Parse Cloudflare-protected sites (NOT CAPTCHA-protected pages)
  • Install easily with pip
  • Lightweight, only uses cloudscraper and BS4

Github Link | PyPi Link | Example Code Link

Quick and easy installation via pip: pip install pyautoscraper

Import Statement: from pyautoscraper.scraper import Scraper

Dependencies: bs4, cloudscraper

Code License: MIT

Documentation

Documentation covers the 'Scraper' class and its methods.

'Scraper' Class:

The 'Scraper' class takes in an input of a URL as a string, and has many methods that return specific chunks of data.

Import:

from pyautoscraper.scraper import Scraper

Instantiation:

webscraper = Scraper('URL') # Takes in url string (with https://)

another_scraper = Scraper('Second URL') # Instantiate multiple Scrapers through variables

'Scraper' Methods:

Scraper raises a URLError if the request is unsuccessful, and its find methods return None if no matching elements are found.

Scraper('url').find(tag, **attributes) --> Scraper('URL').find('h1', class_='blog-title')

returns a string containing the first HTML element that matches your parameters. To match by class, use the 'class_' keyword argument, since 'class' is a reserved word in Python.

(<h1 class="blog-title">Title</h1>)
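As an illustration of what this kind of tag-and-class lookup involves, here is a stdlib-only sketch using html.parser that captures the text of the first matching element. pyautoscraper uses BS4 internally; the class name and behavior below are hypothetical, not the library's actual code:

```python
from html.parser import HTMLParser

class FirstMatchFinder(HTMLParser):
    """Capture the text of the first tag matching a name and optional class."""

    def __init__(self, tag, cls=None):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.capturing = False
        self.result = None
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if self.result is None and not self.capturing and tag == self.tag:
            # attrs is a list of (name, value) pairs; class may hold several names
            classes = (dict(attrs).get("class") or "").split()
            if self.cls is None or self.cls in classes:
                self.capturing = True
                self.buf = []

    def handle_data(self, data):
        if self.capturing:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if self.capturing and tag == self.tag:
            self.capturing = False
            self.result = "".join(self.buf)

html = '<h1 class="blog-title">Title</h1><h1>Other</h1>'
finder = FirstMatchFinder("h1", "blog-title")
finder.feed(html)
print(finder.result)  # Title
```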


Scraper('url').findAll(tag, **attributes) --> Scraper('URL').findAll('p')

returns a list of strings, containing all the HTML elements that match the parameters.

(['<p>first</p>', '<p>second</p>', '<p>third</p>'])


Scraper('url').findText()

returns a string containing the text content of the HTML, with all tags and attributes stripped.

(h1 text paragraph text span text h5 text im in a div tag)
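Stripping tags to leave only text content can be sketched with the standard library alone; this is a rough illustration of what findText() amounts to, not pyautoscraper's actual implementation (which uses BS4):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text nodes, discarding all tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def get_text(self):
        # Join non-empty text chunks with single spaces
        return " ".join(p.strip() for p in self.parts if p.strip())

html = "<h1>h1 text</h1><p>paragraph text</p><div>im in a div tag</div>"
extractor = TextExtractor()
extractor.feed(html)
print(extractor.get_text())  # h1 text paragraph text im in a div tag
```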


Scraper('url').findLinks()

returns a list of all http/https links in a tags within the HTML code of the page.

(['https://www.google.com', 'https://www.github.com'])
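Collecting http/https hrefs from a tags is simple enough to sketch with html.parser; this hypothetical helper illustrates the idea behind findLinks() and is not the library's code:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Gather href values of <a> tags that are absolute http/https URLs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            # Skip relative links, mailto:, javascript:, etc.
            if href.startswith(("http://", "https://")):
                self.links.append(href)

html = ('<a href="https://www.google.com">g</a>'
        '<a href="/relative">skip</a>'
        '<a href="https://www.github.com">gh</a>')
collector = LinkCollector()
collector.feed(html)
print(collector.links)  # ['https://www.google.com', 'https://www.github.com']
```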


Scraper('url').findJS()

returns a list of strings, each representing a script tag within the HTML code.


Scraper('url').findElementByID(IDname)

returns a string containing the first HTML element that matches your IDname.

(<div id="database_div">content</div>)
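Looking an element up by its id attribute is an attribute match rather than a tag match; the class-based lookup below works the same way. A stdlib-only sketch of the idea (hypothetical helper, not the library's code):

```python
from html.parser import HTMLParser

class IDFinder(HTMLParser):
    """Capture the text of the first tag whose id attribute matches."""

    def __init__(self, id_name):
        super().__init__()
        self.id_name = id_name
        self.capturing = False
        self.content = None
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if self.content is None and dict(attrs).get("id") == self.id_name:
            self.capturing = True

    def handle_data(self, data):
        if self.capturing:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if self.capturing:
            self.capturing = False
            self.content = "".join(self.buf)

html = '<div id="database_div">content</div>'
f = IDFinder("database_div")
f.feed(html)
print(f.content)  # content
```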


Scraper('url').findElementByClass(className)

returns a string containing the first HTML element that matches your className.

(<div class="database_div">content</div>)


Scraper('url').findComments()

returns a list of strings, containing all the HTML comments within the code.

(['<!-- a comment -->','<!-- ANOTHER COMMENT -->'])
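Comments are reported by html.parser through a dedicated callback, which makes collecting them a few lines; this sketch shows the idea behind findComments() and is not the library's implementation:

```python
from html.parser import HTMLParser

class CommentCollector(HTMLParser):
    """Collect every HTML comment, re-wrapped in its delimiters."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the raw comment body, delimiters stripped
        self.comments.append(f"<!--{data}-->")

html = "<p>hi</p><!-- a comment --><div><!-- ANOTHER COMMENT --></div>"
c = CommentCollector()
c.feed(html)
print(c.comments)  # ['<!-- a comment -->', '<!-- ANOTHER COMMENT -->']
```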


Thank you for reading the documentation. For an example that uses all of these methods, see [link].

If you run into issues, report them on the GitHub project page.

CHANGELOG:

0.0.1 (10/5/20):

  • GitHub Commit
  • Published to PyPi

0.0.2 (10/6/20):

  • Updated README
  • Fixed Bug

