PyScraping
PyScraping is a universal web-scraping util for Python, built with simplicity in mind.
Installation
Install PyScraping from PyPI using pip:
pip install pyscraping
Example
All scraping functionality can be accessed either as a function call or as a property. The example below uses the function-call form to read the title:
from pyscraping.PyScraping import PyScraping
pyscraping = PyScraping("https://google.com")
print(pyscraping.title())
Documentation
<head> tags
Get Website Title
Scraping the title from a website is simple.
pyscraping.title()
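To make concrete what title scraping involves, here is a minimal sketch using only the Python standard library. It is not PyScraping's implementation; the `TitleParser` class, the `extract_title` function and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> element is considered.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped title text, or an empty string if none exists."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


sample_html = "<html><head><title> Example Domain </title></head><body></body></html>"
print(extract_title(sample_html))
```

A page without a `<title>` tag yields an empty string in this sketch; how PyScraping itself handles that case is not stated in this description.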
Get Meta Charset
To access the defined charset, you can use the following method:
pyscraping.charset()
Get Meta Viewport
In some cases, such as the viewport and the meta keywords, the string represents a list of values and is returned as a list:
pyscraping.viewport()
If you need to access the original viewport string, you can use viewportString:
pyscraping.viewportString()
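The difference between the list form and the string form can be illustrated with a short sketch. The function `split_viewport` and the sample value are hypothetical; the exact splitting rules PyScraping applies are not documented here, so this assumes a plain comma-separated string.

```python
def split_viewport(viewport_string):
    """Hypothetical helper: split a comma-separated viewport string into parts."""
    # Strip surrounding whitespace and drop empty fragments such as a trailing comma.
    return [part.strip() for part in viewport_string.split(",") if part.strip()]


raw_viewport = "width=device-width, initial-scale=1"
print(split_viewport(raw_viewport))
```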
Get Canonical URL
The canonical URL, if given, can be accessed as shown in the example below:
pyscraping.canonical()
Get Meta Content-Type
To access the content type you can use the following functionality:
pyscraping.contentType()
Get Meta CSRF Token
The CSRF token method assumes that the token is stored in a meta tag with the name "csrf-token". This is the default for Laravel. You can access it using the following code:
pyscraping.csrfToken()
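As an illustration of the lookup described above, the following standard-library sketch reads the content of a meta tag named "csrf-token". The names `MetaNameParser` and `find_csrf_token` and the sample markup are assumptions, not part of PyScraping.

```python
from html.parser import HTMLParser


class MetaNameParser(HTMLParser):
    """Hypothetical helper: maps each <meta name="..."> to its content value."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Keep the first occurrence of each name.
            self.meta.setdefault(name.lower(), attributes.get("content"))


def find_csrf_token(html):
    """Return the csrf-token meta content, or None if the tag is missing."""
    parser = MetaNameParser()
    parser.feed(html)
    return parser.meta.get("csrf-token")


sample_html = '<head><meta name="csrf-token" content="abc123"></head>'
print(find_csrf_token(sample_html))
```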
Get Meta Author, Description and Image
The following example shows the extraction of three attributes:
- the Meta Author,
- the Meta Description and
- the Meta Image URL
pyscraping.author()
pyscraping.description()
pyscraping.image()
Get Meta Keywords
The keywords meta tag holds a list of values, which is split into a list for your convenience:
pyscraping.keywords()
Alternatively, you can access the original keyword string:
pyscraping.keywordString()
Get Meta Open-Graph (OG) Data
The following Open Graph properties can be fetched, either individually or all at once:
- og:site_name
- og:type
- og:title
- og:description
- og:url
- og:image
# Example
pyscraping.openGraph("og:title")
# All
pyscraping.openGraph()
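The two call styles above (one property versus all properties) can be mimicked with a small standard-library sketch. This is only an assumption about the general approach; `OpenGraphParser`, `open_graph` and the sample markup are hypothetical and do not come from PyScraping.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Hypothetical helper: collects <meta property="og:..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.properties[prop] = attributes.get("content")


def open_graph(html, key=None):
    """Return one Open Graph value when key is given, else all of them."""
    parser = OpenGraphParser()
    parser.feed(html)
    if key is None:
        return parser.properties
    return parser.properties.get(key)


sample_html = (
    '<head>'
    '<meta property="og:title" content="Example">'
    '<meta property="og:type" content="website">'
    '<meta name="author" content="Someone">'
    '</head>'
)
print(open_graph(sample_html, "og:title"))
print(open_graph(sample_html))
```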
Get Meta Twitter Card
Parsing the Twitter Card works similarly:
- twitter:card
- twitter:title
- twitter:description
- twitter:url
- twitter:image
# Example
pyscraping.twitterCard("twitter:title")
# All
pyscraping.twitterCard()
<body> tags
Get Headings by Level
In some cases, all headings of a particular level need to be retrieved. The example below shows how to do so:
pyscraping.h1()
pyscraping.h2()
pyscraping.h3()
pyscraping.h4()
pyscraping.h5()
pyscraping.h6()
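For readers curious how headings of one level can be gathered, here is a minimal sketch built on the standard library. It assumes that the text of nested inline tags should be included in the heading; `HeadingParser` and `headings` are hypothetical names, and PyScraping's actual behavior may differ.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Hypothetical helper: collects the text of all headings of one level."""

    def __init__(self, level):
        super().__init__()
        self._tag = f"h{level}"
        self._buffer = None  # a list while inside a matching heading
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == self._tag:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._tag and self._buffer is not None:
            self.headings.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def headings(html, level):
    """Return the text of every <hN> element for the given level."""
    parser = HeadingParser(level)
    parser.feed(html)
    return parser.headings


sample_html = "<h1>Intro</h1><h2>Setup</h2><p>text</p><h2>Usage <em>notes</em></h2>"
print(headings(sample_html, 2))
```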
Get all Paragraphs
The following example will return a list of all paragraphs (<p> tags) on the website:
pyscraping.p()
Get Unordered Lists
The following example will return a list of all unordered lists (<ul> tags) on the website:
pyscraping.ul()
Get Ordered Lists
The following example will return a list of all ordered lists (<ol> tags) on the website:
pyscraping.ol()
Get all Image URLs
The following example parses a web-page for images and returns absolute image URLs as an array.
pyscraping.images()
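Turning relative image paths into absolute URLs is the key step here. A minimal sketch of that step, using `urllib.parse.urljoin` from the standard library, is shown below; the `absolutize` function, the base URL and the sample paths are assumptions for illustration and not PyScraping code.

```python
from urllib.parse import urljoin


def absolutize(base_url, sources):
    """Hypothetical helper: resolve each image source against the page URL."""
    return [urljoin(base_url, src) for src in sources]


base_url = "https://example.com/blog/post.html"
sources = ["/img/logo.png", "cover.jpg", "https://cdn.example.org/a.gif"]
print(absolutize(base_url, sources))
```

Root-relative paths resolve against the host, plain relative paths resolve against the directory of the page, and URLs that are already absolute are left unchanged.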
Get all Images with Details
If you need more details, the following request allows you to access the attributes of each image tag:
pyscraping.imagesDetails()
Get all Links
The following example parses a web-page for any links and returns an array of absolute URLs:
pyscraping.links()
Get all Links with Details
If you need more details, you can access them in the same way as for images. The example below returns the detailed data for the links on the page:
pyscraping.linksDetails()
Custom XPath Selectors
The following examples of custom selectors should be seen as a starting point for any custom information you need to scrape.
pyscraping.filter(element, attribute)
Example
pyscraping.filter('div', 'class="container"')
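What an element-plus-attribute filter might do can be sketched with the standard library. This description does not say what `filter` returns, so the sketch below assumes it yields the text of each matching element and that the attribute value must match exactly. `parse_attribute`, `ElementFilter` and `filter_elements` are hypothetical names.

```python
from html.parser import HTMLParser


def parse_attribute(attribute):
    """Split a string such as 'class="container"' into ('class', 'container')."""
    name, _, value = attribute.partition("=")
    return name.strip(), value.strip().strip("\"'")


class ElementFilter(HTMLParser):
    """Hypothetical helper: collects the text of elements matching tag and attribute."""

    def __init__(self, element, attribute):
        super().__init__()
        self._element = element
        self._name, self._value = parse_attribute(attribute)
        self._depth = 0  # greater than 0 while inside a matching element
        self._buffer = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements of the same tag so the right end tag closes the match.
            if tag == self._element:
                self._depth += 1
        elif tag == self._element and dict(attrs).get(self._name) == self._value:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self._element:
            self._depth -= 1
            if self._depth == 0:
                self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def filter_elements(html, element, attribute):
    """Return the text of every element that matches the tag and attribute."""
    parser = ElementFilter(element, attribute)
    parser.feed(html)
    return parser.matches


sample_html = (
    '<div class="container">Hello <div>inner</div> world</div>'
    '<div class="other">skip</div>'
)
print(filter_elements(sample_html, "div", 'class="container"'))
```

Because the comparison is exact, an element with several classes (for example `class="container wide"`) would not match in this sketch.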
Contact me
Contact me via email at rioagungpurnomo@programmer.net. I welcome your input and suggestions.