
PyScraping

PyScraping is a universal web-scraping util for Python, built with simplicity in mind.

Installation

Install the package with pip:

pip install pyscraping

Example

All scraping functionality can be accessed either as a function call or a property call. The example below retrieves the title with a method call:

from pyscraping.PyScraping import PyScraping

pyscraping = PyScraping("https://google.com")

print(pyscraping.title())

Documentation

<head> tags

Get Website Title

Scraping the title from a website is simple.

pyscraping.title()
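For readers curious about what this kind of lookup involves, here is a minimal sketch using only Python's standard `html.parser`. It is not PyScraping's implementation: `TitleParser` and `extract_title` are hypothetical names, and the sketch works on an HTML string you already have rather than fetching a URL.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None      # stays None if the page has no <title>
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.title = ""
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return None if parser.title is None else parser.title.strip()


print(extract_title("<head><title> Example Domain </title></head>"))  # Example Domain
```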

Get Meta Charset

To access the defined charset, you can use the following method:

pyscraping.charset()
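A comparable standard-library sketch for the charset, assuming the page declares it with `<meta charset="...">` (the older `http-equiv` form is not handled here). `CharsetParser` and `extract_charset` are hypothetical names, not part of PyScraping:

```python
from html.parser import HTMLParser


class CharsetParser(HTMLParser):
    """Remembers the charset declared by the first <meta charset="..."> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("charset")
        if tag == "meta" and value and self.charset is None:
            self.charset = value


def extract_charset(html):
    parser = CharsetParser()
    parser.feed(html)
    return parser.charset


print(extract_charset('<head><meta charset="utf-8"></head>'))  # utf-8
```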

Get Meta Viewport

In some cases, such as the viewport and the meta keywords, the string represents an array and is returned as one:

pyscraping.viewport()

If you need the original viewport string, use viewportString():

pyscraping.viewportString()
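The relationship between the two calls can be sketched as a plain string split. This assumes the array form is produced by splitting the raw string on commas and trimming whitespace, which may not match PyScraping exactly; `split_viewport` is a hypothetical helper. The same idea applies to the keywords string.

```python
def split_viewport(viewport_string):
    """Split a viewport string into its comma-separated directives,
    dropping surrounding whitespace and empty entries."""
    return [part.strip() for part in viewport_string.split(",") if part.strip()]


print(split_viewport("width=device-width, initial-scale=1"))
# ['width=device-width', 'initial-scale=1']
```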

Get Canonical URL

The canonical URL, if given, can be accessed as shown in the example below:

pyscraping.canonical()
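A minimal sketch of how a canonical URL can be located with the standard library, assuming it is declared as `<link rel="canonical" href="...">`. `CanonicalParser` and `canonical_url` are hypothetical names; PyScraping's own lookup may differ:

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Finds the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


print(canonical_url('<link rel="canonical" href="https://example.com/page">'))
# https://example.com/page
```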

Get Meta Content-Type

To access the content type, use the following method:

pyscraping.contentType()

Get Meta CSRF Token

The CSRF token method assumes that the token is stored in a meta tag named "csrf-token", which is the default in Laravel. You can access it as follows:

pyscraping.csrfToken()

Get Meta Author, Description and Image

The following example shows the extraction of three attributes:

  • the Meta Author,
  • the Meta Description and
  • the Meta Image URL
pyscraping.author()
pyscraping.description()
pyscraping.image()
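The three lookups above can be approximated in one pass by mapping every `<meta name="...">` tag to its `content`. This sketch assumes the values are stored under the names `author`, `description` and `image` (the `image` name in particular is an assumption); `MetaNameParser` and `extract_meta` are hypothetical:

```python
from html.parser import HTMLParser


class MetaNameParser(HTMLParser):
    """Maps the name of each <meta name="..."> tag to its content."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if tag == "meta" and name:
            # names are compared case-insensitively; the first tag wins
            self.values.setdefault(name.lower(), attributes.get("content"))


def extract_meta(html, *names):
    parser = MetaNameParser()
    parser.feed(html)
    return {name: parser.values.get(name.lower()) for name in names}


sample = (
    '<meta name="author" content="Jane Doe">'
    '<meta name="description" content="A sample page">'
    '<meta name="image" content="https://example.com/cover.png">'
)
print(extract_meta(sample, "author", "description", "image"))
```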

Get Meta Keywords

The keywords meta tag naturally holds a list of values, so it is split into an array for your convenience:

pyscraping.keywords()

Alternatively, you can access the original keyword string:

pyscraping.keywordString()

Get Meta Open-Graph (OG) Data

Open Graph (OG) data can be fetched for the following properties:

  • og:site_name
  • og:type
  • og:title
  • og:description
  • og:url
  • og:image
# Example
pyscraping.openGraph("og:title")

# All
pyscraping.openGraph()
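To illustrate the single-key versus all-properties behavior, here is a standard-library sketch that collects `<meta property="og:..." content="...">` tags into a dictionary. `OpenGraphParser` and `open_graph` are hypothetical names, and returning a `dict` for the "all" case is an assumption about the shape of the result:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            # keep the first value if a property is repeated
            self.properties.setdefault(prop, attributes.get("content"))


def open_graph(html, key=None):
    parser = OpenGraphParser()
    parser.feed(html)
    if key is None:
        return parser.properties        # all properties
    return parser.properties.get(key)   # a single property, or None


sample = (
    '<meta property="og:title" content="Example">'
    '<meta property="og:url" content="https://example.com/">'
)
print(open_graph(sample, "og:title"))  # Example
print(open_graph(sample))
```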

Get Meta Twitter Card

Parsing the Twitter Card works similarly, for the following properties:

  • twitter:card
  • twitter:title
  • twitter:description
  • twitter:url
  • twitter:image
# Example
pyscraping.twitterCard("twitter:title")

# All
pyscraping.twitterCard()

<body> tags

Get Headings by Level

In some cases, you may need all headings of a particular level. The example below shows how to retrieve them:

pyscraping.h1()
pyscraping.h2()
pyscraping.h3()
pyscraping.h4()
pyscraping.h5()
pyscraping.h6()
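A sketch of grouping headings by level with `html.parser`, assuming each heading's text is wanted with inline markup (such as `<em>`) flattened away. `HeadingParser` and `headings` are hypothetical names:

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Groups the text of <h1> to <h6> elements by heading level."""

    def __init__(self):
        super().__init__()
        self.headings = {level: [] for level in range(1, 7)}
        self._level = None   # level of the heading currently open, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings[self._level].append("".join(self._buffer).strip())
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)


def headings(html, level):
    parser = HeadingParser()
    parser.feed(html)
    return parser.headings[level]


sample = "<h1>Main</h1><h2>First <em>part</em></h2><h2>Second</h2>"
print(headings(sample, 2))  # ['First part', 'Second']
```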

Get all Paragraphs

The following example will return a list of all paragraphs (<p>-tags) on the website:

pyscraping.p()
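A comparable sketch for paragraphs. It also treats a new `<p>` (or the end of the input) as closing an unclosed one, a simplification of the full HTML parsing rules. `ParagraphParser` and `paragraphs` are hypothetical names:

```python
from html.parser import HTMLParser


class ParagraphParser(HTMLParser):
    """Collects the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # text chunks of the open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # a new <p> implicitly closes an unclosed one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def flush(self):
        if self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


def paragraphs(html):
    parser = ParagraphParser()
    parser.feed(html)
    parser.close()   # deliver any trailing text still buffered
    parser.flush()   # close a <p> left open at the end of the input
    return parser.paragraphs


print(paragraphs("<p>One</p><p>Two <b>bold</b></p><p>Three"))
# ['One', 'Two bold', 'Three']
```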

Get Unordered Lists

The following example returns a list of all unordered lists (<ul> tags) on the website:

pyscraping.ul()

Get Ordered Lists

The following example returns a list of all ordered lists (<ol> tags) on the website:

pyscraping.ol()
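Both list methods can be approximated by one parser that takes the list tag as a parameter and returns the item texts of each list. This is a sketch under simplifying assumptions (nested lists are not handled); `ListParser` and `lists` are hypothetical names:

```python
from html.parser import HTMLParser


class ListParser(HTMLParser):
    """Collects the <li> texts of each <ul> or <ol>; nested lists are not handled."""

    def __init__(self, list_tag):
        super().__init__()
        self.list_tag = list_tag  # "ul" or "ol"
        self.lists = []           # one list of item texts per matching element
        self._items = None        # items of the list currently open
        self._text = None         # text chunks of the <li> currently open

    def handle_starttag(self, tag, attrs):
        if tag == self.list_tag:
            self._items = []
        elif tag == "li" and self._items is not None:
            self._text = []

    def handle_endtag(self, tag):
        if tag == "li" and self._text is not None:
            self._items.append("".join(self._text).strip())
            self._text = None
        elif tag == self.list_tag and self._items is not None:
            self.lists.append(self._items)
            self._items = None
            self._text = None

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def lists(html, list_tag="ul"):
    parser = ListParser(list_tag)
    parser.feed(html)
    return parser.lists


sample = "<ul><li>Apple</li><li>Pear</li></ul><ol><li>First</li><li>Second</li></ol>"
print(lists(sample, "ul"))  # [['Apple', 'Pear']]
print(lists(sample, "ol"))  # [['First', 'Second']]
```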

Get all Image URLs

The following example scans a web page for images and returns their absolute URLs as an array:

pyscraping.images()
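Turning relative `src` values into absolute URLs is the job of `urllib.parse.urljoin`, which resolves each value against the page URL. A minimal sketch, assuming the page URL is the base (a `<base href>` tag is ignored here); `ImageParser` and `image_urls` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageParser(HTMLParser):
    """Collects the src of every <img>, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.urls.append(urljoin(self.page_url, src))


def image_urls(html, page_url):
    parser = ImageParser(page_url)
    parser.feed(html)
    return parser.urls


sample = '<img src="/logo.png"><img src="photo.jpg"><img src="https://cdn.example.org/a.gif">'
print(image_urls(sample, "https://example.com/blog/post.html"))
# ['https://example.com/logo.png', 'https://example.com/blog/photo.jpg',
#  'https://cdn.example.org/a.gif']
```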

Get all Images with Details

If you need more details, the following method gives you access to the attributes of each image tag:

pyscraping.imagesDetails()

Get all Links

The following example scans a web page for links and returns an array of absolute URLs:

pyscraping.links()

Get all Links with Details

If you need more details about links, you can access them in the same way as for images:

pyscraping.linksDetails()
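A sketch of what "details" might contain for each link: its attributes, its anchor text, and the absolute URL. The dictionary layout (`url`, `attributes`, `text`) and the names `LinkDetailsParser` and `links_details` are assumptions, not PyScraping's documented return format:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkDetailsParser(HTMLParser):
    """Records attributes, anchor text and absolute URL for each <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # the link whose anchor text is being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href") is not None:
            self._current = {
                "url": urljoin(self.base_url, attributes["href"]),
                "attributes": attributes,
                "text": "",
            }
            self.links.append(self._current)

    def handle_endtag(self, tag):
        if tag == "a":
            if self._current is not None:
                self._current["text"] = self._current["text"].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data


def links_details(html, base_url):
    parser = LinkDetailsParser(base_url)
    parser.feed(html)
    return parser.links


sample = '<a href="/about" rel="nofollow" target="_blank">About <b>us</b></a> <a name="top">Top</a>'
first = links_details(sample, "https://example.com/")[0]
print(first["url"], first["text"])  # https://example.com/about About us
```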

Custom XPath Selectors

The following custom selector should be seen as a starting point for scraping any information the methods above do not cover.

pyscraping.filter(element, attribute)

Example

pyscraping.filter('div', 'class="container"')
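Judging by its name and the example, `filter` appears to match an element name against an attribute condition. The sketch below assumes the two arguments are combined into an XPath-style expression such as `.//div[@class="container"]` and evaluates it with `xml.etree.ElementTree`, whose XPath support is limited and which only accepts well-formed (XHTML-like) markup. `filter_elements` is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def filter_elements(xhtml, element, attribute):
    """Return the elements matching e.g. element='div', attribute='class="container"'."""
    # Combine the arguments into an expression like .//div[@class="container"]
    expression = f".//{element}[@{attribute}]"
    root = ET.fromstring(xhtml)
    return root.findall(expression)


page = """<html><body>
  <div class="container"><p>Inside</p></div>
  <div class="sidebar"><p>Aside</p></div>
</body></html>"""

for match in filter_elements(page, "div", 'class="container"'):
    print(match.find("p").text)  # Inside
```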

Contact me

Contact me by email at rioagungpurnomo@programmer.net. I welcome your input and suggestions.


Download files

Source Distribution

pyscraping-1.0.5.tar.gz (5.5 kB)

Built Distribution

pyscraping-1.0.5-py3-none-any.whl (5.7 kB, Python 3)

File details

Details for the file pyscraping-1.0.5.tar.gz.

File metadata

  • Download URL: pyscraping-1.0.5.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for pyscraping-1.0.5.tar.gz
Algorithm Hash digest
SHA256 10ccbe12960cf180c3fb839848358c42e632d6e9c5642c7328e45beefa1982c8
MD5 90ddf0fad51432694c5836ac703c879d
BLAKE2b-256 c89d4da67b9ac94ba710496cd7e950d286f6aa94741d9439db6816baf5c750f6

File details

Details for the file pyscraping-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: pyscraping-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for pyscraping-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 c9572f0c51ab7a98930f562bacd7077ffc02575aec06614f10d4461c589975b7
MD5 5d755db867e25cb028e81d9999555f15
BLAKE2b-256 b79deb043275989f60362617f808fcbc2831494203c4d3b857b84d5941198418
