A fast web scraper for Python

Project description

Scrapist: The Next Level of Efficient Web Scraping

Scrapist is a web scraper for Python. It is built on requests and BeautifulSoup, and also supports Scrapy-style CSS selectors. Its features are:

  • Faster than using requests and BeautifulSoup directly.
  • More effective at fetching multiple pages than Scrapy.
  • Supports both BeautifulSoup-style selection and Scrapy-style CSS selectors.

Installation

To install Scrapist, run this command in the terminal:

    pip install scrapist

Initialization

To start web scraping with Scrapist, use this code:

    from scrapist import Scraper

    scraper = Scraper()
    data = scraper.scrape("<your url here>")
    print(data.soup)
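
As a concrete sketch (the URL is only an example, and data.soup is assumed to behave like an ordinary BeautifulSoup object, since Scrapist is built on BeautifulSoup):

    from scrapist import Scraper

    scraper = Scraper()
    # Example page; replace with any page you are permitted to scrape.
    data = scraper.scrape("https://quotes.toscrape.com/")

    # data.soup is the parsed document, so standard BeautifulSoup
    # attributes such as .title should work on it.
    print(data.soup.title)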

Getting Specified Parts/Tags of a Web Page

To get specified parts/tags of a web page, you can use either of these two styles:

The Scrapy-style

To get specified data Scrapy-style, use this code after the initialization:

    first = data.css("<your css selector here>").get()
    print(first)
    # Or
    all_data = data.css("<your css selector here>").getall()
    print(all_data)
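
For example (the selectors are illustrative, and whether Scrapy's ::text extension is supported is not stated here, so plain CSS selectors are used):

    # Hypothetical selector: the first <h1> element on the page.
    first_heading = data.css("h1").get()
    print(first_heading)

    # Hypothetical selector: every link on the page.
    all_links = data.css("a").getall()
    print(all_links)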

The BeautifulSoup-style

To get specified data BeautifulSoup-style, use this code after the initialization:

    first = data.find("<your tag here>", "[your attributes here]")
    print(first)
    # Or
    all_data = data.find_all("<your tag here>", "[your attributes here]")
    print(all_data)
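
For example (the tag and attributes are illustrative; the attributes argument is assumed to accept a BeautifulSoup-style attrs dict):

    # Hypothetical tag/attribute combination.
    first_heading = data.find("h1", {"class": "title"})
    print(first_heading)

    all_paragraphs = data.find_all("p", {"class": "description"})
    print(all_paragraphs)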

Creating a Soup Strainer

To create a soup strainer, use this code just after creating the scraper (the scraper = Scraper() initialization line):

    strainer = scraper.strainer("<your tag here>", "[your attribute here]")

Then pass the strainer to the strainer parameter of the scraper.scrape() function, as in the sketch below.
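
A minimal end-to-end sketch, assuming the attribute argument accepts a dict as in BeautifulSoup's SoupStrainer (the tag, class, and URL are illustrative):

    from scrapist import Scraper

    scraper = Scraper()

    # Only <div class="quote"> tags are parsed, which keeps the tree small.
    strainer = scraper.strainer("div", {"class": "quote"})

    # Pass the strainer through the strainer parameter of scrape().
    data = scraper.scrape("https://quotes.toscrape.com/", strainer=strainer)
    print(data.soup)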


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapist-1.0.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

scrapist-1.0.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file scrapist-1.0.0.tar.gz.

File metadata

  • Download URL: scrapist-1.0.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.6

File hashes

Hashes for scrapist-1.0.0.tar.gz
  • SHA256: 32192186ebb78d11c7150f58983f27f4e052693ce0ec90e2fb748c28da2b8ac5
  • MD5: cc74d1db9cb1029b631326de0cb847e2
  • BLAKE2b-256: fb9c695aae5dbf048ad5bd478a2aa415d29e91e61aa290e51a0b0747cb8f834a


File details

Details for the file scrapist-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: scrapist-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.6

File hashes

Hashes for scrapist-1.0.0-py3-none-any.whl
  • SHA256: 00ed5e6723d138cbe8d51f5d1dbc77d06479dcd1cb702525ed323b0766cb0088
  • MD5: 18512ee8457dfeb8d7844311777cabca
  • BLAKE2b-256: 94ac8cb3fafc520c9f2e4b14b919303190406b97898b5507568747ce92dfd546

