A fast web scraper for Python
Scrapist: The Next Level of Efficient Web Scraping
Scrapist is a web scraper for Python. It is built on requests and BeautifulSoup and also supports Scrapy-style CSS selectors. Its features are:
- Faster than using requests and BeautifulSoup directly.
- More effective than Scrapy at fetching multiple pages.
- Provides support for both BeautifulSoup-style selection and Scrapy-style CSS selection.
Installation
To install Scrapist, run this command in the terminal:
pip install scrapist
Initialization
To start web scraping with Scrapist, use this code:
from scrapist import Scraper
scraper = Scraper()
data = scraper.scrape("<your url here>")
print(data.soup)
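The name data.soup suggests a parsed BeautifulSoup document, though this README does not say so explicitly. The sketch below rests on that assumption and uses plain BeautifulSoup on an inline HTML string (a stand-in for a fetched page, so it runs offline) to show the kind of navigation such an object allows.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for a fetched page so the sketch runs without a network.
html = "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"

# Assumption: data.soup behaves like this ordinary BeautifulSoup object.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string        # text inside the <title> tag
first_paragraph = soup.p.get_text()   # text of the first <p> tag

print(title_text)
print(first_paragraph)
```

If data.soup turns out to be a different type, the attribute access shown here may not apply.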
Getting Specific Parts/Tags of a Web Page
To get specific parts or tags of a web page, choose either of these two styles:
The Scrapy-style
To get specified data Scrapy-style, use this code after the initialization:
first = data.css("<your css selector here>").get()
print(first)
# Or
all_data = data.css("<your css selector here>").getall()
print(all_data)
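In Scrapy, .get() returns the first match (or None when nothing matches) and .getall() returns a list of every match. Whether Scrapist returns exactly the same types is not stated here, so the following is only an analogy: it uses BeautifulSoup's select_one() and select() on an inline HTML string to illustrate the "first match" versus "all matches" distinction.

```python
from bs4 import BeautifulSoup

# Inline HTML used purely for illustration.
html = """
<ul>
  <li class="item">first</li>
  <li class="item">second</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# "First match" lookup, comparable in spirit to .get()
first = soup.select_one("li.item")
# "All matches" lookup, comparable in spirit to .getall()
all_items = soup.select("li.item")
# A selector that matches nothing yields None for the single-match lookup.
missing = soup.select_one("li.absent")

first_text = first.get_text()
all_texts = [li.get_text() for li in all_items]

print(first_text)
print(all_texts)
print(missing)
```

Checking for None before using a single-match result avoids an AttributeError on pages that lack the expected element.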
The BeautifulSoup-style
To get specified data BeautifulSoup-style, use this code after the initialization:
first = data.find("<your tag here>", "[your attributes here]")
print(first)
# Or
all_data = data.find_all("<your tag here>", "[your attributes here]")
print(all_data)
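As a rough guide to what the tag and attribute arguments can look like, here is how the same two calls work in BeautifulSoup itself, where attributes are usually passed as a dictionary. This assumes Scrapist forwards its arguments to BeautifulSoup unchanged, which the README implies but does not confirm. The HTML is an inline stand-in.

```python
from bs4 import BeautifulSoup

# Inline HTML used purely for illustration.
html = """
<div id="main">
  <a class="nav" href="/home">Home</a>
  <a class="nav" href="/about">About</a>
  <a class="external" href="https://example.com">Elsewhere</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every matching tag.
first_nav = soup.find("a", {"class": "nav"})
all_nav = soup.find_all("a", {"class": "nav"})

nav_hrefs = [a["href"] for a in all_nav]

print(first_nav["href"])
print(nav_hrefs)
```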
Creating a Soup Strainer
To create a soup strainer, add this line right after the line that creates the scraper (the scraper = Scraper() initialization line):
strainer = scraper.strainer("<your tag here>", "[your attribute here]")
Then pass the strainer to the scraper.scrape() function through its strainer parameter.
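For background, a soup strainer in BeautifulSoup tells the parser to keep only the tags that match, which can save time and memory on large pages. The sketch below uses BeautifulSoup's own SoupStrainer with the parse_only argument on an inline HTML string. How Scrapist wires its strainer internally is an assumption here, not something this README documents.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Inline HTML used purely for illustration.
html = """
<div>
  <a class="keep" href="/one">One</a>
  <p>Paragraph that the strainer skips.</p>
  <a class="keep" href="/two">Two</a>
  <a class="drop" href="/three">Three</a>
</div>
"""

# Keep only <a> tags whose class is "keep"; everything else is never parsed into the tree.
only_keep_links = SoupStrainer("a", {"class": "keep"})
soup = BeautifulSoup(html, "html.parser", parse_only=only_keep_links)

hrefs = [a["href"] for a in soup.find_all("a")]
paragraphs = soup.find_all("p")

print(hrefs)
print(paragraphs)
```

Because non-matching tags are discarded during parsing, later find() or css() calls can only see what the strainer let through.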