Simplify web scraping
Project description
Scraple
Scraple is a Python library designed to simplify the process of web scraping, providing easy scraping and easy searching for selectors.
Version
v0.1.1 changelog
Installation
The package is hosted on PyPI and can be installed using pip:
pip install scraple
Main API
The package provides two main classes: Rules and SimpleExtractor.
1. Rules
The Rules class lets you define extraction rules.
You can pick a selector just by knowing a string that appears in the page, using the add_field_rule method.
This method automatically searches for the selector of the element whose text content matches the string.
The add_field_rule method also supports regular expression matching when called with re_flag=True.
from scraple import Rules
# To instantiate a Rules object, you need a reference page.
some_rules = Rules("reference in the form of a string path to a local html file", "local")
some_rules.add_field_rule("a sentence or word that exists in the reference page", "field name 1")
some_rules.add_field_rule("some othe.*?text", "field name 2", re_flag=True)
# Add more field rules...
# The selector is searched for automatically. To see it, check the rule shown in the console
# or print the Rules object:
# print(some_rules)
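The idea of finding a selector from a known piece of text can be illustrated without scraple at all. The following is a minimal sketch using only the Python standard library. It is not scraple's implementation, and the names SelectorFinder and find_selectors, as well as the sample page, are hypothetical. It records a CSS-like path for each element whose text matches either a literal string or a regular expression.

```python
import re
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorFinder(HTMLParser):
    """Hypothetical helper: collects a CSS-like path for matching text nodes."""

    def __init__(self, pattern, use_regex=False):
        super().__init__()
        # Plain strings are escaped so that they match literally.
        self.pattern = re.compile(pattern if use_regex else re.escape(pattern))
        self.stack = []    # selector parts of the currently open elements
        self.matches = []  # paths of elements whose text matched

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(tag + "".join("." + c for c in classes))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.pattern.search(data):
            self.matches.append(" > ".join(self.stack))


def find_selectors(html, pattern, use_regex=False):
    """Return the paths of all elements whose text matches the pattern."""
    finder = SelectorFinder(pattern, use_regex)
    finder.feed(html)
    finder.close()
    return finder.matches


page = """
<html><body>
  <div class="product">
    <h2 class="title">Blue Widget</h2>
    <span class="price">Price: 19.99 USD</span>
  </div>
</body></html>
"""

print(find_selectors(page, "Blue Widget"))
# ['html > body > div.product > h2.title']
print(find_selectors(page, r"Price: \d+\.\d+", use_regex=True))
# ['html > body > div.product > span.price']
```

The use_regex parameter of this sketch plays a role similar to re_flag in add_field_rule: without it, the search string is treated as literal text.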
2. SimpleExtractor
The SimpleExtractor class performs the actual scraping based on the defined rules.
A Rules object specifies what to extract, and the SimpleExtractor does the extracting, or
scraping. First, pass a Rules object to the SimpleExtractor constructor, then use the perform_extraction
method to create a generator object that yields dictionaries of extracted elements.
from scraple import SimpleExtractor
extractor = SimpleExtractor(some_rules) # some_rules from above code snippet
result = extractor.perform_extraction(
"web page in the form of a BeautifulSoup object",
"parsed"
)
# print(next(result))
# {
# "field name 1": [element, ...],
# "field name 2": ...,
# ...
# }
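The split between rules ("what to extract") and an extractor that yields dictionaries lazily can be sketched with a plain Python generator. This is only an illustration of the pattern, not scraple's code. The function extract_pages, the regex-based rules, and the sample pages are hypothetical, and the sketch assumes one yielded dictionary per input page.

```python
import re


def extract_pages(field_patterns, pages):
    """Lazily yield one {field name: [matches, ...]} dictionary per page."""
    compiled = {name: re.compile(p) for name, p in field_patterns.items()}
    for page in pages:
        yield {name: rx.findall(page) for name, rx in compiled.items()}


# The rules describe what to extract...
field_patterns = {
    "title": r"<h2>(.*?)</h2>",
    "price": r"\d+\.\d{2}",
}
# ...and the extractor applies them to each page in turn.
pages = [
    "<h2>Blue Widget</h2><span>19.99</span>",
    "<h2>Red Widget</h2><span>24.50</span><span>21.00</span>",
]

results = extract_pages(field_patterns, pages)
first = next(results)    # no page is processed until a result is requested
print(first)
# {'title': ['Blue Widget'], 'price': ['19.99']}
for record in results:   # the loop continues with the remaining pages
    print(record)
# {'title': ['Red Widget'], 'price': ['24.50', '21.00']}
```

Because the generator does its work on demand, a caller can stop after the first dictionary without paying for the rest.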
For more information and tutorials, see the documentation or visit the main repository.
File details
Details for the file scraple-0.1.1.tar.gz.
File metadata
- Download URL: scraple-0.1.1.tar.gz
- Upload date:
- Size: 41.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7
MD5 | 4a84297cfe45313f9e5e0ffd4ff76787
BLAKE2b-256 | 83a33c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691
File details
Details for the file scraple-0.1.1-py3-none-any.whl.
File metadata
- Download URL: scraple-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ef6eb0d678614e8d181453c1bf15d79e45c52054f42b1a6f6c67d481b771ce0
MD5 | 0643e2f17f2460da4446e6c79bdb6f26
BLAKE2b-256 | 0bb625042702af812f70518e652f91b76eda78ed3447c628a577dcaa5b3d1451
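A downloaded file can be checked against its published SHA256 digest with the standard library hashlib module. The sketch below is one way to do it. The helper names sha256_of_file and matches_published_digest are hypothetical, and the example assumes the wheel was saved in the current directory under its original file name.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumed location of the downloaded wheel; adjust to where the file was saved.
wheel = "scraple-0.1.1-py3-none-any.whl"
published = "9ef6eb0d678614e8d181453c1bf15d79e45c52054f42b1a6f6c67d481b771ce0"

if os.path.exists(wheel):
    print("digest matches:", matches_published_digest(wheel, published))
else:
    print("file not found:", wheel)
```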