Simplify web scraping

Project description

Scraple

Scraple is a Python library designed to simplify web scraping: it makes both the extraction itself and the search for the right selectors easy.

Version

v0.1.1 changelog

Installation

The package is hosted on PyPI and can be installed with pip:

pip install scraple

Main API

The package provides two main classes: Rules and SimpleExtractor.

1. Rules

The Rules class lets you define extraction rules. With the add_field_rule method you can pick a selector just by knowing a string that appears on the reference page: the method automatically searches for the selector of the element whose text content matches that string. add_field_rule also supports regular-expression matching.

from scraple import Rules

# To instantiate a Rules object you need a reference page.
some_rules = Rules("reference in the form of a string path to a local html file", "local")
some_rules.add_field_rule("a sentence or word that exists in the reference page", "field name 1")
some_rules.add_field_rule("some othe.*?text", "field name 2", re_flag=True)
# Add more field rules...

# The selector is searched for automatically; to inspect it, print the rules object:
# print(some_rules)
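
As a more concrete sketch (the file path, field names, and search strings below are hypothetical), defining rules against a locally saved product page could look like this:

from scraple import Rules

# Hypothetical reference page saved locally.
rules = Rules("saved_pages/product_page.html", "local")

# Pick fields by text known to appear on the reference page.
rules.add_field_rule("Blue Cotton T-Shirt", "product title")
rules.add_field_rule(r"\$\d+\.\d{2}", "price", re_flag=True)  # regex match for a price

# Printing the object shows the selector found for each field.
print(rules)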

2. SimpleExtractor

The SimpleExtractor class performs the actual scraping based on a defined rule: a Rules object specifies what to extract, and the SimpleExtractor does the extracting. Pass a Rules object to the SimpleExtractor constructor, then call the perform_extraction method to get a generator that yields dictionaries of extracted elements.

from scraple import SimpleExtractor

extractor = SimpleExtractor(some_rules)  # some_rules from the snippet above
result = extractor.perform_extraction(
    "web page in the form of a BeautifulSoup4 object",
    "parsed"
)

# print(next(result))
# {
#   "field name 1": [element, ...],
#   "field name 2": ...,
#   ...
# }
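
Putting the two classes together, a minimal end-to-end sketch could look like the following; the reference page, field name, and target URL are hypothetical, and the live page is fetched with requests and parsed with BeautifulSoup before being passed in the "parsed" form shown above:

import requests
from bs4 import BeautifulSoup

from scraple import Rules, SimpleExtractor

# Hypothetical reference page and field definition (see the Rules example above).
rules = Rules("saved_pages/product_page.html", "local")
rules.add_field_rule("Blue Cotton T-Shirt", "product title")

# Fetch a live page and parse it with BeautifulSoup before extraction.
html = requests.get("https://example.com/products").text
soup = BeautifulSoup(html, "html.parser")

extractor = SimpleExtractor(rules)

# perform_extraction returns a generator that yields one dictionary
# of extracted elements per match.
for item in extractor.perform_extraction(soup, "parsed"):
    print(item["product title"])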

For more information and a tutorial, see the documentation or visit the main repository.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scraple-0.1.1.tar.gz (41.8 kB)

Uploaded Source

Built Distribution

scraple-0.1.1-py3-none-any.whl (10.0 kB)

Uploaded Python 3

File details

Details for the file scraple-0.1.1.tar.gz.

File metadata

  • Download URL: scraple-0.1.1.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for scraple-0.1.1.tar.gz

  • SHA256: 4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7
  • MD5: 4a84297cfe45313f9e5e0ffd4ff76787
  • BLAKE2b-256: 83a33c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691

See more details on using hashes here.
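
To verify a downloaded file against the SHA256 digest above, a minimal check with Python's standard hashlib module could look like this (the local file name is assumed to match the download):

import hashlib

expected = "4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7"

# Hash the downloaded archive in chunks and compare against the published digest.
sha256 = hashlib.sha256()
with open("scraple-0.1.1.tar.gz", "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        sha256.update(chunk)

print("OK" if sha256.hexdigest() == expected else "Hash mismatch!")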

File details

Details for the file scraple-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: scraple-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for scraple-0.1.1-py3-none-any.whl

  • SHA256: 9ef6eb0d678614e8d181453c1bf15d79e45c52054f42b1a6f6c67d481b771ce0
  • MD5: 0643e2f17f2460da4446e6c79bdb6f26
  • BLAKE2b-256: 0bb625042702af812f70518e652f91b76eda78ed3447c628a577dcaa5b3d1451

See more details on using hashes here.
