Simplify web scraping

Scraple

Scraple is a Python library that simplifies web scraping by making it easy to find selectors and to extract the elements they match.

Version

v0.1.1 changelog

Installation

The package is hosted on PyPI and can be installed using pip:

pip install scraple

Main API

The package provides two main classes: Rules and SimpleExtractor.

1. Rules

The Rules class lets you define extraction rules. You can pick a selector just by knowing a string that appears on the page: the add_field_rule method automatically searches for the selector of the element whose text content matches that string. The add_field_rule method also supports regular expression matching.

from scraple import Rules

# To instantiate a Rules object, you need a reference page.
some_rules = Rules("reference in the form of string path to local html file", "local")
some_rules.add_field_rule("a sentence or word exist in reference page", "field name 1")
some_rules.add_field_rule("some othe.*?text", "field name 2", re_flag=True)
# Add more field rules...

# The selector is found automatically. To inspect it, check the rule
# shown in the console or print the Rules object:
# print(some_rules)
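
To make the idea of "finding a selector from known text" more concrete, here is a minimal sketch using only the Python standard library. It is not Scraple's implementation and does not use Scraple's API; the names SelectorFinder and find_selectors, the sample page, and the "tag.class" path format are all assumptions made for illustration. It walks a well-formed HTML string and records a selector-like path for every element whose text matches a literal string or a regular expression.

```python
import re
from html.parser import HTMLParser


class SelectorFinder(HTMLParser):
    """Hypothetical helper: record a tag path for each element whose text matches."""

    def __init__(self, pattern, use_regex=False):
        super().__init__()
        # Treat the pattern literally unless regex mode is requested.
        self.pattern = re.compile(pattern if use_regex else re.escape(pattern))
        self.stack = []    # open tags from the root down to the current element
        self.matches = []  # selector-like paths of matching elements

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class")
        # Build "tag.class1.class2" when the element has a class attribute.
        suffix = "." + ".".join(classes.split()) if classes else ""
        self.stack.append(tag + suffix)

    def handle_endtag(self, tag):
        # Assumes well-formed HTML without void elements such as <br>.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip() and self.pattern.search(data):
            self.matches.append(" > ".join(self.stack))


def find_selectors(html, pattern, use_regex=False):
    """Return the paths of all elements whose text matches the pattern."""
    finder = SelectorFinder(pattern, use_regex)
    finder.feed(html)
    return finder.matches


page = """
<html><body>
  <div class="product">
    <h2 class="title">Blue Widget</h2>
    <span class="price">$19.99</span>
  </div>
</body></html>
"""

title_selectors = find_selectors(page, "Blue Widget")
price_selectors = find_selectors(page, r"\$\d+\.\d{2}", use_regex=True)
print(title_selectors)
print(price_selectors)
```

The literal lookup and the regex lookup share one code path: the literal string is simply escaped before compilation, which roughly mirrors the role the re_flag argument plays in the snippet above.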

2. SimpleExtractor

The SimpleExtractor class performs the actual scraping based on the defined rules. A Rules object specifies what to extract, and the SimpleExtractor does the extracting. First, pass a Rules object to the SimpleExtractor constructor, then call the perform_extraction method to create a generator that yields dictionaries of the extracted elements.

from scraple import SimpleExtractor

extractor = SimpleExtractor(some_rules)  # some_rules from above code snippet
result = extractor.perform_extraction(
    "web page in the form of beautifulSoup4 object",
    "parsed"
)

# print(next(result))
# {
#   "field name 1": [element, ...],
#   "field name 2": ...,
#   ...
# }
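
The generator-of-dictionaries pattern described above can be sketched with the standard library alone. This is an illustration of the pattern, not Scraple's code: PathTextCollector, extract_pages, the plain tag-path rules, and the assumption that one dictionary is produced per page are all hypothetical choices that the project description does not confirm.

```python
from html.parser import HTMLParser


class PathTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of elements found at given tag paths."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules  # {field name: "tag > tag > ..."}
        self.stack = []     # open tags from the root down to the current element
        self.found = {field: [] for field in rules}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-formed HTML without void elements such as <br>.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        path = " > ".join(self.stack)
        for field, selector in self.rules.items():
            if path == selector:
                self.found[field].append(text)


def extract_pages(rules, pages):
    """Lazily yield one {field: [text, ...]} dictionary per page."""
    for html in pages:
        collector = PathTextCollector(rules)
        collector.feed(html)
        yield collector.found


rules = {"name": "ul > li > b", "price": "ul > li > i"}
pages = [
    "<ul><li><b>Tea</b><i>3.50</i></li><li><b>Coffee</b><i>4.00</i></li></ul>",
    "<ul><li><b>Juice</b><i>2.75</i></li></ul>",
]

results = extract_pages(rules, pages)  # nothing is parsed until next() is called
first = next(results)
second = next(results)
print(first)
print(second)
```

Because the function is a generator, each page is parsed only when its result is requested, which keeps memory use flat when many pages are processed in sequence.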

For more information and tutorials, see the documentation or visit the main repository.

Release history

0.1.1 (this version): Jun 19, 2023

0.1.0: Jun 16, 2023

Download files

Download the file for your platform.

Source Distribution

scraple-0.1.1.tar.gz (41.8 kB)

Uploaded Jun 19, 2023 Source

Built Distribution

scraple-0.1.1-py3-none-any.whl (10.0 kB)

Uploaded Jun 19, 2023 Python 3

File details

Details for the file scraple-0.1.1.tar.gz.

File metadata

Download URL: scraple-0.1.1.tar.gz
Upload date: Jun 19, 2023
Size: 41.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for scraple-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7`
MD5	`4a84297cfe45313f9e5e0ffd4ff76787`
BLAKE2b-256	`83a33c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691`
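
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's hashlib; the function names and the local file path in the usage note are assumptions for illustration, and only the digest value comes from this page.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# SHA256 digest published for scraple-0.1.1.tar.gz
EXPECTED_SDIST_SHA256 = (
    "4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7"
)
```

Assuming the archive was saved to the current directory, calling matches_expected("scraple-0.1.1.tar.gz", EXPECTED_SDIST_SHA256) should return True for an intact download.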

File details

Details for the file scraple-0.1.1-py3-none-any.whl.

File metadata

Download URL: scraple-0.1.1-py3-none-any.whl
Upload date: Jun 19, 2023
Size: 10.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for scraple-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9ef6eb0d678614e8d181453c1bf15d79e45c52054f42b1a6f6c67d481b771ce0`
MD5	`0643e2f17f2460da4446e6c79bdb6f26`
BLAKE2b-256	`0bb625042702af812f70518e652f91b76eda78ed3447c628a577dcaa5b3d1451`
