harvester 0.5.2
pip install harvester
Released:
A lightweight static scraping library in pure Python
Unverified details
These details have not been verified by PyPI
Project links
Meta
- License: GNU General Public License v3 (GPLv3) (GPL-3.0-or-later)
- Author: Alberto Díaz Álvarez
- Requires: Python >=3.8
- Provides-Extra: dev
Classifiers
- Development Status
- Intended Audience
- License
- Operating System
- Programming Language
- Topic
Project description
Harvester: An easy-to-use Web Scraping tool.
Harvester is a lightweight, pure Python library designed for straightforward web scraping without external dependencies.
Features
- Pure Python: No third-party dependencies required.
- Model-Field structure: Define scraping targets using a clear, class-based approach.
- Flexible parsing: Use Python's standard libraries to parse and extract data.
Installation
Installing via pip:
pip install harvester
Or directly from the source code:
pip install git+https://github.com/blazaid/harvester
Requirements
Harvester requires Python 3.8 or later and has no mandatory external dependencies. Some features can, however, benefit from the chardet library. If chardet is not installed, those features are skipped with a warning.
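The optional-dependency behaviour described above can be sketched as follows. This is a minimal illustration of the general pattern, not Harvester's actual code: the helper name `decode_bytes` and its `fallback` parameter are hypothetical, and only `chardet.detect` is a real chardet call.

```python
import warnings

try:
    import chardet  # optional third-party package
except ImportError:
    chardet = None


def decode_bytes(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw bytes, guessing the encoding with chardet when available.

    Hypothetical helper for illustration; not part of Harvester's API.
    """
    if chardet is None:
        # Skip detection and warn, mirroring the behaviour described above.
        warnings.warn(
            "chardet is not installed; skipping encoding detection",
            RuntimeWarning,
        )
        return raw.decode(fallback, errors="replace")
    guess = chardet.detect(raw)  # dict with an "encoding" key (may be None)
    encoding = guess.get("encoding") or fallback
    return raw.decode(encoding, errors="replace")
```

With this shape, callers get a decoded string whether or not chardet is present; only the quality of the encoding guess differs.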
Usage
Define your data models by subclassing Model and specifying fields:
from harvester import Model, StringField, IntegerField

class Product(Model):
    name = StringField()
    price = IntegerField()
Parse the HTML content and extract data using the model:
from harvester import parse_html

html_content = """
<html>
  <body>
    <h1 class="product-name">Example Product</h1>
    <span class="product-price">100</span>
  </body>
</html>
"""

mapping = {
    "name": "h1.product-name",
    "price": "span.product-price",
}

product = parse_html(html_content, Product, mapping=mapping)
print(product.to_dict())
This will output:
{'name': 'Example Product', 'price': 100}
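To make concrete what a mapping entry such as "h1.product-name" selects, here is a standard-library-only sketch of tag-plus-class extraction. It is an illustration under stated assumptions, not Harvester's implementation: the `TagClassExtractor` class and `extract` function are hypothetical names, and only simple selectors of the form `tag.class` are handled.

```python
from html.parser import HTMLParser


class TagClassExtractor(HTMLParser):
    """Collect the text of the first element matching a 'tag.class' selector."""

    def __init__(self, selector: str):
        super().__init__()
        # Split "h1.product-name" into tag "h1" and class "product-name".
        self.tag, _, self.css_class = selector.partition(".")
        self._depth = 0       # > 0 while inside the matched element
        self._done = False    # True once the first match has been closed
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if self._depth:
            # Track nested elements with the same tag name so the matching
            # close tag is recognised correctly.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    @property
    def text(self) -> str:
        return "".join(self._chunks).strip()


def extract(html: str, selector: str) -> str:
    """Return the text of the first element matching selector, or ''."""
    parser = TagClassExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.text
```

Applied to the HTML from the usage example, `extract(html_content, "h1.product-name")` yields the product name as a string, and converting the result for "span.product-price" with `int(...)` corresponds to what an integer field would be expected to do.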
Documentation
Comprehensive documentation is forthcoming and will be available on Read the Docs. In the meantime, the source code is the best place to find information.
Contributing
Contributions are welcome! Please review the issues for current topics and feel free to submit pull requests. Also make sure to read the contributing guidelines to get started.
License
Harvester is licensed under the GNU General Public License v3.0. See the LICENSE file for detailed information.
Download files
Source Distribution
Built Distribution
File details
Details for the file harvester-0.5.2.tar.gz.
File metadata
- Download URL: harvester-0.5.2.tar.gz
- Upload date:
- Size: 55.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22631342adc949784832a64b0e09d35f5092022f52b334ea9f5bb09a05b78eb1
MD5 | 3d7bb8daedd09f6df71f57a285c42c33
BLAKE2b-256 | cebc0f1728eea12ec54fe771fdddfb7d2a8fe18b604bce7befd21f5e89e4589e
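A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below assumes the archive has already been saved locally; the function names `sha256_of` and `verify` are illustrative, and the local path passed to them is up to the reader.

```python
import hashlib

# SHA256 digest listed above for harvester-0.5.2.tar.gz
EXPECTED_SHA256 = "22631342adc949784832a64b0e09d35f5092022f52b334ea9f5bb09a05b78eb1"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("harvester-0.5.2.tar.gz")` returns True only when the local file matches the digest in the table; the same functions work for the wheel if its digest is passed as `expected`.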
File details
Details for the file harvester-0.5.2-py3-none-any.whl.
File metadata
- Download URL: harvester-0.5.2-py3-none-any.whl
- Upload date:
- Size: 45.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 855c6246c6d53758928677f9fd8f78423c03e762b2ee8ac286ca8baf5992e2d8
MD5 | ce39a3d5b8e75f100adb24d53ecc2f0c
BLAKE2b-256 | 54c49521247054297fd50d6bc3cf15abbdf20bcf9ece23636656a7b855738556