A lightweight static scraping library in pure Python

Project description

Harvester: An easy-to-use Web Scraping tool.

Harvester is a lightweight, pure Python library designed for straightforward web scraping without external dependencies.

Features

Pure Python: No third-party dependencies required.
Model-Field structure: Define scraping targets using a clear, class-based approach.
Flexible parsing: Use Python's standard libraries to parse and extract data.
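To make the "pure Python" and "flexible parsing" points concrete, here is a minimal sketch of standard-library-only extraction built on `html.parser`. It is not Harvester's implementation: the class `FirstTextByClass` and the helper `first_text` are hypothetical names invented for this illustration, and the matching logic is deliberately simple.

```python
from html.parser import HTMLParser


class FirstTextByClass(HTMLParser):
    """Hypothetical helper: collect the text of the first element
    that has the given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.capturing = False  # True while inside the matching element
        self.done = False       # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if (not self.done and not self.capturing
                and tag == self.tag and self.css_class in classes):
            self.capturing = True

    def handle_endtag(self, tag):
        # Simplification: assumes the match contains no nested tag of the same name
        if self.capturing and tag == self.tag:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def first_text(html, tag, css_class):
    """Return the stripped text of the first match, or None if nothing matched."""
    parser = FirstTextByClass(tag, css_class)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return text or None
```

The sketch only handles a `tag.class` pair and the first match; a real selector engine would need to cover nesting, multiple matches, and richer selectors.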

Installation

Installing via pip:

pip install harvester

Or directly from the source code:

pip install git+https://github.com/blazaid/harvester

Requirements

Harvester is compatible with Python 3.8 and later. There are no mandatory external dependencies. Some features, however, may benefit from the optional chardet library; if chardet is not installed, those features are bypassed with a warning.
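The optional-dependency behaviour described above follows a common Python pattern, sketched below. This is an illustration rather than Harvester's actual code: the function name `detect_encoding` and the `default` parameter are assumptions made for the example. Only `chardet.detect`, which returns a dictionary containing an `"encoding"` key, is part of chardet itself.

```python
import warnings

try:
    import chardet  # optional third-party dependency
except ImportError:
    chardet = None


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: guess the encoding of raw bytes.

    Falls back to `default` with a warning when chardet is unavailable.
    """
    if chardet is None:
        warnings.warn(
            "chardet is not installed; assuming " + default,
            RuntimeWarning,
        )
        return default
    guess = chardet.detect(raw)
    # chardet may report None when it cannot decide
    return guess.get("encoding") or default
```

Installing chardet (`pip install chardet`) would let a helper like this make a real guess instead of returning the default.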

Usage

Define your data models by subclassing Model and specifying fields:

from harvester import Model, StringField, IntegerField

class Product(Model):
    name = StringField()
    price = IntegerField()

Parse the HTML content and extract data using the model:

from harvester import parse_html

html_content = """
<html>
<body>
    <h1 class="product-name">Example Product</h1>
    <span class="product-price">100</span>
</body>
</html>
"""

mapping = {
    "name": "h1.product-name",
    "price": "span.product-price"
}

product = parse_html(html_content, Product, mapping=mapping)
print(product.to_dict())

This will output:

{'name': 'Example Product', 'price': 100}
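For readers curious how a class-based Model-Field declaration can turn raw strings into typed values, the following sketch shows one possible mechanism. It does not reproduce Harvester's internals: `Field`, `TextField`, `IntField`, `Record`, and `Item` are hypothetical names chosen so they are not confused with the library's own classes.

```python
class Field:
    """Hypothetical base field: converts a raw extracted string to a value."""

    def convert(self, raw):
        return raw


class TextField(Field):
    def convert(self, raw):
        return raw.strip()


class IntField(Field):
    def convert(self, raw):
        return int(raw.strip())


class Record:
    """Hypothetical base class that discovers Field attributes on subclasses."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect the fields declared directly in the subclass body, in order
        cls._fields = {
            name: value
            for name, value in vars(cls).items()
            if isinstance(value, Field)
        }

    def __init__(self, **raw_values):
        # Convert each raw string with its field and store it on the instance
        for name, field in self._fields.items():
            setattr(self, name, field.convert(raw_values[name]))

    def to_dict(self):
        return {name: getattr(self, name) for name in self._fields}


class Item(Record):
    name = TextField()
    price = IntField()


item = Item(name=" Example Product ", price="100")
print(item.to_dict())
```

The design choice illustrated here is that each field owns its own conversion, so the model class only has to collect fields and delegate to them.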

Documentation

Comprehensive documentation is forthcoming and will be available on Read the Docs. In the meantime, the source code is the best place to find information.

Contributing

Contributions are welcome! Please review the issues for current topics and feel free to submit pull requests. Also make sure to read the contributing guidelines to get started.

License

Harvester is licensed under the GNU General Public License v3.0. See the LICENSE file for detailed information.

Project details

Release history

Version	Release date
0.5.2 (this version)	Jan 26, 2025
0.5.1	Jan 15, 2025
0.5.0	Jan 15, 2025
0.4.7	Jan 13, 2025
0.4.6	Jan 9, 2025
0.4.5	Jan 8, 2025
0.4.4	Jul 23, 2017
0.4.3	May 28, 2017
0.4.2	May 28, 2017
0.4.1	May 28, 2017
0.4.0	May 28, 2017
0.3.6	May 28, 2017
0.3.5	Jan 13, 2016
0.3.4	Jan 12, 2016
0.3.3	Jan 12, 2016
0.3.2	Oct 29, 2015
0.3.1	Sep 29, 2015
0.3.0	Sep 28, 2015
0.2.8	Sep 28, 2015
0.2.7	Sep 27, 2015
0.2.5	Sep 27, 2015
0.2.4	Sep 27, 2015
0.2.3	Sep 26, 2015
0.2.2	Sep 26, 2015
0.2.1	Sep 25, 2015
0.2.0	Sep 24, 2015
0.1.3	Sep 16, 2015
0.1.2	Sep 16, 2015
0.1.1	Aug 30, 2015
0.1.0	Aug 30, 2015
0.0.5	Aug 30, 2015
0.0.3	Aug 30, 2015
0.0.1	Aug 30, 2015

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harvester-0.5.2.tar.gz (55.0 kB)

Uploaded Jan 26, 2025 Source

Built Distribution

harvester-0.5.2-py3-none-any.whl (45.1 kB)

Uploaded Jan 26, 2025 Python 3

File details

Details for the file harvester-0.5.2.tar.gz.

File metadata

Download URL: harvester-0.5.2.tar.gz
Upload date: Jan 26, 2025
Size: 55.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for harvester-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`22631342adc949784832a64b0e09d35f5092022f52b334ea9f5bb09a05b78eb1`
MD5	`3d7bb8daedd09f6df71f57a285c42c33`
BLAKE2b-256	`cebc0f1728eea12ec54fe771fdddfb7d2a8fe18b604bce7befd21f5e89e4589e`


File details

Details for the file harvester-0.5.2-py3-none-any.whl.

File metadata

Download URL: harvester-0.5.2-py3-none-any.whl
Upload date: Jan 26, 2025
Size: 45.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for harvester-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`855c6246c6d53758928677f9fd8f78423c03e762b2ee8ac286ca8baf5992e2d8`
MD5	`ce39a3d5b8e75f100adb24d53ecc2f0c`
BLAKE2b-256	`54c49521247054297fd50d6bc3cf15abbdf20bcf9ece23636656a7b855738556`
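The published digests can be used to check that a downloaded file is intact. The sketch below computes a SHA256 digest with the standard library's `hashlib` and compares it with an expected value. The function names `sha256_of_file` and `matches_published_hash` are illustrative and not part of Harvester or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, assuming `harvester-0.5.2.tar.gz` has been downloaded to the current directory, `matches_published_hash("harvester-0.5.2.tar.gz", "22631342adc949784832a64b0e09d35f5092022f52b334ea9f5bb09a05b78eb1")` should return `True` for an unmodified copy.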

