A tool for parsing many websites at once: stop writing a separate parser for each one.
Parsify
Stop writing a separate parser script for every website. With Parsify, a single script of a few lines plus a configuration file is enough to adapt your parser to different websites.
Installation
pip install parsify
Usage
Make sure you have your configuration file (usually handbook.json) ready.
import parsify as pf
# Create Parsify engine
ngn = pf.Engine(handbook='handbook.json')
# Run a single step, passing the step name as an argument.
# The step must belong to Engine.current_parser, which defaults to
# the first parser in the Handbook.
# The step should not use any "dynamic_variables" when run this way.
step_result = ngn.stepshot(step='get_products')
# print(step_result)
# Parse a single website (must be configured in "handbook.json")
# Provide scope name as an argument
scope_result = ngn.scopeshot(parser='example.com')
# print(scope_result)
# Run all the parsers that are configured in "handbook.json"
final_result = ngn.parse()
# print(final_result)
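Because the engine depends on the handbook file, a quick pre-flight check can catch a missing or malformed file before any requests are made. The sketch below uses only the Python standard library; `load_handbook` is a hypothetical helper for illustration and is not part of the Parsify API.

```python
import json
from pathlib import Path


def load_handbook(path="handbook.json"):
    """Return the parsed handbook, or raise a readable error.

    Hypothetical helper, not part of Parsify.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Handbook file not found: {file}")
    try:
        return json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Re-raise with the file name so the failing config is easy to find.
        raise ValueError(f"{file} is not valid JSON: {exc}") from exc
```

If this succeeds, the same path can then be passed to `pf.Engine(handbook=...)` as shown above.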
Handbook Tutorial
Required Fields
- The Handbook file should start with a "parser" key whose value is an array of parsers.
- Each parser in the array should have two keys:
  - "scope" - String: Name of the parser, usually the website name, e.g. "example.com".
  - "steps" - Array: Steps to parse.
- Each step should have at least the following fields:
  - "name" - String: Unique name of the step. This name makes it possible to access the step's results and dynamic variables in subsequent steps (if needed).
  - "chain_id" - Integer: Steps with the same chain ID are executed as a sequence on every iteration.
  - "url" - String: Target URL of the request(s) for the current step.
  - "method" - String: Request method for the current step.
  - "output_path" - String: Path to the result data in the response. Use dots for nested data; for example, if the needed result is in response -> "data" -> "products", "output_path" should be "data.products".
  - "output" - Dictionary:
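To make the required fields concrete, here is a minimal sketch that builds a one-parser handbook as a Python dictionary, checks it against the fields listed above, and shows how a dotted "output_path" such as "data.products" could be resolved against a nested response. The URL, the step values, the sample response, and the helpers `missing_fields` and `resolve_path` are illustrative assumptions rather than Parsify's own implementation. "output" is left empty because its contents are not described here.

```python
import json

# Illustrative handbook; values are placeholders, not a real site config.
handbook = {
    "parser": [
        {
            "scope": "example.com",
            "steps": [
                {
                    "name": "get_products",
                    "chain_id": 1,
                    "url": "https://example.com/api/products",
                    "method": "GET",
                    "output_path": "data.products",
                    "output": {},  # contents not specified in this section
                }
            ],
        }
    ]
}

PARSER_KEYS = ("scope", "steps")
STEP_KEYS = ("name", "chain_id", "url", "method", "output_path", "output")


def missing_fields(book):
    """List required keys that are absent (hypothetical check)."""
    if "parser" not in book:
        return ["parser"]
    problems = []
    for parser in book["parser"]:
        scope = parser.get("scope", "<no scope>")
        problems += [f"{scope}: {k}" for k in PARSER_KEYS if k not in parser]
        for step in parser.get("steps", []):
            name = step.get("name", "<no name>")
            problems += [f"{scope}/{name}: {k}" for k in STEP_KEYS if k not in step]
    return problems


def resolve_path(response, output_path):
    """Walk a nested dict following a dotted path (hypothetical helper)."""
    current = response
    for key in output_path.split("."):
        current = current[key]
    return current


# Made-up response shaped like the "data" -> "products" example.
response = {"data": {"products": [{"id": 1}, {"id": 2}]}}
step = handbook["parser"][0]["steps"][0]

print(missing_fields(handbook))                     # []
print(resolve_path(response, step["output_path"]))  # [{'id': 1}, {'id': 2}]

# JSON text that could be saved as handbook.json.
handbook_json = json.dumps(handbook, indent=2)
```

Keeping the handbook as plain data makes it easy to check new site configurations before handing them to the engine.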
License
Distributed under the MIT License. See the LICENSE file for more information.
Contact
Luka Sosiashvili - @lukasanukvari - luksosiashvili@gmail.com
Project Link: https://github.com/lukasanukvari/parsify
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.