
A tool for large-scale parsing: stop writing many single-purpose parsers.


Parsify

Stop writing a separate parser script for every website. With Parsify, a single script of a few lines plus a configuration file is enough to adapt your parser to different websites.


Installation

pip install parsify

Usage

Make sure your configuration file (usually handbook.json) is ready.

import parsify as pf


# Create Parsify engine
ngn = pf.Engine(handbook='handbook.json')

# Run a single step
# Pass the step name as an argument
# The step must belong to Engine.current_parser
# The step must not use any "dynamic_variables" when it is run with this method
# By default, Engine.current_parser is the first parser in the Handbook
step_result = ngn.stepshot(step='get_products')
# print(step_result)

# Parse a single website (must be configured in "handbook.json")
# Provide scope name as an argument
scope_result = ngn.scopeshot(parser='example.com')
# print(scope_result)

# Run all the parsers that are configured in "handbook.json"
final_result = ngn.parse()
# print(final_result)

Handbook Tutorial

Required Fields

  • The Handbook file should start with the "parser" key, whose value is an array of parsers.
  • Each parser in the array should have two keys:
    • "scope" - String: Name of the parser. Usually the website name, e.g. "example.com".
    • "steps" - Array: Steps to parse.
  • Each step should have at least the following fields:
    • "name" - String: Unique name of the step. This field makes it possible to access this step's results and dynamic variables in subsequent steps (if needed).
    • "chain_id" - Integer: Steps with the same chain ID are executed as a sequence on every iteration.
    • "url" - String: Target URL of the request(s) for the current step.
    • "method" - String: Request method for the current step.
    • "output_path" - String: Path to the result data in the response. Use dots for nested data: for example, if the needed result is at response -> "data" -> "products", "output_path" should be "data.products".
    • "output" - Dictionary:
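
The required fields above can be pictured with a small sketch. This is not Parsify's own code: the URL, the step values, the empty "output" dictionary, and the helper names missing_step_fields and resolve_output_path are all assumptions made for illustration. The sketch builds a handbook-shaped dictionary, checks that every step carries the listed fields, and shows one way a dotted "output_path" such as "data.products" could be followed through a nested response.

```python
import json

# Hypothetical handbook content. The URL and step values are placeholders,
# and "output" is left empty because its contents are not described above.
handbook = {
    "parser": [
        {
            "scope": "example.com",
            "steps": [
                {
                    "name": "get_products",
                    "chain_id": 1,
                    "url": "https://example.com/api/products",
                    "method": "GET",
                    "output_path": "data.products",
                    "output": {},
                }
            ],
        }
    ]
}

# Field names taken from the "Required Fields" list.
REQUIRED_STEP_FIELDS = ("name", "chain_id", "url", "method", "output_path", "output")


def missing_step_fields(handbook):
    """Return (scope, step name, missing field) triples for incomplete steps."""
    problems = []
    for parser in handbook["parser"]:
        for step in parser["steps"]:
            for field in REQUIRED_STEP_FIELDS:
                if field not in step:
                    problems.append((parser["scope"], step.get("name"), field))
    return problems


def resolve_output_path(response, output_path):
    """Follow a dotted path like "data.products" through nested dictionaries.

    Illustrative only; Parsify's real lookup may behave differently.
    """
    current = response
    for key in output_path.split("."):
        current = current[key]
    return current


if __name__ == "__main__":
    # Text that could be saved as handbook.json.
    print(json.dumps(handbook, indent=2))
    print(missing_step_fields(handbook))
    sample_response = {"data": {"products": [{"id": 1}, {"id": 2}]}}
    print(resolve_output_path(sample_response, "data.products"))
```

Running a check like this before handing the file to the engine is optional; it only catches missing keys, not wrong values.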

License

Distributed under the MIT License. See LICENSE file for more information.

Contact

Luka Sosiashvili - @lukasanukvari - luksosiashvili@gmail.com

Project Link: https://github.com/lukasanukvari/parsify

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
