A tool for large-scale parsing: stop writing a separate parser for every website.

Project description

Parsify

Stop writing a separate parser script for every website. With Parsify, a single script of a few lines plus a configuration file is enough to adapt your parser to different websites.

Installation

pip install parsify

Usage

Make sure you have your configuration file (usually handbook.json) ready.

import parsify as pf


# Create Parsify engine
ngn = pf.Engine(handbook='handbook.json')

# Run a single step
# Provide the step name as an argument
# The step must belong to Engine.current_parser
# The step must not use any "dynamic_variables" when run on its own with this method
# By default, Engine.current_parser is the first parser in the Handbook
step_result = ngn.stepshot(step='get_products')
# print(step_result)

# Parse a single website (must be configured in "handbook.json")
# Provide scope name as an argument
scope_result = ngn.scopeshot(parser='example.com')
# print(scope_result)

# Run all the parsers that are configured in "handbook.json"
final_result = ngn.parse()
# print(final_result)

Handbook Tutorial

Required Fields

  • The Handbook file should start with a "parser" key whose value is the array of parsers.
  • Each parser in the array should have two keys:
    • "scope" - String: Name of the parser. Usually the website name, e.g. "example.com".
    • "steps" - Array: Steps to parse.
  • Each step should have at least the following fields:
    • "name" - String: Unique name of the step. This name makes it possible to access the step's results and dynamic variables in subsequent steps (if needed).
    • "chain_id" - Integer: Steps with the same chain id are executed as a sequence on every iteration.
    • "url" - String: Target URL of the request(s) for the current step.
    • "method" - String: Request method for the current step.
    • "output_path" - String: Path to the result data in the response. Use dots for nested data: for example, if the needed result is in response -> "data" -> "products", "output_path" should be "data.products".
    • "output" - Dictionary:
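
The required fields above can be sketched as a minimal Handbook skeleton. This is only an illustration and not Parsify's own code. The field names come from the list above, but every value (the URL, the "GET" method, the empty "output" dictionary) is a placeholder assumption. The helpers missing_step_fields and resolve_output_path are hypothetical names. They show one way to check a step for the required keys and to read a dotted "output_path" against a made-up response.

```python
import json

# Field names are taken from the list above; all values are placeholders.
REQUIRED_STEP_FIELDS = ("name", "chain_id", "url", "method", "output_path", "output")

handbook = {
    "parser": [
        {
            "scope": "example.com",
            "steps": [
                {
                    "name": "get_products",
                    "chain_id": 1,
                    "url": "https://example.com/api/products",  # hypothetical URL
                    "method": "GET",  # assumed value
                    "output_path": "data.products",
                    "output": {},  # contents are not described above; left empty
                }
            ],
        }
    ]
}


def missing_step_fields(step):
    """Return the required field names that are absent from a step (hypothetical helper)."""
    return [field for field in REQUIRED_STEP_FIELDS if field not in step]


def resolve_output_path(response, output_path):
    """Follow a dotted path such as "data.products" through nested dicts (hypothetical helper)."""
    current = response
    for key in output_path.split("."):
        current = current[key]
    return current


step = handbook["parser"][0]["steps"][0]
sample_response = {"data": {"products": [{"id": 1}, {"id": 2}]}}  # made-up response

print(missing_step_fields(step))
print(resolve_output_path(sample_response, step["output_path"]))
print(json.dumps(handbook, indent=2))
```

Saving the printed JSON as handbook.json gives a starting point to edit. Whether Parsify accepts an empty "output" dictionary is not stated here, so treat that entry as something to fill in from the full documentation.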

License

Distributed under the MIT License. See LICENSE file for more information.

Contact

Luka Sosiashvili - @lukasanukvari - luksosiashvili@gmail.com

Project Link: https://github.com/lukasanukvari/parsify

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
