Xcrap Parser

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON files, with the ability to interleave both to extract even more information.

It is inspired by the parser embedded in the Xcrap Framework for Node.js, and it is built on Parsel for HTML parsing and JMESPath for JSON parsing.
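One way to read "interleaving" HTML and JSON is a page that carries a JSON payload inside a script tag: the payload is located with an HTML query and then queried as JSON. The sketch below illustrates that idea only. It is not Xcrap Parser's API, and it swaps Parsel and JMESPath for the standard library so that it runs without extra dependencies. The names `ScriptJsonExtractor` and `get_path`, and the sample page, are hypothetical.

```python
import json
from html.parser import HTMLParser


class ScriptJsonExtractor(HTMLParser):
    """Collects the text of <script type="application/json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self.payloads = []  # one string per matching <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self.payloads.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last payload.
        if self._in_json_script:
            self.payloads[-1] += data


def get_path(document, path):
    """Follow a dotted path such as "product.name" through nested dicts.

    A stand-in for a JSON query language; it handles dict keys only.
    """
    current = document
    for key in path.split("."):
        current = current[key]
    return current


# Hypothetical page: an HTML document with an embedded JSON payload.
html = (
    '<html><body><h1>Shop</h1>'
    '<script type="application/json">'
    '{"product": {"name": "Lamp", "price": 20}}'
    '</script></body></html>'
)

# Step 1 (HTML): locate the embedded payload.
extractor = ScriptJsonExtractor()
extractor.feed(html)
extractor.close()

# Step 2 (JSON): decode the payload and query it.
embedded = json.loads(extractor.payloads[0])
product_name = get_path(embedded, "product.name")
print(product_name)
```

With Parsel and JMESPath installed, the first step would typically be a CSS or XPath query and the second a JMESPath expression; the two-stage shape stays the same.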

Installation

pip install xcrap-parser

Simple Usage

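from xcrap_parser import HtmlParsingModel

html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"

# Each key names an output field; "query" holds the selector for that field.
# The "::text" suffix is the Parsel-style CSS pseudo-element for text content.
root_parsing_model = HtmlParsingModel({
    "title": {
        "query": "title::text"
    },
    "heading": {
        "query": "h1::text"
    }
})

# Apply the model to the HTML string and print the extracted data.
data = root_parsing_model.parse(html)

print(data)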
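To make the declarative idea concrete, here is a minimal, dependency-free sketch of how a model of field names and queries can drive extraction. It is an illustration under stated assumptions, not Xcrap Parser's implementation: `FirstTextCollector` and `parse_with_model` are hypothetical names, and the sketch understands only queries of the form `tag::text`, whereas the real library delegates to Parsel's full selector support.

```python
from html.parser import HTMLParser


class FirstTextCollector(HTMLParser):
    """Records the first non-blank text node found directly inside each tag.

    Limitation of this sketch: void elements without a closing tag
    (for example <br>) are not handled specially.
    """

    def __init__(self):
        super().__init__()
        self._stack = []      # names of the currently open tags
        self.first_text = {}  # tag name -> first text seen inside it

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open tag.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if self._stack and text:
            self.first_text.setdefault(self._stack[-1], text)


def parse_with_model(html, model):
    """Apply a {field: {"query": "tag::text"}} model to an HTML string."""
    collector = FirstTextCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for field, spec in model.items():
        tag = spec["query"].split("::")[0]  # "title::text" -> "title"
        result[field] = collector.first_text.get(tag)
    return result


html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"
model = {
    "title": {"query": "title::text"},
    "heading": {"query": "h1::text"},
}

extracted = parse_with_model(html, model)
print(extracted)
```

The point of the design is that the model is plain data: adding a field means adding a dictionary entry, not writing new traversal code.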
