perse converts HTML content into structured JSON data

Perse

Perse converts HTML to JSON using a mix of traditional HTML parsing and LLM-based data extraction.
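To make the "traditional HTML parsing" half of that description concrete, here is a minimal sketch using only the Python standard library. It is not Perse's implementation, which this README does not describe; the `MetaTagParser` class and `extract_metadata` function are hypothetical names chosen for illustration. It collects the page title plus `description`, Open Graph, and Twitter card meta tags into a dictionary shaped like the first few keys of the example output further down.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Hypothetical helper: collects <title> and common <meta> tags."""

    def __init__(self):
        super().__init__()
        self.result = {"title": "", "description": "", "og": {}, "twitter": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags usually use property=, most others use name=.
            key = attrs.get("property") or attrs.get("name") or ""
            content = attrs.get("content") or ""
            if key.startswith("og:"):
                self.result["og"][key[len("og:"):]] = content
            elif key.startswith("twitter:"):
                self.result["twitter"][key[len("twitter:"):]] = content
            elif key == "description":
                self.result["description"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.result["title"] += data


def extract_metadata(html):
    """Return a dict of page metadata found in the given HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    parser.result["title"] = parser.result["title"].strip()
    return parser.result
```

Rule-based parsing like this handles well-known tags cheaply; free-form page content (such as the `builds` list in the example output) is presumably where the LLM-based extraction comes in.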

Installation

pip install zf-perse

Usage

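Set your OpenAI API key in the shell:

export PERSE_OPENAI_API_KEY="your-openai-api-key"

Then call perse from Python:

import requests
from perse import perse

# Fetch the page and hand the raw HTML to perse
url = "https://example.com"
html = requests.get(url).text
j = perse(html)
print(j)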
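Because the usage above depends on the `PERSE_OPENAI_API_KEY` environment variable, it can help to check for it before making any calls. The sketch below is an optional guard, not part of Perse; `ensure_api_key` is a hypothetical helper, and how Perse itself behaves when the key is missing is not documented here.

```python
import os

API_KEY_VAR = "PERSE_OPENAI_API_KEY"  # variable name taken from the usage snippet


def ensure_api_key(env=None):
    """Return the configured key, or raise a readable error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get(API_KEY_VAR) or "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_VAR} before calling perse()")
    return key
```

Calling `ensure_api_key()` once at startup, before `perse(html)`, surfaces a missing key as a clear message rather than as a failure somewhere inside the conversion.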

Example

Input

<!-- taken from https://zeffmuks.com -->

Output

{
    "title": "Zeff Muks",
    "description": "Antifragile Entropy Assassin 🥷",
    "og": {
        "type": "website",
        "title": "Zeff Muks",
        "description": "Antifragile Entropy Assassin 🥷",
        "url": "https://www.zeffmuks.com/",
        "image": "https://www.zeffmuks.com/images/ZeffMuks-1920.png",
        "site_name": "Zeff Muks",
    },
    "twitter": {
        "card": "summary_large_image",
        "site": "@zeffmuks",
        "title": "Zeff Muks",
        "description": "Antifragile Entropy Assassin 🥷",
        "image": "https://www.zeffmuks.com/images/ZeffMuks-1920.png",
    },
    "main_header": "Antifragile Entropy Assassin 🥷🏻",
    "header_link": "https://x.com/zeffmuks",
    "builds": [
        {
            "date": "08/30/2024",
            "project": {
                "name": "Cursor Git",
                "description": "Enhanced Git for Cursor AI Editor",
                "logo_url": "https://zf-static.s3.us-west-1.amazonaws.com/cursor-git-logo128.png",
                "download_link": "https://zf-static.s3.us-west-1.amazonaws.com/cursor-git-0.1.12.vsix",
                "external_link": "",
            },
        },
        {
            "date": "08/18/2024",
            "project": {
                "name": "PyZF",
                "description": "Enhancements for Python",
                "logo_url": "https://zf-static.s3.us-west-1.amazonaws.com/pyzf-logo128.png",
                "download_link": "",
                "external_link": "https://pypi.org/project/PyZF",
            },
        },
        {
            "date": "08/05/2024",
            "project": {
                "name": "Xanthus",
                "description": "X (formerly Twitter) Assistant",
                "logo_url": "https://zf-static.s3.us-west-1.amazonaws.com/xanthus-logo128.png",
                "download_link": "",
                "external_link": "https://pypi.org/project/zf-xanthus",
            },
        },
        {
            "date": "07/24/2024",
            "project": {
                "name": "Jenga",
                "description": "Fast JSON5 Python Library",
                "logo_url": "",
                "download_link": "https://pypi.org/project/zf-jenga",
                "external_link": "",
            },
        },
        ...
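The output above is a nested structure, so downstream code typically walks it. The following sketch assumes the result behaves like a Python dict (the README does not state the return type) and that dates use the MM/DD/YYYY format seen in the sample. `summarize_builds` is a hypothetical helper, and `sample` is a trimmed copy of two entries from the output above. Since part of the extraction is LLM-based, key names may differ from page to page, so the lookups use defaults.

```python
from datetime import datetime

# Trimmed copy of two "builds" entries from the example output.
sample = {
    "builds": [
        {
            "date": "08/30/2024",
            "project": {
                "name": "Cursor Git",
                "download_link": "https://zf-static.s3.us-west-1.amazonaws.com/cursor-git-0.1.12.vsix",
                "external_link": "",
            },
        },
        {
            "date": "07/24/2024",
            "project": {
                "name": "Jenga",
                "download_link": "https://pypi.org/project/zf-jenga",
                "external_link": "",
            },
        },
    ]
}


def summarize_builds(data):
    """Return (iso_date, name, link) tuples, oldest build first."""
    rows = []
    for build in data.get("builds", []):
        project = build.get("project", {})
        # Empty strings mark absent links in the sample, so fall through them.
        link = project.get("download_link") or project.get("external_link") or None
        # Assumes MM/DD/YYYY, as in the sample output.
        date = datetime.strptime(build["date"], "%m/%d/%Y").date()
        rows.append((date.isoformat(), project.get("name", ""), link))
    return sorted(rows, key=lambda row: row[0])


for row in summarize_builds(sample):
    print(row)
```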

License

MIT License

Download files

Source Distribution

zf-perse-0.1.1.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

zf_perse-0.1.1-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file zf-perse-0.1.1.tar.gz.

File metadata

  • Download URL: zf-perse-0.1.1.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for zf-perse-0.1.1.tar.gz
Algorithm Hash digest
SHA256 dd0d92a6fd609503ea35af3feaf13fbc8bfa321097dfaaeb43b0999607c9662c
MD5 b31623a04b9272a435c997c33326372b
BLAKE2b-256 68b0066e1e31bd49be9a80f6937505418ff132b22d54f62e9bb6ee5242177bea

File details

Details for the file zf_perse-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: zf_perse-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for zf_perse-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3bcb2070bf5c41468dc286430ed2e2da5de0ed425a439a53a9cd4c508db0b428
MD5 3ac54b9a1bba71ef86756943a97d319e
BLAKE2b-256 8dc4c4c9df58f32f37c51e020e60a872bcca2f0678efbf712bcabc6e01b15aff
