search-parser

Parse search engine HTML results into structured data (JSON, Markdown) with auto-detection.

search-parser takes the raw HTML of a results page from Google, Bing, or DuckDuckGo and extracts structured result data -- titles, URLs, snippets, and positions -- into your preferred output format. It auto-detects the search engine from the HTML content, so you don't have to specify which parser to use.
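
To make the idea of auto-detection concrete, here is a minimal sketch of content-based detection. It is not the library's actual logic: the `guess_engine` function, the `ENGINE_MARKERS` table, and the marker substrings are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical marker table: these substrings are illustrative guesses,
# not the signatures search-parser actually checks.
ENGINE_MARKERS = {
    "duckduckgo": ("duckduckgo.com",),
    "bing": ("bing.com",),
    "google": ("google.com",),
}


def guess_engine(html: str) -> Optional[str]:
    """Return the engine whose markers occur most often, or None if none occur."""
    lowered = html.lower()
    counts = {
        engine: sum(lowered.count(marker) for marker in markers)
        for engine, markers in ENGINE_MARKERS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


sample = '<html><head><link rel="canonical" href="https://duckduckgo.com/?q=test"></head></html>'
print(guess_engine(sample))  # duckduckgo
```

A real detector would likely rely on more robust signals than domain names, since result pages routinely link to other search engines.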


Quick Start

from search_parser import parse

# Read the saved results page; the context manager closes the file for us
with open("google_results.html", encoding="utf-8") as f:
    html = f.read()

# JSON string
json_output = parse(html, output_format="json")
print(json_output)
# [{"position": 1, "title": "Example Result", "url": "https://example.com", "snippet": "An example result..."}, ...]

# Markdown string
md_output = parse(html, output_format="markdown")
print(md_output)
# ## 1. Example Result
# **URL:** https://example.com
# An example result...

# Python list of dicts (default)
results = parse(html, output_format="dict")
for result in results:
    print(result["title"], result["url"])

Installation

With uv (recommended):

uv add search-parser

With pip:

pip install search-parser

Supported Search Engines

| Search Engine | Auto-Detect | Status |
|---------------|-------------|--------|
| Google        | Yes         | Stable |
| Bing          | Yes         | Stable |
| DuckDuckGo    | Yes         | Stable |

Each parser extracts the following fields (when available):

  • title -- The result heading
  • url -- The link to the result page
  • snippet -- The text preview / description
  • position -- The result's rank on the page
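
If you want type hints for these records in your own code, one option is a `TypedDict`. The sketch below is an assumption, not part of the library: the `SearchResult` name and the `describe` helper are hypothetical, and it guesses that a field which is not available may be either missing or `None`.

```python
from typing import Optional, TypedDict


class SearchResult(TypedDict, total=False):
    # Hypothetical type name; the field names come from the list above.
    position: Optional[int]
    title: Optional[str]
    url: Optional[str]
    snippet: Optional[str]


def describe(result: SearchResult) -> str:
    """Format one result as a single line, tolerating missing or None fields."""
    position = result.get("position")
    prefix = f"{position}. " if position is not None else ""
    title = result.get("title") or "(no title)"
    url = result.get("url") or "(no url)"
    return f"{prefix}{title} <{url}>"


example: SearchResult = {
    "position": 1,
    "title": "Example Domain",
    "url": "https://example.com",
    "snippet": "This domain is for use in illustrative examples...",
}
print(describe(example))  # 1. Example Domain <https://example.com>
```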

Output Formats

JSON

[
  {
    "position": 1,
    "title": "Example Domain",
    "url": "https://example.com",
    "snippet": "This domain is for use in illustrative examples..."
  },
  {
    "position": 2,
    "title": "Another Result",
    "url": "https://another.example.com",
    "snippet": "Another example snippet text..."
  }
]
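
Because the JSON output is a plain array of objects, it can be loaded back with the standard library. The following sketch uses a hard-coded string shaped like the sample above rather than calling the library, and sorts the records by `position`.

```python
import json

# A stand-in for the JSON string returned by the parser, deliberately out of order.
json_output = """
[
  {"position": 2, "title": "Another Result",
   "url": "https://another.example.com",
   "snippet": "Another example snippet text..."},
  {"position": 1, "title": "Example Domain",
   "url": "https://example.com",
   "snippet": "This domain is for use in illustrative examples..."}
]
"""

results = json.loads(json_output)
results.sort(key=lambda record: record["position"])  # restore page order
urls = [record["url"] for record in results]
print(urls)  # ['https://example.com', 'https://another.example.com']
```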

Markdown

## 1. Example Domain
**URL:** https://example.com
This domain is for use in illustrative examples...

---

## 2. Another Result
**URL:** https://another.example.com
Another example snippet text...
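
For readers who want to produce the same layout from the dict output themselves, here is a minimal sketch. The `to_markdown` function is hypothetical and only mirrors the sample above; it is not the library's renderer.

```python
def to_markdown(results):
    """Render result dicts in the layout shown above, separated by '---' rules."""
    blocks = []
    for record in results:
        blocks.append(
            f"## {record['position']}. {record['title']}\n"
            f"**URL:** {record['url']}\n"
            f"{record['snippet']}"
        )
    return "\n\n---\n\n".join(blocks)


sample_results = [
    {"position": 1, "title": "Example Domain", "url": "https://example.com",
     "snippet": "This domain is for use in illustrative examples..."},
    {"position": 2, "title": "Another Result", "url": "https://another.example.com",
     "snippet": "Another example snippet text..."},
]
markdown = to_markdown(sample_results)
print(markdown)
```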

Dict (Python)

[
    {
        "position": 1,
        "title": "Example Domain",
        "url": "https://example.com",
        "snippet": "This domain is for use in illustrative examples...",
    },
    {
        "position": 2,
        "title": "Another Result",
        "url": "https://another.example.com",
        "snippet": "Another example snippet text...",
    },
]
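
The dict format is the easiest to post-process. As an illustration, the sketch below groups results by hostname using only the standard library; `group_by_host` is a hypothetical helper, and the input is hard-coded to match the sample above.

```python
from urllib.parse import urlsplit


def group_by_host(results):
    """Map each hostname to the list of results whose URL points at it."""
    grouped = {}
    for record in results:
        host = urlsplit(record["url"]).hostname or ""
        grouped.setdefault(host, []).append(record)
    return grouped


parsed = [
    {"position": 1, "title": "Example Domain", "url": "https://example.com",
     "snippet": "This domain is for use in illustrative examples..."},
    {"position": 2, "title": "Another Result", "url": "https://another.example.com",
     "snippet": "Another example snippet text..."},
]
by_host = group_by_host(parsed)
print(list(by_host))  # ['example.com', 'another.example.com']
```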

CLI Usage

search-parser includes a command-line interface for quick parsing:

# Parse an HTML file to JSON (auto-detects search engine)
search-parser parse results.html --format json

# Parse with explicit engine
search-parser parse results.html --engine google --format markdown

# Read from stdin
cat results.html | search-parser parse - --format json

# Output to a file
search-parser parse results.html --format json --output results.json
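
If you need to call the CLI from a Python script (for example, inside a batch job), a thin `subprocess` wrapper is one way to do it. This is a sketch: `build_command` and `run_cli` are hypothetical helpers, and they use only the subcommand and flags shown above.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(path: str, fmt: str = "json",
                  engine: Optional[str] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble the argument list using only the flags shown above."""
    cmd = ["search-parser", "parse", path, "--format", fmt]
    if engine is not None:
        cmd += ["--engine", engine]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_cli(path: str, **options) -> str:
    """Run the CLI and return its stdout; raises if the command is missing or fails."""
    if shutil.which("search-parser") is None:
        raise FileNotFoundError("search-parser is not on PATH")
    completed = subprocess.run(
        build_command(path, **options),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


print(build_command("results.html", fmt="markdown", engine="google"))
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.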

Documentation

Full documentation is available at https://search-parser.github.io/search-parser/.


Contributing

Contributions are welcome! Please read our Contributing Guide for details on the development workflow, how to add new parsers, and how to submit pull requests.


License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
