search-parser
Parse search engine HTML results into structured data (JSON, Markdown) with auto-detection.
search-parser takes raw HTML from popular search engines and extracts structured result data -- titles, URLs, snippets, and more -- into your preferred output format. It auto-detects the search engine from the HTML content, so you don't have to specify which parser to use.
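To make the idea of auto-detection concrete, here is a minimal sketch of one way an engine could be guessed from raw HTML. It is not the library's actual detection logic: the `guess_engine` function and the marker substrings in `ENGINE_MARKERS` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical marker substrings (an assumption for this sketch);
# the real detection rules used by search-parser are not shown in this README.
ENGINE_MARKERS = {
    "google": ("google.com/search", "www.google."),
    "bing": ("bing.com/search", "www.bing.com"),
    "duckduckgo": ("duckduckgo.com",),
}


def guess_engine(html: str) -> Optional[str]:
    """Return the first engine whose marker appears in the HTML, else None."""
    lowered = html.lower()
    for engine, markers in ENGINE_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return engine
    return None
```

A substring check like this is deliberately naive; a production detector would more likely inspect page structure, since one engine's results can contain links to another engine's domain.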
Quick Start
from search_engine_parser import parse
with open("google_results.html", encoding="utf-8") as f:
    html = f.read()
# JSON string
json_output = parse(html, output_format="json")
print(json_output)
# [{"title": "Example Result", "url": "https://example.com", "snippet": "An example result..."}, ...]
# Markdown string
md_output = parse(html, output_format="markdown")
print(md_output)
# ## Example Result
# **URL:** https://example.com
# An example result...
# Python list of dicts (default)
results = parse(html, output_format="dict")
for result in results:
    print(result["title"], result["url"])
Installation
With uv (recommended):
uv add search-parser
With pip:
pip install search-parser
Supported Search Engines
| Search Engine | Auto-Detect | Status |
|---|---|---|
| Google | Yes | Stable |
| Bing | Yes | Stable |
| DuckDuckGo | Yes | Stable |
Each parser extracts the following fields (when available):
- title -- The result heading
- url -- The link to the result page
- snippet -- The text preview / description
- position -- The result's rank on the page
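Because fields are only present "when available", code that consumes results may want to tolerate missing keys. The sketch below models a result as a `TypedDict` with optional keys; the `SearchResult` type and the `describe` helper are hypothetical names for illustration and are not part of the library's documented API.

```python
from typing import TypedDict


class SearchResult(TypedDict, total=False):
    """Illustrative shape of one parsed result; every key is optional."""
    position: int
    title: str
    url: str
    snippet: str


def describe(result: SearchResult) -> str:
    """Format a result on one line, falling back when a field is absent."""
    title = result.get("title", "(no title)")
    url = result.get("url", "(no url)")
    return f"{title} <{url}>"
```

Using `.get()` with a default avoids a `KeyError` when a parser could not extract a given field.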
Output Formats
JSON
[
  {
    "position": 1,
    "title": "Example Domain",
    "url": "https://example.com",
    "snippet": "This domain is for use in illustrative examples..."
  },
  {
    "position": 2,
    "title": "Another Result",
    "url": "https://another.example.com",
    "snippet": "Another example snippet text..."
  }
]
Markdown
## 1. Example Domain
**URL:** https://example.com
This domain is for use in illustrative examples...
---
## 2. Another Result
**URL:** https://another.example.com
Another example snippet text...
Dict (Python)
[
    {
        "position": 1,
        "title": "Example Domain",
        "url": "https://example.com",
        "snippet": "This domain is for use in illustrative examples...",
    },
    {
        "position": 2,
        "title": "Another Result",
        "url": "https://another.example.com",
        "snippet": "Another example snippet text...",
    },
]
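The three formats carry the same data, so the dict output can be converted to the other two by hand if needed. The following sketch uses only the standard library and mirrors the layouts shown above; `to_json` and `to_markdown` are hypothetical helpers, not functions provided by search-parser, and the exact whitespace the library emits may differ.

```python
import json

# Sample data copied from the Dict (Python) example.
results = [
    {
        "position": 1,
        "title": "Example Domain",
        "url": "https://example.com",
        "snippet": "This domain is for use in illustrative examples...",
    },
    {
        "position": 2,
        "title": "Another Result",
        "url": "https://another.example.com",
        "snippet": "Another example snippet text...",
    },
]


def to_json(items):
    """Serialize the list of result dicts as an indented JSON array."""
    return json.dumps(items, indent=2)


def to_markdown(items):
    """Render each result as a heading, URL line, and snippet, separated by ---."""
    blocks = [
        f"## {r['position']}. {r['title']}\n**URL:** {r['url']}\n{r['snippet']}"
        for r in items
    ]
    return "\n---\n".join(blocks)


print(to_json(results))
print(to_markdown(results))
```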
CLI Usage
search-parser includes a command-line interface for quick parsing:
# Parse an HTML file to JSON (auto-detects search engine)
search-parser parse results.html --format json
# Parse with explicit engine
search-parser parse results.html --engine google --format markdown
# Read from stdin
cat results.html | search-parser parse - --format json
# Output to a file
search-parser parse results.html --format json --output results.json
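If the CLI needs to be driven from a script, the commands above can be assembled programmatically. This sketch uses only the flags shown in the examples (`--engine`, `--format`, `--output`); the `build_parse_command` and `run_parse` helpers are hypothetical, and `run_parse` assumes the `search-parser` executable is installed and on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_parse_command(
    source: str,
    fmt: str = "json",
    engine: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `search-parser parse` from the documented flags."""
    cmd = ["search-parser", "parse", source]
    if engine is not None:
        cmd += ["--engine", engine]  # omit to rely on auto-detection
    cmd += ["--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_parse(source: str, **kwargs) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_parse_command(source, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.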
Documentation
Full documentation is available at https://search-parser.github.io/search-parser/.
Contributing
Contributions are welcome! Please read our Contributing Guide for details on the development workflow, how to add new parsers, and how to submit pull requests.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
File details
Details for the file search_parser-0.0.1.tar.gz.
File metadata
- Download URL: search_parser-0.0.1.tar.gz
- Upload date:
- Size: 665.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32f8f408651d08c38d5f35127a3230fc33b0d86035107ad8397bdc73e4d57d6b |
| MD5 | 1f18ba565e1703d4ef9315ecea4931c8 |
| BLAKE2b-256 | 8cac66f0d2022cb5806422884faac9d54c604968402669e7f7c838b819b11976 |
Provenance
The following attestation bundles were made for search_parser-0.0.1.tar.gz:
Publisher: publish.yml on getlinksc/search-parser
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: search_parser-0.0.1.tar.gz
- Subject digest: 32f8f408651d08c38d5f35127a3230fc33b0d86035107ad8397bdc73e4d57d6b
- Sigstore transparency entry: 975148789
- Sigstore integration time:
- Permalink: getlinksc/search-parser@9eae6bfdbd405cca714743ab28e538f3796af789
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/getlinksc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9eae6bfdbd405cca714743ab28e538f3796af789
- Trigger Event: release
File details
Details for the file search_parser-0.0.1-py3-none-any.whl.
File metadata
- Download URL: search_parser-0.0.1-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec20559be333aa8263fcd127f101fb1671ec2682e4e4001ad0bbb1bb9e25372f |
| MD5 | 913dd4e6e4eca1f2c999a5b3904e1f3e |
| BLAKE2b-256 | 8fcff5e57eb07b238e3e2d214245c15b1c7679270272744446e0bb269a0932ac |
Provenance
The following attestation bundles were made for search_parser-0.0.1-py3-none-any.whl:
Publisher: publish.yml on getlinksc/search-parser
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: search_parser-0.0.1-py3-none-any.whl
- Subject digest: ec20559be333aa8263fcd127f101fb1671ec2682e4e4001ad0bbb1bb9e25372f
- Sigstore transparency entry: 975148795
- Sigstore integration time:
- Permalink: getlinksc/search-parser@9eae6bfdbd405cca714743ab28e538f3796af789
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/getlinksc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9eae6bfdbd405cca714743ab28e538f3796af789
- Trigger Event: release