Parse Google, Bing, and DuckDuckGo HTML search results into JSON, Markdown, or dict — with auto-detection
search-parser
Parse Google, Bing, and DuckDuckGo HTML search results into JSON, Markdown, or Python dict — with automatic search engine detection.
search-parser takes raw HTML from Google, Bing, and DuckDuckGo (desktop or mobile) and extracts each supported result type into clean, typed Python objects. Supported types include organic results, featured snippets, AI Overviews, People Also Ask, sponsored ads, and shopping ads. It auto-detects the search engine from the HTML, so you don't need to specify which parser to use.
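The README does not describe how detection works internally. The sketch below is only a rough illustration of the general idea. It scores each engine by the share of its marker strings found in the page and reports the best match with a confidence between 0 and 1, similar in spirit to the `detection_confidence` field. The `detect_engine` function and all marker strings are hypothetical assumptions, not search-parser's actual rules.

```python
from typing import Optional

# Hypothetical marker strings per engine (illustrative assumptions only,
# not the detection rules search-parser really uses).
ENGINE_MARKERS: dict[str, tuple[str, ...]] = {
    "google": ("google.com/search", 'id="search"', "/url?q="),
    "bing": ("bing.com", 'id="b_results"', "b_algo"),
    "duckduckgo": ("duckduckgo.com", "result__a", "links_main"),
}


def detect_engine(html: str) -> tuple[Optional[str], float]:
    """Return (engine, confidence), where confidence is the fraction of that
    engine's markers present in the HTML; (None, 0.0) when nothing matches."""
    lowered = html.lower()
    best_engine: Optional[str] = None
    best_score = 0.0
    for engine, markers in ENGINE_MARKERS.items():
        hits = sum(1 for marker in markers if marker.lower() in lowered)
        score = hits / len(markers)
        if score > best_score:
            best_engine, best_score = engine, score
    return best_engine, best_score


print(detect_engine('<div id="b_results"><li class="b_algo">...</li></div>'))
```

A real detector would likely weigh markers differently and inspect the DOM rather than raw substrings. A flat fraction just keeps the idea easy to follow.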
Quick Start
```python
from search_parser import SearchParser

parser = SearchParser()

with open("google_results.html", encoding="utf-8") as f:
    html = f.read()

# JSON string (default)
json_output = parser.parse(html)

# Markdown string, great for feeding to an LLM
md_output = parser.parse(html, output_format="markdown")

# Python dict, for programmatic access
data = parser.parse(html, output_format="dict")

# Organic results are in data["results"]
for result in data["results"]:
    print(f"{result['position']}. {result['title']}")
    print(f"   {result['url']}")

# Every other result type has its own dedicated key
if data["featured_snippet"]:
    print("Featured:", data["featured_snippet"]["title"])
if data["ai_overview"]:
    print("AI Overview:", data["ai_overview"]["description"][:100])
for question in data["people_also_ask"]:
    print("PAA:", question["title"])
```
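The default return value is a JSON string, so it can be saved or passed between processes and decoded later with the standard `json` module. Based on the documented shapes, this should yield the same nested structure that `output_format="dict"` returns. A minimal sketch, assuming the JSON layout shown under "Output Formats"; `top_urls` is a hypothetical helper, not part of search-parser.

```python
import json


def top_urls(json_output: str, n: int = 3) -> list[str]:
    """Return the URLs of the first n organic results, ordered by position."""
    data = json.loads(json_output)  # JSON null becomes None, arrays become lists
    organic = sorted(data["results"], key=lambda r: r["position"])
    return [r["url"] for r in organic[:n]]


# Trimmed sample following the JSON shape documented in this README.
sample = """
{
  "search_engine": "google",
  "query": "python web scraping",
  "results": [
    {"title": "Web Scraping with Python - Real Python",
     "url": "https://realpython.com/python-web-scraping/",
     "position": 1, "result_type": "organic", "metadata": {}}
  ],
  "featured_snippet": null
}
"""
print(top_urls(sample, n=1))  # ['https://realpython.com/python-web-scraping/']
```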
Installation
With uv (recommended):
```bash
uv add search-parser
```
With pip:
```bash
pip install search-parser
```
Supported Result Types
| Result Type | Field | Google | Bing | DuckDuckGo |
|---|---|---|---|---|
| Organic results | results | ✓ | ✓ | ✓ |
| Featured snippet | featured_snippet | ✓ | ✓ | — |
| Sponsored / ads | sponsored | ✓ | — | — |
| AI Overview | ai_overview | ✓ | — | — |
| People Also Ask | people_also_ask | ✓ | — | — |
| What People Are Saying | people_saying | ✓ | — | — |
| People Also Search For | people_also_search | ✓ | — | — |
| Related Products & Services | related_products | ✓ | — | — |
| Jobs | jobs | ✓ | — | — |
| Discussions and forums | discussions | ✓ | — | — |
| Shopping ads | shopping_ads | ✓ | — | — |
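Bing and DuckDuckGo populate only a subset of these fields. It can help to ask which fields actually hold data before processing a page. The sketch below assumes that absent single-object fields are `None` and absent list fields are empty, as described under "Working with Results". `present_result_types` is a hypothetical helper, not a library function.

```python
from typing import Any

# Field names from the table above; which are single objects vs lists
# follows the "Working with Results" section.
SINGLE_FIELDS = ("featured_snippet", "ai_overview")
LIST_FIELDS = (
    "results", "sponsored", "people_also_ask", "people_saying",
    "people_also_search", "related_products", "jobs", "discussions",
    "shopping_ads",
)


def present_result_types(data: dict[str, Any]) -> list[str]:
    """Return the result-type fields that actually contain data."""
    present = [name for name in SINGLE_FIELDS if data.get(name) is not None]
    present += [name for name in LIST_FIELDS if data.get(name)]  # non-empty lists
    return present


ddg_page = {"search_engine": "duckduckgo", "results": [{"title": "a"}],
            "featured_snippet": None, "ai_overview": None, "sponsored": []}
print(present_result_types(ddg_page))  # ['results']
```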
Working with Results
SearchParser.parse() with output_format="dict" returns the full SearchResults structure:
```python
data = parser.parse(html, output_format="dict")

# Always a list (organic results only)
for r in data["results"]:
    print(r["title"], r["url"], r["description"])

# None or a single object
if data["featured_snippet"]:
    print(data["featured_snippet"]["title"])

# None or a single object with description + sources list
if data["ai_overview"]:
    overview = data["ai_overview"]
    print(overview["description"])
    for source in overview["metadata"]["sources"]:
        print(f"  - {source['title']}: {source['url']}")

# Always a list (empty when not present)
for q in data["people_also_ask"]:
    print(q["title"])

for post in data["people_saying"]:
    print(post["title"], post["url"])

for item in data["people_also_search"]:
    print(item["title"])

for ad in data["sponsored"]:
    print(ad["title"], ad["url"])

for product in data["related_products"]:
    print(product["title"])

# Jobs (title, metadata["company"], metadata["location"])
for job in data["jobs"]:
    print(job["title"], job["metadata"]["company"], job["metadata"]["location"])

# Discussions (title, url, description, metadata["source"])
for disc in data["discussions"]:
    print(disc["title"], disc["url"])
    print(disc["metadata"]["source"])

# Shopping ads (title, metadata["price"], metadata["merchant"])
for ad in data["shopping_ads"]:
    print(ad["title"], ad["metadata"]["price"], ad["metadata"]["merchant"])

# Metadata
print(data["search_engine"])         # "google"
print(data["query"])                 # "python web scraping"
print(data["total_results"])         # 26200000 or None
print(data["detection_confidence"])  # 0.95
```
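Because every result item shares the same keys (`title`, `url`, `result_type`, ...), you can walk all result types uniformly instead of writing one loop per field. This is useful when exporting every link to a CSV or database. A minimal sketch assuming the dict shape shown above; `iter_items` and `collect_links` are hypothetical helpers.

```python
from typing import Any, Iterator

# Result-type fields from the "Supported Result Types" table, in a fixed order.
RESULT_FIELDS = (
    "results", "featured_snippet", "ai_overview", "people_also_ask",
    "people_saying", "people_also_search", "sponsored", "related_products",
    "jobs", "discussions", "shopping_ads",
)


def iter_items(data: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every result item, whichever field it lives in."""
    for name in RESULT_FIELDS:
        value = data.get(name)
        if value is None:
            continue  # absent single-object field (or missing key)
        if isinstance(value, dict):
            yield value  # featured_snippet / ai_overview
        else:
            yield from value  # list fields, possibly empty


def collect_links(data: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (result_type, title, url) for items with a non-empty URL."""
    return [
        (item["result_type"], item["title"], item["url"])
        for item in iter_items(data)
        if item.get("url")
    ]


sample = {
    "results": [{"title": "Web Scraping with Python - Real Python",
                 "url": "https://realpython.com/python-web-scraping/",
                 "result_type": "organic"}],
    "featured_snippet": None,
    "ai_overview": {"title": "AI Overview", "url": "", "result_type": "ai_overview"},
    "people_also_ask": [{"title": "Is Python good for web scraping?", "url": "",
                         "result_type": "people_also_ask"}],
    "jobs": [{"title": "Global Supply Chain Director",
              "url": "https://www.google.com/search?q=%22Supply+Chain+Director&udm=8",
              "result_type": "job"}],
}
for row in collect_links(sample):
    print(row)
```

Items such as People Also Ask entries or the AI Overview carry an empty `url` in the documented output, so they are skipped here.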
Using the model directly
When you need the typed SearchResults object instead of a dict, call the engine parser directly. The model exposes to_json() and to_markdown() convenience methods:
```python
from search_parser.parsers.google import GoogleParser

parser = GoogleParser()
results = parser.parse(html)  # returns SearchResults

# Typed access, no dict key lookups
print(results.query)
print(results.total_results)
print(len(results.results))  # organic count

if results.featured_snippet:
    print(results.featured_snippet.title)

if results.ai_overview:
    print(results.ai_overview.description)
    sources = results.ai_overview.metadata["sources"]

for q in results.people_also_ask:
    print(q.title)

for post in results.people_saying:
    print(post.title, post.url)

for ad in results.shopping_ads:
    print(ad.title, ad.metadata["price"], ad.metadata["merchant"])

# Convert to JSON or Markdown directly on the model
json_str = results.to_json()
json_str = results.to_json(indent=4)  # custom indent
md_str = results.to_markdown()
```
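Sometimes you have only the dict or JSON output, for example results saved to disk earlier, but still want attribute access. One option is a small dataclass that mirrors the documented item keys. This is a hypothetical sketch: `ResultItem` is not search-parser's own model class, and its fields are copied from the JSON example in this README.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class ResultItem:
    """Hypothetical typed view of one result entry (not the library's model)."""
    title: str
    url: str
    description: Optional[str] = None
    position: int = 0
    result_type: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ResultItem":
        # Keep only known keys so extra fields in newer output don't raise.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


paa = {"title": "Is Python good for web scraping?", "url": "", "position": 0,
       "result_type": "people_also_ask", "metadata": {}}
item = ResultItem.from_dict(paa)
print(item.title, item.result_type)
```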
Output Formats
JSON (output_format="json" or results.to_json())
```json
{
  "search_engine": "google",
  "query": "python web scraping",
  "total_results": 26200000,
  "results": [
    {
      "title": "Web Scraping with Python - Real Python",
      "url": "https://realpython.com/python-web-scraping/",
      "description": "Learn how to scrape websites with Python...",
      "position": 1,
      "result_type": "organic",
      "metadata": {}
    }
  ],
  "featured_snippet": null,
  "ai_overview": {
    "title": "AI Overview",
    "url": "",
    "description": "Python is a widely used language for web scraping...",
    "position": 0,
    "result_type": "ai_overview",
    "metadata": {
      "sources": [
        {"title": "Beautiful Soup", "url": "https://www.crummy.com/software/BeautifulSoup/"},
        {"title": "Requests", "url": "https://requests.readthedocs.io/"}
      ]
    }
  },
  "people_also_ask": [
    {"title": "Is Python good for web scraping?", "url": "", "position": 0, "result_type": "people_also_ask", "metadata": {}}
  ],
  "sponsored": [],
  "people_saying": [],
  "people_also_search": [],
  "related_products": [],
  "jobs": [
    {
      "title": "Global Supply Chain Director",
      "url": "https://www.google.com/search?q=%22Supply+Chain+Director&udm=8",
      "description": null,
      "position": 0,
      "result_type": "job",
      "metadata": {
        "company": "InterSources, Inc.",
        "location": "San Jose, CA • via Ladders"
      }
    }
  ],
  "discussions": [
    {
      "title": "Being considered for Director of Supply Chain",
      "url": "https://www.reddit.com/r/supplychain/comments/1ib0c1a/being_considered_for_director_of_supply_chain/",
      "description": "I work for a mid-sized company as a Procurement Manager...",
      "position": 0,
      "result_type": "discussion",
      "metadata": {
        "source": "Reddit · r/supplychain · 10+ comments · 1 year ago"
      }
    }
  ],
  "shopping_ads": [
    {
      "title": "ALCON - Precision7 , 12 Pack",
      "url": "http://www.google.com/aclk?sa=L&ai=...",
      "description": null,
      "position": 0,
      "result_type": "shopping_ad",
      "metadata": {
        "price": "$51.19",
        "merchant": "Contacts Direct"
      }
    }
  ],
  "detection_confidence": 0.95,
  "parsed_at": "2026-02-21T00:00:00Z",
  "metadata": {}
}
```
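The JSON output nests AI Overview citations under `ai_overview.metadata.sources`. That makes it straightforward to see which sites a results page points to. A minimal sketch using only the standard library, assuming the structure shown above; `cited_domains` is a hypothetical helper.

```python
import json
from typing import Any
from urllib.parse import urlparse


def cited_domains(data: dict[str, Any]) -> list[str]:
    """Sorted hostnames from organic results and AI Overview sources."""
    urls = [r["url"] for r in data.get("results", [])]
    overview = data.get("ai_overview")  # None when no AI Overview was found
    if overview:
        urls += [s["url"] for s in overview.get("metadata", {}).get("sources", [])]
    hosts = {urlparse(u).hostname for u in urls if u}
    return sorted(h for h in hosts if h)


# Trimmed from the JSON example above.
json_output = """
{
  "results": [{"url": "https://realpython.com/python-web-scraping/"}],
  "ai_overview": {"metadata": {"sources": [
    {"title": "Beautiful Soup", "url": "https://www.crummy.com/software/BeautifulSoup/"},
    {"title": "Requests", "url": "https://requests.readthedocs.io/"}
  ]}}
}
"""
print(cited_domains(json.loads(json_output)))
# ['realpython.com', 'requests.readthedocs.io', 'www.crummy.com']
```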
Markdown (output_format="markdown" or results.to_markdown())
```markdown
# Search Results: python web scraping
**Search Engine:** Google
**Total Results:** ~26,200,000
**Parsed:** 2026-02-21 00:00:00 UTC
---
## Featured Snippet
### What is Web Scraping?
Web scraping is the process of extracting data from websites...
**Source:** [https://example.com](https://example.com)
---
## Organic Results
### 1. Web Scraping with Python - Real Python
Learn how to scrape websites with Python...
**URL:** https://realpython.com/python-web-scraping/
---
## Jobs
### Global Supply Chain Director
**Company:** InterSources, Inc.
**Location:** San Jose, CA • via Ladders
**URL:** https://www.google.com/search?q=%22Supply+Chain+Director&udm=8
---
## Discussions and Forums
### Being considered for Director of Supply Chain
I work for a mid-sized company as a Procurement Manager...
**URL:** https://www.reddit.com/r/supplychain/comments/1ib0c1a/being_considered_for_director_of_supply_chain/
## Shopping Ads
### ALCON - Precision7 , 12 Pack
**Price:** $51.19
**Merchant:** Contacts Direct
**URL:** http://www.google.com/aclk?sa=L&ai=...
```
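When the Markdown output goes into an LLM prompt, you may want only some sections, such as the featured snippet and organic results. A minimal sketch, assuming the layout in the example above (one `## ` heading per result type and `---` separator lines); `split_sections` is a hypothetical helper, not part of the library.

```python
from typing import Optional


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## ' section title to the text beneath it."""
    sections: dict[str, str] = {}
    current: Optional[str] = None
    buffer: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("## "):  # '### ' item headings don't match this
            if current is not None:
                sections[current] = "\n".join(buffer).strip()
            current, buffer = line[3:].strip(), []
        elif line.strip() == "---":
            continue  # drop horizontal-rule separators
        elif current is not None:
            buffer.append(line)  # text before the first section is ignored
    if current is not None:
        sections[current] = "\n".join(buffer).strip()
    return sections


md_output = "\n".join([
    "# Search Results: python web scraping",
    "**Search Engine:** Google",
    "---",
    "## Featured Snippet",
    "### What is Web Scraping?",
    "Web scraping is the process of extracting data from websites...",
    "---",
    "## Organic Results",
    "### 1. Web Scraping with Python - Real Python",
    "**URL:** https://realpython.com/python-web-scraping/",
])

sections = split_sections(md_output)
wanted = ("Featured Snippet", "Organic Results")
prompt_context = "\n\n".join(sections[name] for name in wanted if name in sections)
print(prompt_context)
```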
CLI Usage
```bash
# Parse an HTML file (auto-detects search engine, outputs JSON)
search-parser parse results.html

# Markdown output
search-parser parse results.html --format markdown

# Specify engine manually
search-parser parse results.html --engine google --format json

# Read from stdin
cat results.html | search-parser parse - --format json

# Save to file
search-parser parse results.html --output results.json
```
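To drive the CLI from another Python program, for example to isolate parsing in a subprocess, you can assemble the same arguments shown above and pipe HTML on stdin via `-`. A minimal sketch that uses only the subcommand and flags documented here. `build_command` and `parse_via_cli` are hypothetical helper names, and running the second one requires `search-parser` on your `PATH`.

```python
import subprocess
from typing import Optional


def build_command(
    path: str,
    fmt: str = "json",
    engine: Optional[str] = None,
    output: Optional[str] = None,
) -> list[str]:
    """Assemble argv for `search-parser parse` using only the documented flags."""
    cmd = ["search-parser", "parse", path, "--format", fmt]
    if engine is not None:
        cmd += ["--engine", engine]  # skip auto-detection
    if output is not None:
        cmd += ["--output", output]  # write to a file instead of stdout
    return cmd


def parse_via_cli(html: str, fmt: str = "json") -> str:
    """Pipe HTML to the CLI on stdin ("-") and return what it prints."""
    completed = subprocess.run(
        build_command("-", fmt),
        input=html,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


print(build_command("results.html", "markdown", engine="google"))
```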
Documentation
Full documentation: https://search-parser.github.io/search-parser/
Contributing
Contributions are welcome! Please read our Contributing Guide for details on the development workflow, how to add new parsers, and how to submit pull requests.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Download files
File details
Details for the file search_parser-0.5.0.tar.gz.
File metadata
- Download URL: search_parser-0.5.0.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 87a4bb3e9e8b958d545707c94a6ea7005e99684b02fabb0a1611c56eea0a82e5 |
| MD5 | 25abdd360ca15071e6907d29259b49fb |
| BLAKE2b-256 | 11adf393a5c330fd63208bfe9e8de6a737f56e5e89c34071ed3d865d0c54af0c |
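To confirm a downloaded distribution matches the SHA256 digest listed above, hash the file in chunks with Python's `hashlib`. This is a generic sketch; `sha256_of` and `verify` are illustrative helper names, not part of search-parser.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest (case-insensitive hex)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("search_parser-0.5.0.tar.gz"), "87a4bb3e9e8b958d545707c94a6ea7005e99684b02fabb0a1611c56eea0a82e5")` should return `True` for an untampered download. pip can also enforce digests automatically in its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).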
Provenance

The following attestation bundles were made for search_parser-0.5.0.tar.gz:

Publisher: publish.yml on getlinksc/search-parser

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: search_parser-0.5.0.tar.gz
- Subject digest: 87a4bb3e9e8b958d545707c94a6ea7005e99684b02fabb0a1611c56eea0a82e5
- Sigstore transparency entry: 1259360051
- Sigstore integration time:
- Permalink: getlinksc/search-parser@9f539ca61f88d39dccad29307c4a43bc92ec2f7f
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/getlinksc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9f539ca61f88d39dccad29307c4a43bc92ec2f7f
- Trigger Event: release
File details
Details for the file search_parser-0.5.0-py3-none-any.whl.
File metadata
- Download URL: search_parser-0.5.0-py3-none-any.whl
- Upload date:
- Size: 28.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cceeb0cedbcd27931b74cfc09624e773ea2cf2f564c299160eca368964ba18b |
| MD5 | 63e8399c2fbfe1f751b1e4d6e345a916 |
| BLAKE2b-256 | b1b753139825b357aca4334b36e6f0554bd9ecffb1a540c4cedc690d7b0e6422 |
Provenance

The following attestation bundles were made for search_parser-0.5.0-py3-none-any.whl:

Publisher: publish.yml on getlinksc/search-parser

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: search_parser-0.5.0-py3-none-any.whl
- Subject digest: 0cceeb0cedbcd27931b74cfc09624e773ea2cf2f564c299160eca368964ba18b
- Sigstore transparency entry: 1259360264
- Sigstore integration time:
- Permalink: getlinksc/search-parser@9f539ca61f88d39dccad29307c4a43bc92ec2f7f
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/getlinksc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9f539ca61f88d39dccad29307c4a43bc92ec2f7f
- Trigger Event: release