search-parser

Parse Google, Bing, and DuckDuckGo HTML search results into JSON, Markdown, or Python dict — with automatic search engine detection.

search-parser takes raw HTML from Google, Bing, and DuckDuckGo and extracts the supported result types (organic results, featured snippets, AI Overviews, People Also Ask, sponsored ads, and more) into clean, typed Python objects. It auto-detects the search engine from the HTML, so you do not have to specify which parser to use.


Quick Start

from search_parser import SearchParser

parser = SearchParser()
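# Read the saved results page; the context manager closes the file afterwards
with open("google_results.html", encoding="utf-8") as f:
    html = f.read()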

# JSON string (default)
json_output = parser.parse(html)

# Markdown string — great for feeding to an LLM
md_output = parser.parse(html, output_format="markdown")

# Python dict — for programmatic access
data = parser.parse(html, output_format="dict")

# Organic results are in data["results"]
for result in data["results"]:
    print(f"{result['position']}. {result['title']}")
    print(f"   {result['url']}")

# Every other result type has its own dedicated key
if data["featured_snippet"]:
    print("Featured:", data["featured_snippet"]["title"])

if data["ai_overview"]:
    print("AI Overview:", data["ai_overview"]["description"][:100])

for question in data["people_also_ask"]:
    print("PAA:", question["title"])

Installation

With uv (recommended):

uv add search-parser

With pip:

pip install search-parser

Supported Result Types

Result Type                   Field                 Google  Bing  DuckDuckGo
Organic results               results
Featured snippet              featured_snippet
Sponsored / ads               sponsored
AI Overview                   ai_overview
People Also Ask               people_also_ask
What People Are Saying        people_saying
People Also Search For        people_also_search
Related Products & Services   related_products
Jobs                          jobs
Discussions and forums        discussions
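
To see at a glance which of these fields a parsed page actually populated, a small helper can count the items under each key. The sketch below is not part of search-parser: count_result_types, LIST_FIELDS, and SINGLE_FIELDS are hypothetical names, and the sample dict only imitates the shape that output_format="dict" is documented to return.

```python
# Field names come from the "Field" column above. The helper is a
# hypothetical illustration, not a search-parser API.
LIST_FIELDS = (
    "results", "sponsored", "people_also_ask", "people_saying",
    "people_also_search", "related_products", "jobs", "discussions",
)
SINGLE_FIELDS = ("featured_snippet", "ai_overview")


def count_result_types(data: dict) -> dict[str, int]:
    """Return how many items each result-type field holds."""
    # List-valued fields: a missing key or None counts as zero items.
    counts = {field: len(data.get(field) or []) for field in LIST_FIELDS}
    for field in SINGLE_FIELDS:
        # Single-object fields are either None or one object.
        counts[field] = 0 if data.get(field) is None else 1
    return counts


# Hand-written stand-in for parser.parse(html, output_format="dict").
sample = {
    "results": [{"title": "A"}, {"title": "B"}],
    "featured_snippet": None,
    "ai_overview": {"title": "AI Overview"},
    "people_also_ask": [{"title": "Q1"}],
}

counts = count_result_types(sample)
print(counts["results"], counts["ai_overview"], counts["jobs"])
```

Treating the two single-object fields separately mirrors the distinction the README draws between keys that are "None or a single object" and keys that are "always a list".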

Working with Results

SearchParser.parse() with output_format="dict" returns the full SearchResults structure as a plain Python dict:

data = parser.parse(html, output_format="dict")

# Always a list (organic results only)
for r in data["results"]:
    print(r["title"], r["url"], r["description"])

# None or a single object
if data["featured_snippet"]:
    print(data["featured_snippet"]["title"])

# None or a single object with description + sources list
if data["ai_overview"]:
    overview = data["ai_overview"]
    print(overview["description"])
    for source in overview["metadata"]["sources"]:
        print(f"  - {source['title']}: {source['url']}")

# Always a list (empty when not present)
for q in data["people_also_ask"]:
    print(q["title"])

for post in data["people_saying"]:
    print(post["title"], post["url"])

for item in data["people_also_search"]:
    print(item["title"])

for ad in data["sponsored"]:
    print(ad["title"], ad["url"])

for product in data["related_products"]:
    print(product["title"])

# Jobs (title, metadata["company"], metadata["location"])
for job in data["jobs"]:
    print(job["title"], job["metadata"]["company"], job["metadata"]["location"])

# Discussions (title, url, description, metadata["source"])
for disc in data["discussions"]:
    print(disc["title"], disc["url"])
    print(disc["metadata"]["source"])

# Metadata
print(data["search_engine"])        # "google"
print(data["query"])                # "python web scraping"
print(data["total_results"])        # 26200000 or None
print(data["detection_confidence"]) # 0.95
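
The keys above can be combined in downstream code. As a minimal sketch, the snippet below collects unique URLs across several list-valued fields and only uses them when detection_confidence clears a cutoff. collect_urls, is_reliable, and the 0.8 threshold are assumptions made for illustration; the library does not document a recommended threshold, and the sample dict is hand-written rather than real parser output.

```python
def collect_urls(data: dict) -> list[str]:
    """Gather unique, non-empty URLs from the list-valued fields, in order."""
    urls: list[str] = []
    for field in ("results", "sponsored", "people_saying", "jobs", "discussions"):
        for item in data.get(field) or []:
            url = item.get("url")
            # Skip empty URLs (e.g. "" on some items) and duplicates.
            if url and url not in urls:
                urls.append(url)
    return urls


def is_reliable(data: dict, threshold: float = 0.8) -> bool:
    """True when detection_confidence meets the (arbitrary) threshold."""
    return (data.get("detection_confidence") or 0.0) >= threshold


# Hand-written stand-in for parser.parse(html, output_format="dict").
sample = {
    "results": [
        {"title": "Real Python", "url": "https://realpython.com/python-web-scraping/"},
    ],
    "people_also_ask": [{"title": "Is Python good for web scraping?", "url": ""}],
    "discussions": [
        {"title": "Thread", "url": "https://www.reddit.com/r/supplychain/"},
        {"title": "Dup", "url": "https://realpython.com/python-web-scraping/"},
    ],
    "detection_confidence": 0.95,
}

if is_reliable(sample):
    for url in collect_urls(sample):
        print(url)
```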

Using the model directly

When you need the typed SearchResults object instead of a dict, call the engine parser directly. The model exposes to_json() and to_markdown() convenience methods:

from search_parser.parsers.google import GoogleParser

parser = GoogleParser()
results = parser.parse(html)  # returns SearchResults

# Typed access — no dict key lookups
print(results.query)
print(results.total_results)
print(len(results.results))  # organic count

if results.featured_snippet:
    print(results.featured_snippet.title)

if results.ai_overview:
    print(results.ai_overview.description)
    sources = results.ai_overview.metadata["sources"]

for q in results.people_also_ask:
    print(q.title)

for post in results.people_saying:
    print(post.title, post.url)

# Convert to JSON or Markdown directly on the model
json_str = results.to_json()
json_str = results.to_json(indent=4)  # custom indent
md_str = results.to_markdown()

Output Formats

JSON (output_format="json" or results.to_json())

{
  "search_engine": "google",
  "query": "python web scraping",
  "total_results": 26200000,
  "results": [
    {
      "title": "Web Scraping with Python - Real Python",
      "url": "https://realpython.com/python-web-scraping/",
      "description": "Learn how to scrape websites with Python...",
      "position": 1,
      "result_type": "organic",
      "metadata": {}
    }
  ],
  "featured_snippet": null,
  "ai_overview": {
    "title": "AI Overview",
    "url": "",
    "description": "Python is a widely used language for web scraping...",
    "position": 0,
    "result_type": "ai_overview",
    "metadata": {
      "sources": [
        {"title": "Beautiful Soup", "url": "https://www.crummy.com/software/BeautifulSoup/"},
        {"title": "Requests", "url": "https://requests.readthedocs.io/"}
      ]
    }
  },
  "people_also_ask": [
    {"title": "Is Python good for web scraping?", "url": "", "position": 0, "result_type": "people_also_ask", "metadata": {}}
  ],
  "sponsored": [],
  "people_saying": [],
  "people_also_search": [],
  "related_products": [],
  "jobs": [
    {
      "title": "Global Supply Chain Director",
      "url": "https://www.google.com/search?q=%22Supply+Chain+Director&udm=8",
      "description": null,
      "position": 0,
      "result_type": "job",
      "metadata": {
        "company": "InterSources, Inc.",
        "location": "San Jose, CA  •  via Ladders"
      }
    }
  ],
  "discussions": [
    {
      "title": "Being considered for Director of Supply Chain",
      "url": "https://www.reddit.com/r/supplychain/comments/1ib0c1a/being_considered_for_director_of_supply_chain/",
      "description": "I work for a mid-sized company as a Procurement Manager...",
      "position": 0,
      "result_type": "discussion",
      "metadata": {
        "source": "Reddit · r/supplychain · 10+ comments · 1 year ago"
      }
    }
  ],
  "detection_confidence": 0.95,
  "parsed_at": "2026-02-21T00:00:00Z",
  "metadata": {}
}
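
Because the JSON output is a plain string, it can be loaded with the standard json module and post-processed. In the sample above, a job's metadata["location"] carries both the place and the listing source in one string. The sketch below splits the two, assuming the "•" separator and "via " prefix seen in that sample hold in general, which the library does not promise; split_location is a hypothetical helper, and json_output is a trimmed copy of the sample rather than real parser output.

```python
import json

# Trimmed copy of the JSON sample above, standing in for parser.parse(html).
json_output = """
{
  "search_engine": "google",
  "jobs": [
    {
      "title": "Global Supply Chain Director",
      "description": null,
      "metadata": {
        "company": "InterSources, Inc.",
        "location": "San Jose, CA  •  via Ladders"
      }
    }
  ]
}
"""


def split_location(raw: str) -> tuple[str, str]:
    """Split 'Place  •  via Source' into (place, source).

    Returns an empty source when no bullet separator is present.
    """
    place, _, source = raw.partition("•")
    return place.strip(), source.strip().removeprefix("via ").strip()


data = json.loads(json_output)
for job in data["jobs"]:
    place, source = split_location(job["metadata"]["location"])
    # JSON null becomes None in Python, so description may be None here.
    print(job["title"], "|", place, "|", source or "unknown source")
```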

Markdown (output_format="markdown" or results.to_markdown())

# Search Results: python web scraping

**Search Engine:** Google
**Total Results:** ~26,200,000
**Parsed:** 2026-02-21 00:00:00 UTC

---

## Featured Snippet

### What is Web Scraping?
Web scraping is the process of extracting data from websites...

**Source:** [https://example.com](https://example.com)

---

## Organic Results

### 1. Web Scraping with Python - Real Python
Learn how to scrape websites with Python...

**URL:** https://realpython.com/python-web-scraping/

---

## Jobs

### Global Supply Chain Director

**Company:** InterSources, Inc.
**Location:** San Jose, CA  •  via Ladders
**URL:** https://www.google.com/search?q=%22Supply+Chain+Director&udm=8

---

## Discussions and Forums

### Being considered for Director of Supply Chain

I work for a mid-sized company as a Procurement Manager...

**URL:** https://www.reddit.com/r/supplychain/comments/1ib0c1a/being_considered_for_director_of_supply_chain/
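
If you need a custom layout, the dict output can be rendered by hand. The following sketch reproduces only the "Organic Results" section in the layout shown above. It is a re-creation for illustration, not the library's own renderer, and organic_to_markdown is a hypothetical name.

```python
def organic_to_markdown(results: list[dict]) -> str:
    """Render organic results in the layout of the Markdown sample above."""
    lines = ["## Organic Results", ""]
    for r in results:
        lines.append(f"### {r['position']}. {r['title']}")
        # description can be missing or None; skip the line in that case.
        if r.get("description"):
            lines.append(r["description"])
        lines.append("")
        lines.append(f"**URL:** {r['url']}")
        lines.append("")
    # Drop trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Hand-written item matching the shape of data["results"].
organic = [
    {
        "title": "Web Scraping with Python - Real Python",
        "url": "https://realpython.com/python-web-scraping/",
        "description": "Learn how to scrape websites with Python...",
        "position": 1,
    }
]

md = organic_to_markdown(organic)
print(md)
```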

CLI Usage

# Parse an HTML file (auto-detects search engine, outputs JSON)
search-parser parse results.html

# Markdown output
search-parser parse results.html --format markdown

# Specify engine manually
search-parser parse results.html --engine google --format json

# Read from stdin
cat results.html | search-parser parse - --format json

# Save to file
search-parser parse results.html --output results.json

Documentation

Full documentation: https://search-parser.github.io/search-parser/


Contributing

Contributions are welcome! Please read our Contributing Guide for details on the development workflow, how to add new parsers, and how to submit pull requests.


License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
