
A comprehensive asynchronous library for scraping and parsing Search Engine Results Pages (SERPs).

Project description

PySerp [Internal Project]

PySerp is an asynchronous Python library for automated, flexibly configurable scraping and parsing of Search Engine Results Pages (SERPs).

Notice: this project is currently for internal development and is not intended for public distribution.

Purpose

Scraping search engine results is a prerequisite whenever you need to analyze those results automatically, or to collect content from the links they contain.

Examples:

  • Competitive analysis for keywords (SEO)
  • Searching for and extracting any structured information from page content (phone numbers, emails, addresses, etc.)
  • Collecting page content to generate summaries (AI search)
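
For instance, extracting structured information such as emails or phone numbers from collected page content typically comes down to pattern matching. A minimal sketch using Python's standard re module (the patterns and the extract_contacts helper are illustrative, not part of PySerp):

```python
import re

# Illustrative patterns; production-grade email/phone matching needs stricter rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text: str) -> dict[str, list[str]]:
    """Pull email addresses and phone-like strings out of raw page text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

sample = "Contact us at sales@example.com or +1 (555) 123-4567."
print(extract_contacts(sample))
```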

Key Features

This library:

  • Is asynchronous by default for maximum efficiency
  • Supports Google and Bing as search engines (more will be added in the future)
  • Applies strict typing to results using Pydantic

Installation

Download the source code:

git clone https://github.com/whode/pyserp

Create a virtual environment (recommended):

python -m venv venv

Then activate it.

On Windows:

venv\Scripts\activate

On Linux:

source venv/bin/activate

Install the library:

pip install -e .

Usage

A simple, idiomatic example that retrieves the top 10 Google results for a query:

import asyncio

from pyserp.providers import GoogleSearcherManager, GoogleSearchSessionsManager


async def main():
    query = "how to learn python"
    print("Searching for:", query, end="\n\n")

    # Get the NID cookie from your browser: F12 -> Application -> Cookies
    cookies = {"NID": "YOUR_NID_COOKIE"}
    manager = GoogleSearchSessionsManager(cookies=cookies)
    async with GoogleSearcherManager(search_sessions_manager=manager) as searcher:
        search_top_result = await searcher.search_top(query=query,
                                                      limit=10,
                                                      include_page_errors=False)

        print("----- Results -----", end="\n\n")
        for page in search_top_result.pages:
            for result in page.results.organic:
                print(result.title, result.url, sep="\n", end="\n\n")


if __name__ == "__main__":
    asyncio.run(main())

The library offers much more than this. Full documentation will be added in the future.

Download files

Download the file for your platform.

Source Distribution

pyserp-1.0.0.tar.gz (23.2 kB)


Built Distribution


pyserp-1.0.0-py3-none-any.whl (35.7 kB)


File details

Details for the file pyserp-1.0.0.tar.gz.

File metadata

  • Download URL: pyserp-1.0.0.tar.gz
  • Upload date:
  • Size: 23.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for pyserp-1.0.0.tar.gz:

  • SHA256: 4bbb9a55e57b7919b952cb5f1af5b9849dbae8e9d08aeaa8edbd29ddd70c4a49
  • MD5: aee97e8eb82d20d833225438d6328b43
  • BLAKE2b-256: 514134bdf7ba67e4d1a5ac72e6974ba9b37ff7c4d0ffc7560e5e7fb2a0ff9adb
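
Published hashes like these can be verified locally before installing a downloaded file. A short sketch using the standard-library hashlib (the expected value is the SHA256 of the sdist listed above):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "4bbb9a55e57b7919b952cb5f1af5b9849dbae8e9d08aeaa8edbd29ddd70c4a49"
# After downloading the sdist, compare digests:
# assert sha256_of("pyserp-1.0.0.tar.gz") == expected
```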


File details

Details for the file pyserp-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pyserp-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 35.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for pyserp-1.0.0-py3-none-any.whl:

  • SHA256: 9f629a56e4048bcbc3d9546fefcfa55cdb1e7a14aa47c63705b754d582955985
  • MD5: f7a9c151172421904b218f7d8990907d
  • BLAKE2b-256: 435789671f40aab916d8f6f1dbe81db4784dc1c10270173c9101ec9099e4fb2f

