Parsing Google results


xl


To install: pip install xl

Overview

The xl package parses the HTML of Google search result pages. It extracts information such as the number of results, ads, organic results, and related searches. It uses BeautifulSoup to parse the HTML and returns what it finds as dictionaries, which can be passed directly to further data analysis or processing.
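The overview describes turning result-page HTML into dictionaries. As a rough illustration of that idea only, the sketch below collects every link on a page into a list of dicts. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies, and `LinkCollector` and `extract_links` are hypothetical names, not part of xl.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every <a href=...> on a page as a dict holding its URL and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished {"href": ..., "text": ...} dicts
        self._current = None   # link whose text is still being accumulated

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href are ignored
                self._current = {"href": href, "text": ""}

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is appended to the open link
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(html: str) -> list:
    """Parse an HTML string and return its links as a list of dicts."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<div><a href="https://example.com/page">Example <b>site</b></a><a name="x">no href</a></div>'
print(extract_links(sample))
# [{'href': 'https://example.com/page', 'text': 'Example site'}]
```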

Main Features

- Parsing Google Search Results: Extract detailed information from Google search result pages.
- Support for Multiple Result Types: Handles organic results, ads, related searches, and other components found in Google search pages.
- Output as Structured Data: Converts HTML content into structured dictionaries, making it easier to handle and analyze.
- Utility Functions: Includes several utility functions to assist with tasks such as file handling and URL processing.

Installation

To install the package, run the following command:

```bash
pip install xl
```

Usage Examples

Parsing a Google Search Result HTML File

To parse a Google search result from an HTML file and extract information:

```python
from xl import mk_gresult_tag_dict, parse_tag_dict

# Path to your HTML file
file_path = 'path_to_your_google_search_result.html'

# Create a tag dictionary from the HTML file
tag_dict = mk_gresult_tag_dict(file_path)

# Parse the tag dictionary to get a structured information dictionary
info_dict = parse_tag_dict(tag_dict)

# Print the extracted information
print(info_dict)
```

Extracting Domain Lists from Google Results

To get a list of domains from the parsed Google search results:

```python
from xl import get_domain_list_from_google_results

# Assuming `info_dict` is already obtained from previous steps
domain_list = get_domain_list_from_google_results(info_dict)

# Print the list of domains
print(domain_list)
```
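Once you have a domain list, a common next step is to see which sites dominate the results. The sketch below assumes, as the function description states, that the list contains plain domain strings; the sample values and the `top_domains` helper are hypothetical.

```python
from collections import Counter


def top_domains(domain_list, n=3):
    """Return the n most frequent domains as (domain, count) pairs."""
    return Counter(domain_list).most_common(n)


# Hypothetical sample standing in for the output of get_domain_list_from_google_results
domain_list = ["example.com", "wikipedia.org", "example.com",
               "python.org", "example.com", "wikipedia.org"]
print(top_domains(domain_list, n=2))
# [('example.com', 3), ('wikipedia.org', 2)]
```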

Function Documentation

`mk_gresult_tag_dict(input)`

Takes a BeautifulSoup object, an HTML string, or the filename of a saved Google results page, and returns a dictionary containing the key components of interest from that HTML.
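Because `mk_gresult_tag_dict` accepts three kinds of input, the argument has to be normalized before parsing. One plausible way to do that is sketched below; it is not xl's actual code, `resolve_html_source` is a hypothetical name, and it avoids importing BeautifulSoup by passing any non-string object (such as an already-parsed soup) through unchanged.

```python
import os


def resolve_html_source(source):
    """Hypothetical dispatcher for the three input kinds described above.

    - a str naming an existing file: read and return the file's contents
    - any other str: assumed to already be HTML and returned as-is
    - anything else (e.g. a BeautifulSoup object): returned unchanged
    """
    if isinstance(source, str):
        # os.path.isfile returns False (rather than raising) for long HTML strings
        if os.path.isfile(source):
            with open(source, encoding="utf-8") as f:
                return f.read()
        return source
    return source
```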

`parse_tag_dict(tag_dict)`

Takes the dictionary produced by `mk_gresult_tag_dict` and parses it into structured information such as the number of results and the lists of ads and organic results.
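Extracting the number of results typically means reading the result-stats line Google shows above the results. The sketch assumes that text looks like `About 1,230,000 results (0.45 seconds)`; the exact wording varies with locale and over time, and `parse_result_count` is a hypothetical helper, not the logic `parse_tag_dict` actually uses.

```python
import re
from typing import Optional


def parse_result_count(stats_text: str) -> Optional[int]:
    """Pull the integer out of a string like 'About 1,230,000 results (0.45 seconds)'.

    Returns None when no number followed by 'result' or 'results' is found.
    """
    match = re.search(r"([\d,.]+)\s+results?", stats_text)
    if match is None:
        return None
    # Drop thousands separators (',' or '.') before converting
    digits = re.sub(r"[^\d]", "", match.group(1))
    return int(digits) if digits else None
```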

`get_domain_list_from_google_results(gresults)`

Accepts a BeautifulSoup object, HTML content, or a filename and returns a list of the domains found in the Google search results. (The usage example above passes an already-parsed `info_dict` instead.)
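Reducing result URLs to domains is the kind of URL processing the package's utility functions cover. A minimal sketch with `urllib.parse` follows; whether xl lower-cases hosts or strips a leading `www.` is not stated, so treat `domain_of` and those choices as assumptions.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return a URL's host, lower-cased, without port and without a leading 'www.'."""
    host = urlparse(url).hostname or ""  # hostname is None for relative URLs
    return host[4:] if host.startswith("www.") else host


urls = [
    "https://www.example.com/some/page?q=1",
    "http://docs.python.org/3/library/urllib.parse.html",
    "https://Example.com:8080/",
]
print([domain_of(u) for u in urls])
# ['example.com', 'docs.python.org', 'example.com']
```

Google has at times wrapped result links in redirects such as `/url?q=...`; such relative links yield an empty string here and would need to be unwrapped first.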

Contributing

Contributions to the xl package are welcome. Please ensure that your code adheres to the existing style and that all tests pass before submitting a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

- 0.0.5 (this version): Jun 11, 2025
- 0.0.4: Oct 10, 2022
- 0.0.3: Oct 4, 2022
- 0.0.2: Jan 6, 2021

Download files


Source Distribution

xl-0.0.5.tar.gz (12.5 kB)

Uploaded Jun 11, 2025 Source

Built Distribution


xl-0.0.5-py3-none-any.whl (11.6 kB)

Uploaded Jun 11, 2025 Python 3

File details

Details for the file xl-0.0.5.tar.gz.

File metadata

Download URL: xl-0.0.5.tar.gz
Upload date: Jun 11, 2025
Size: 12.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for xl-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`616569d76d38ba2db703e3a82d470153845d83c458f11bc06b4161e8d621a045`
MD5	`a1fc0e55a2345f2cb6bc3656d4dfd639`
BLAKE2b-256	`bb816e70e66558149faeeec70fd976d6f2de155ac088bcfcc4f5719c2f29bec9`


File details

Details for the file xl-0.0.5-py3-none-any.whl.

File metadata

Download URL: xl-0.0.5-py3-none-any.whl
Upload date: Jun 11, 2025
Size: 11.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for xl-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`68b433308199a45286f1452037bb95e5e577f6053cd2476cc081e04f417f1ac9`
MD5	`8ab0740ee241b694afbce4fbb8e64edc`
BLAKE2b-256	`07142fd4aeffdfbbf951895c2ab622edbff881c4d482c4358482ff8790e3a941`
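To check a downloaded file against the digests listed above, you can compute its SHA256 with `hashlib` and compare. A minimal sketch, assuming the file has already been downloaded; `sha256_of` and `EXPECTED_SDIST_SHA256` are hypothetical names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest of xl-0.0.5.tar.gz, copied from the table above
EXPECTED_SDIST_SHA256 = "616569d76d38ba2db703e3a82d470153845d83c458f11bc06b4161e8d621a045"
```

Usage: `sha256_of("xl-0.0.5.tar.gz") == EXPECTED_SDIST_SHA256` should be `True` for an untampered download. pip can enforce the same kind of check at install time through its hash-checking mode, using a requirements line such as `xl==0.0.5 --hash=sha256:<digest>`.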

