The GoogleIt Python package offers a versatile set of tools for querying Google search results, downloading content, preprocessing text, converting HTML to PDF, and leveraging Google Palm 2 and Gemini language models for natural language processing tasks.

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

GoogleIt Python Package

The GoogleIt package provides a set of tools for querying Google search results, retrieving URLs, downloading content, preprocessing text, extracting domain names from URLs, combining PDF files, and extracting relevant content based on cosine similarity.

Installation

```bash
pip install GoogleIt
```

Usage

```python
from GoogleIt.googleit import GoogleIt

# Create an instance of the GoogleIt class
google_it = GoogleIt(api_key='your_api_key_here')

# Perform a query and retrieve information
query = "How does photosynthesis work?"
response = google_it.get(query=query, urls_count=5)
print(response)
```

Modules

- `converter.py`: Converts HTML files or websites to PDF.
- `models.py`: Wrapper classes for interacting with the Google Palm 2 and Gemini language models.
- `text_processor.py`: Functions for processing text documents, including converting PDF to DOCX, reading paragraphs from DOCX, dividing paragraphs into chunks, and extracting text and paragraphs from PDF.
- `googleit.py`: Main module containing the `GoogleIt` class, which provides functionality for querying, retrieving URLs, downloading content, preprocessing text, and more.

`converter.py` Documentation

GoogleIt Converter Module

This module provides functionality to convert HTML files or websites into PDF format using Selenium.

Usage:

- Import the module: `from GoogleIt import converter`
- Call the `convert` function with appropriate parameters.

Example:

```python
from GoogleIt import converter
converter.convert(source='https://example.com', target='output.pdf', timeout=5)
```

Functions:

- `convert(source: str, target: str, timeout: int = 2, print_options: dict = {}) -> None`:
    Converts a given HTML file or website into PDF.

    Parameters:
        - `source` (str): Source HTML file or website link.
        - `target` (str): Target location to save the PDF.
        - `timeout` (int, optional): Timeout in seconds. Defaults to 2 seconds.
        - `print_options` (dict, optional): Options for PDF printing. Refer to https://vanilla.aslushnikov.com/?Page.printToPDF for available options.

    Raises:
        - Exception: If an error occurs during PDF conversion.

Note:

This module relies on the Selenium library and requires a compatible WebDriver (e.g., ChromeDriver) to be installed.
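The internals of `convert` are not documented here. The sketch below shows one plausible way to perform such a conversion, assuming headless Chrome and Selenium 4's `execute_cdp_cmd`, which sends the Chrome DevTools Protocol `Page.printToPDF` command linked above; that command returns the PDF as base64 in a `data` field. The names `to_navigable_url`, `write_pdf_from_cdp`, and `convert_sketch` are hypothetical, and treating `timeout` as a fixed wait before printing is an assumption.

```python
import base64
import time
from pathlib import Path
from typing import Optional


def to_navigable_url(source: str) -> str:
    """Web links pass through; local HTML paths become absolute file:// URIs."""
    if source.startswith(("http://", "https://", "file://")):
        return source
    return Path(source).resolve().as_uri()


def write_pdf_from_cdp(result: dict, target: str) -> None:
    """Decode the base64 'data' field returned by Page.printToPDF and save it."""
    Path(target).write_bytes(base64.b64decode(result["data"]))


def convert_sketch(source: str, target: str, timeout: int = 2,
                   print_options: Optional[dict] = None) -> None:
    """Hypothetical converter: load the page in headless Chrome, wait, print to PDF."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(to_navigable_url(source))
        time.sleep(timeout)  # assumption: give the page time to render
        result = driver.execute_cdp_cmd("Page.printToPDF", print_options or {})
        write_pdf_from_cdp(result, target)
    finally:
        driver.quit()  # always release the browser, even on failure
```

Passing `print_options` straight through to `Page.printToPDF` means any option from the DevTools reference (for example `landscape` or `printBackground`) can be used without changing the function.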

`models.py` Documentation

This module provides wrapper classes for interacting with Google's language models, including Palm 2 and Gemini.

Palm2Model Class:

This class serves as a wrapper for the Google Palm 2 language model.

Attributes:

- `model`: The initialized Palm 2 language model.

Methods:

- `init(self) -> None`:
    Initializes the `Palm2Model` instance.

- `init(self, api_key: str) -> None`:
    Initializes the Palm 2 language model using the provided API key.

    Parameters:
        - `api_key` (str): The API key for authentication.

- `make_prompt(self, query: str, relevant_passage: str) -> str`:
    Generates a prompt for the Palm 2 language model.

    Parameters:
        - `query` (str): The user's question.
        - `relevant_passage` (str): The relevant passage for context.

    Returns:
        - `str`: The formatted prompt for the language model.

- `redraft_response(self, query: str, response: str) -> str`:
    Redrafts the response generated by the Palm 2 language model.

    Parameters:
        - `query` (str): The user's question.
        - `response` (str): The generated response.

    Returns:
        - `str`: The redrafted response.

- `query(self, document: str, question: str) -> str`:
    Queries the Palm 2 language model for an answer.

    Parameters:
        - `document` (str): The reference document for context.
        - `question` (str): The user's question.

    Returns:
        - `str`: The generated answer from the language model.

GeminiModel Class:

This class serves as a wrapper for the Google Gemini language model.

Attributes:

- `model`: The initialized Gemini language model.

Methods:

- `init(self) -> None`:
    Initializes the `GeminiModel` instance.

- `init(self, api_key: str) -> None`:
    Initializes the Gemini language model using the provided API key.

    Parameters:
        - `api_key` (str): The API key for authentication.

- `make_prompt(self, query: str, relevant_passage: str) -> str`:
    Generates a prompt for the Gemini language model.

    Parameters:
        - `query` (str): The user's question.
        - `relevant_passage` (str): The relevant passage for context.

    Returns:
        - `str`: The formatted prompt for the language model.

- `redraft_response(self, query: str, response: str) -> str`:
    Redrafts the response generated by the Gemini language model.

    Parameters:
        - `query` (str): The user's question.
        - `response` (str): The generated response.

    Returns:
        - `str`: The redrafted response.

- `query(self, document: str, question: str) -> str`:
    Queries the Gemini language model for an answer.

    Parameters:
        - `document` (str): The reference document for context.
        - `question` (str): The user's question.

    Returns:
        - `str`: The generated answer from the language model.
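Both model classes share the same `make_prompt` / `query` interface. The prompt template the package actually uses is not documented, so the sketch below is a hypothetical retrieval-style prompt, and `make_prompt_sketch` and `query_sketch` are illustrative names. The `generate` callable stands in for a real Palm 2 or Gemini client call, which keeps the example runnable offline.

```python
from typing import Callable


def make_prompt_sketch(query: str, relevant_passage: str) -> str:
    """Build a hypothetical prompt that grounds the answer in a reference passage."""
    # Flatten the passage onto one line and drop quotes so it cannot break the template.
    passage = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
    return (
        "Answer the question using the reference passage below. "
        "If the passage is irrelevant, say that you do not know.\n"
        f"QUESTION: '{query}'\n"
        f"PASSAGE: '{passage}'\n\n"
        "ANSWER:"
    )


def query_sketch(document: str, question: str, generate: Callable[[str], str]) -> str:
    """Send the prompt to any text-generation callable and tidy the reply."""
    return generate(make_prompt_sketch(question, document)).strip()
```

Injecting `generate` separates prompt construction from the network call, so the same prompt logic could serve either model class.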

`text_processor.py` Documentation

GoogleIt Text Processor Module

This module provides functions for processing text documents, including converting PDF to DOCX, reading paragraphs from DOCX, dividing paragraphs into chunks, and extracting text and paragraphs from PDF.

Usage:

- Import the module: `from GoogleIt import text_processor`
- Use the provided functions for text processing tasks.

Example:

```python
from GoogleIt import text_processor

pdf_path = "path/to/input.pdf"
docx_path = "path/to/output.docx"

# Convert PDF to DOCX
text_processor.pdf_to_docx(pdf_file=pdf_path, docx_file=docx_path)

# Read paragraphs from DOCX
paragraphs = text_processor.read_document_paragraphs(filename=docx_path)

# Divide paragraphs into chunks
chunked_paragraphs = text_processor.get_chunks(paragraphs=paragraphs, chunk_size=10, overlap_size=2)

# Extract text and paragraphs from PDF
pdf_text, pdf_paragraphs = text_processor.extract_text_from_pdf(pdf_path=pdf_path, docx_path=docx_path)
```

Functions:

- `pdf_to_docx(pdf_file: str, docx_file: str) -> None`:
    Converts a PDF file to a DOCX file.

- `read_document_paragraphs(filename: str) -> List[str]`:
    Reads paragraphs from a document (DOCX file).

- `get_chunks(paragraphs: List[str], chunk_size: int = 10, overlap_size: int = 2) -> List[str]`:
    Divides a list of paragraphs into chunks.

- `extract_text_from_pdf(pdf_path: str, docx_path: str = "converted_document.docx") -> Tuple[str, List[str]]`:
    Extracts text and paragraphs from a PDF file.

    Returns a tuple containing the extracted text and a list of paragraphs.

Note:

- `get_chunks` takes the list of paragraphs as an argument, for example the output of `read_document_paragraphs`.
- The module source ends with an example that demonstrates `extract_text_from_pdf`.
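`get_chunks` divides paragraphs into overlapping chunks. The sketch below shows a sliding-window version under stated assumptions: each window holds `chunk_size` paragraphs, consecutive windows share `overlap_size` paragraphs (so the window advances by `chunk_size - overlap_size`), and paragraphs inside a chunk are joined with newlines. `get_chunks_sketch` is a hypothetical name, not the package's implementation.

```python
from typing import List


def get_chunks_sketch(paragraphs: List[str], chunk_size: int = 10,
                      overlap_size: int = 2) -> List[str]:
    """Group paragraphs into windows that overlap by overlap_size paragraphs."""
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap_size
    chunks: List[str] = []
    for start in range(0, len(paragraphs), step):
        # Assumption: paragraphs within one chunk are joined with newlines.
        chunks.append("\n".join(paragraphs[start:start + chunk_size]))
        if start + chunk_size >= len(paragraphs):
            break  # this window already reached the last paragraph
    return chunks
```

With the defaults, 20 paragraphs yield three chunks starting at paragraphs 0, 8, and 16. The overlap keeps context that straddles a chunk boundary visible in both neighbouring chunks.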

`googleit.py` Documentation

GoogleIt Module

This module provides the GoogleIt class, which encapsulates functionality for performing queries, retrieving top URLs from Google search results, downloading content from URLs, preprocessing text, extracting domain names from URLs, combining PDF files, and extracting relevant content based on cosine similarity.
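Two of these helpers, text preprocessing and domain extraction, can be sketched with the standard library. This is an illustration only: the tiny stopword set is hypothetical (a real implementation more likely uses a full list such as NLTK's), whitespace splitting stands in for a proper tokenizer, and dropping a leading `www.` from the domain is an assumption.

```python
import string
from urllib.parse import urlparse

# Hypothetical, deliberately small stopword set for illustration only.
STOPWORDS = {"a", "an", "and", "are", "does", "how", "in", "is", "of", "the", "to"}


def preprocess_text_sketch(text: str) -> str:
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return " ".join(t for t in tokens if t not in STOPWORDS)


def get_domain_name_sketch(url: str) -> str:
    """Return the URL's host (lowercased, port removed) without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

For example, `preprocess_text_sketch("How does photosynthesis work?")` gives `"photosynthesis work"`, and `get_domain_name_sketch("https://www.Example.com:8080/path")` gives `"example.com"`.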

Usage:

- Import the module: `from GoogleIt.googleit import GoogleIt`
- Create an instance of the `GoogleIt` class with a valid API key.
- Use the provided methods for various tasks.

Example:

```python
from GoogleIt.googleit import GoogleIt

google_it = GoogleIt(api_key='your_api_key_here', model="Palm2")
query = "How does photosynthesis work?"
response = google_it.get(query=query, urls_count=5)
print(response)
```

Classes:

- `GoogleIt`:
    - A class that provides functionality for querying, retrieving URLs, downloading content, preprocessing text, and more.
    - Methods:
        - `__init__(self, api_key: str, model: str = "Palm2") -> None`: Initializes the `GoogleIt` instance with the provided API key and a specified language model.
        - `save_url_to_pdf(self, url: str, pdf_path: str) -> None`: Downloads content from a URL and saves it as a PDF file.
        - `preprocess_text(self, text: str) -> str`: Preprocesses text by converting it to lowercase, tokenizing, and removing stopwords and punctuation.
        - `get_domain_name(self, url: str) -> str`: Extracts the domain name from a given URL.
        - `get_top_urls(self, query: str, urls_count: int = 5) -> Tuple[list[str], list[str]]`: Retrieves top URLs from Google search results based on a given query.
        - `combine_pdf(self, folder_path: str) -> str`: Combines multiple PDF files into a single merged PDF.
        - `extract_relevant_content(self, input_text: str, main_document: str, threshold: float = 0.2) -> str`: Extracts relevant content from the input text based on cosine similarity.
        - `with_document(self, query: str, google_doc: str, pdf_path: str) -> str`: Processes a query using a provided PDF document and a Google document.
        - `without_document(self, query: str, paragraphs: list[str]) -> str`: Processes a query without a provided PDF document.
        - `get(self, query: str, pdf_path: str | None = None, urls_count: int = 5) -> str`: Main function to retrieve information based on a query, optionally using a PDF document.
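`extract_relevant_content` filters text by cosine similarity against a reference document, keeping what scores at or above `threshold`. The sketch below illustrates the idea with plain term-frequency vectors from the standard library. The package may instead use TF-IDF weighting (for example via scikit-learn), and splitting the input on blank lines and keeping paragraphs with a score of at least `threshold` are assumptions; all names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of two count vectors divided by the product of their norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_relevant_sketch(input_text: str, main_document: str, threshold: float = 0.2) -> str:
    """Keep only the paragraphs of input_text that resemble main_document."""
    reference = vectorize(main_document)
    paragraphs = [p.strip() for p in input_text.split("\n\n") if p.strip()]
    kept = [p for p in paragraphs if cosine_similarity(vectorize(p), reference) >= threshold]
    return "\n\n".join(kept)
```

A paragraph about photosynthesis scores about 0.82 against a photosynthesis reference and is kept, while an unrelated paragraph with no shared terms scores 0 and is dropped.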

Attributes:

- `model` (GoogleIt attribute): An instance of the model class for natural language processing.

Note:

This module requires the `Palm2Model` and `GeminiModel` classes from the `models` module for natural language processing.

Note:

Replace 'your_api_key_here' with your actual Google API key.

You can get the Google API key from https://makersuite.google.com/app/apikey.


Release history

- 2.0.2 (this version): Dec 23, 2023
- 0.1.1: Dec 23, 2023
- 0.1.0: Dec 23, 2023

Download files


Source Distribution

GoogleIt-2.0.2.tar.gz (17.4 kB)

Uploaded Dec 23, 2023 Source

Built Distribution

GoogleIt-2.0.2-py3-none-any.whl (16.7 kB)

Uploaded Dec 23, 2023 Python 3

File details

Details for the file GoogleIt-2.0.2.tar.gz.

File metadata

Download URL: GoogleIt-2.0.2.tar.gz
Upload date: Dec 23, 2023
Size: 17.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for GoogleIt-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`282487ccd2fd4237c9389aa7f3bdb4d4319b5eb52394435002be87b020d0a3f9`
MD5	`e28d84c190218a9adab30bd5c82ac4dc`
BLAKE2b-256	`dd07de4615ee5130bd0bbc0260716f2c67ab1af0577b724636c8095bc52c047b`

File details

Details for the file GoogleIt-2.0.2-py3-none-any.whl.

File metadata

Download URL: GoogleIt-2.0.2-py3-none-any.whl
Upload date: Dec 23, 2023
Size: 16.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for GoogleIt-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4850748c2a8a8d2598002114ac372a00ddfc18453cc889fdd8d122de5e0d3ef4`
MD5	`92e53edeced8736cef670f9b88b43a6e`
BLAKE2b-256	`5a458c3542ee3f848f65801e84ea44635e37ac4e82e8203b2ad7bb0df780509f`
