Tool that uses Perplexity and OpenAI to search with SERP and filter for relevant URLs.

These details have not been verified by PyPI

Project links

repository

Project description

ai_url_aggregator

Note: This is a small experimental library, provided as-is.

ai_url_aggregator is a Python tool that leverages Perplexity and OpenAI to search the internet for relevant URLs, filter and deduplicate them, check their availability, and then select the most important ones based on GPT analysis.

Features

Search Across Models
- Uses Perplexity’s sonar-reasoning model to query the internet for URLs related to your prompt.
Clean & Filter
- Prefers https:// links when both http:// and https:// are found for the same domain.
- Removes duplicates by collecting results into a set.
Online Check
- Verifies each URL’s availability (status codes 200 or 403) using requests.
Relevance Ranking
- Uses an OpenAI model to select the most important websites from the deduplicated list of online URLs.
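
The Clean & Filter step above can be illustrated with a small sketch. This is not the library's own code: `prefer_https` is a hypothetical helper, and it assumes that "same domain" means the same host part of the URL.

```python
from urllib.parse import urlsplit


def prefer_https(urls: list[str]) -> list[str]:
    """Hypothetical helper: deduplicate URLs and drop http:// entries
    whose host also appears with an https:// URL."""
    unique = set(urls)  # collecting into a set removes exact duplicates
    # Hosts for which at least one https:// URL was found.
    https_hosts = {
        urlsplit(url).netloc
        for url in unique
        if urlsplit(url).scheme == "https"
    }
    kept = [
        url
        for url in unique
        if not (
            urlsplit(url).scheme == "http"
            and urlsplit(url).netloc in https_hosts
        )
    ]
    return sorted(kept)  # sorted only to make the output deterministic
```

Sorting is a choice made for this sketch; a set has no guaranteed order, so the real tool may return URLs in a different order.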

DeepWiki Docs: https://deepwiki.com/carlosplanchon/ai_url_aggregator

Installation

1. Install via PyPI

uv add ai_url_aggregator

2. Set Environment Variables

You must provide your Perplexity and OpenAI API keys:

export PERPLEXITY_API_KEY="PERPLEXITY_API_KEY"
export OPENAI_API_KEY="OPENAI_API_KEY"

Replace "PERPLEXITY_API_KEY" and "OPENAI_API_KEY" with your actual API keys.
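
To confirm that both variables are visible to Python before running a query, a check along these lines may help. `missing_api_keys` is a hypothetical helper written for this example and is not part of the package.

```python
import os

# Names taken from the export commands above.
REQUIRED_KEYS = ("PERPLEXITY_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(environ=None) -> list[str]:
    """Hypothetical helper: return the required variable names that are
    unset or empty in the given mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_api_keys()` with no arguments inspects the current process environment; an empty list means both keys are set.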

3. (Optional) Install from Source

Clone or download this repository, then install the dependencies:
```
uv sync
```
This installs all required libraries, such as openai and requests.

How It Works

query_models(query: str) -> list[str]
- Sends a query to Perplexity’s sonar-reasoning model.
- Parses the Perplexity output with an OpenAI model into a structured list of URLs.
keep_https(urls: list[str]) -> list[str]
- Keeps the https:// version when both http:// and https:// URLs exist for the same domain; otherwise keeps the http:// URL.
execute_query_multiple_times(query: str, num_runs: int) -> list[str]
- Runs the query multiple times to gather more URLs.
- Deduplicates results using a set.
check_urls_online(urls: list[str]) -> list[str]
- Sends an HTTP request to each URL and keeps those that respond with status 200 or 403.
search_for_web_urls(query: str, num_runs: int) -> list[str]
- Brings all the above together:
  1. Executes a query multiple times.
  2. Prefers HTTPS versions of each domain.
  3. Verifies URL reachability.
  4. Returns a final list of online, deduplicated URLs.
get_top_relevant_websites(website_urls: list[str]) -> list[Website]
- Uses an OpenAI model to select the most relevant (important) websites from the final list of URLs.
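
As a rough illustration of the reachability step, the sketch below keeps only URLs whose status code is 200 or 403. It is an assumption about how such a check could look, not the library's implementation: `fetch_status`, `filter_online`, and the 10-second timeout are made up for this example.

```python
from typing import Callable, Optional

# Status codes the description above treats as "online".
ONLINE_STATUS_CODES = {200, 403}


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Hypothetical helper: return the HTTP status code for url,
    or None if the request fails."""
    # Imported here so filter_online can be exercised without network access.
    import requests

    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.RequestException:
        return None


def filter_online(
    urls: list[str],
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> list[str]:
    """Hypothetical helper: keep the URLs whose status is 200 or 403."""
    return [url for url in urls if get_status(url) in ONLINE_STATUS_CODES]
```

Passing the status lookup as a parameter makes the filter easy to test with a stub instead of real requests.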

Usage Example

Once the package is installed and your environment variables are set, you can run:

import prettyprinter
from ai_url_aggregator import (
    search_for_web_urls,
    get_top_relevant_websites
)

# Optional: install prettyprinter extras for nicer output
prettyprinter.install_extras()

# Example query:
query = "Give me a list of all the real estate agencies in Uruguay."

# Step 1: Get a cleaned, deduplicated, and verified list of URLs
online_urls = search_for_web_urls(query=query)

print("--- Online URLs ---")
prettyprinter.cpprint(online_urls)

# Step 2: Get the most important websites from the final list
most_important_websites = get_top_relevant_websites(website_urls=online_urls)

print("--- Most Important Websites ---")
prettyprinter.cpprint(most_important_websites)

Result (main real estate agencies in Uruguay):

[
    'https://www.infocasas.com.uy',
    'https://www.casasweb.com.uy',
    'https://www.mercadolibre.com.uy/inmuebles',
    'https://www.uruguayinmobiliarias.com'
]

License

This project is distributed under the MIT License. See LICENSE for more information.

All suggestions and improvements are welcome!

Project details

Release history

0.2 (this version): Feb 5, 2026
0.1: Feb 11, 2025

Download files

Download the file for your platform.

Source Distribution

ai_url_aggregator-0.2.tar.gz (5.3 kB)

Uploaded Feb 5, 2026 Source

Built Distribution

ai_url_aggregator-0.2-py3-none-any.whl (5.7 kB)

Uploaded Feb 5, 2026 Python 3

File details

Details for the file ai_url_aggregator-0.2.tar.gz.

File metadata

Download URL: ai_url_aggregator-0.2.tar.gz
Upload date: Feb 5, 2026
Size: 5.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ai_url_aggregator-0.2.tar.gz
Algorithm	Hash digest
SHA256	`02364f49165a08cd26755724d7e60bf3f55a85b573aa846b115d9a7f324b371c`
MD5	`e8bcd2a1b4e62fd76bcf6ef1fa9a513e`
BLAKE2b-256	`114c58cd4678197164d4f4b0fdff32940008e502eff074a3bac37210de6b54e8`

File details

Details for the file ai_url_aggregator-0.2-py3-none-any.whl.

File metadata

Download URL: ai_url_aggregator-0.2-py3-none-any.whl
Upload date: Feb 5, 2026
Size: 5.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ai_url_aggregator-0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dbbdd2b57276ca44f531f98e6b89232fda508969d5bc3c43db0e12fd83fd0d61`
MD5	`d80fe26278b58c3f3364ef8efb49f992`
BLAKE2b-256	`4af1724b01fb2a0e86639a3f62ec0f445b1bccd5c4e537c6a41cf3666406382c`
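
A downloaded file can be checked against the SHA256 digests listed above with a short script. This is a minimal sketch using only the standard library; `sha256_matches` is a hypothetical helper name.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Hypothetical helper: return True if the file's SHA256 digest
    equals the expected hexadecimal digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, pass the path of the downloaded wheel and the SHA256 value from the table; a result of `False` suggests the file is corrupted or not the published one.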
