Web Search

Async web search library supporting Google Custom Search, Wikipedia, and arXiv APIs.

Search several sources concurrently and get back relevant results that are cleaned and formatted.

🌟 Features

  • ⚡ Asynchronous Searching: Perform searches concurrently across multiple sources
  • 🔗 Multi-Source Support: Query Google Custom Search, Wikipedia, and arXiv
  • 🧹 Content Extraction and Cleaning: Return result text that is cleaned and formatted
  • 🔧 Configurable Search Parameters: Adjust maximum results, preview length, and sources
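
To make the concurrency feature concrete, here is a minimal sketch of the general fan-out pattern with asyncio.gather. The fake_source coroutine and its delays are stand-ins invented for illustration; this is not the library's internal implementation.

```python
import asyncio
from typing import List


async def fake_source(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network-bound search call.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def search_all() -> List[str]:
    # gather() runs the coroutines concurrently and returns their
    # results in the same order as the arguments.
    return await asyncio.gather(
        fake_source("google", 0.1),
        fake_source("wikipedia", 0.1),
        fake_source("arxiv", 0.1),
    )


results = asyncio.run(search_all())
print(results)
```

Because the three calls overlap, the total wait is close to the slowest single call rather than the sum of all three.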

📋 Prerequisites

  • 🐍 Python 3.8 or newer
  • 🔑 API keys and configuration:
    • Google Search: Requires a Google API key and a Custom Search Engine (CSE) ID.
    • arXiv: No API key required.
    • Wikipedia: No API key required.

Set the environment variables for the Google API:

export GOOGLE_API_KEY="your_google_api_key"
export CSE_ID="your_cse_id"
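
If you prefer to read these variables yourself in Python, one possible approach is sketched below. The load_google_credentials helper is hypothetical and not part of the package; it only collects the two variables named above and fails early when one is missing.

```python
import os
from typing import Dict, Mapping


def load_google_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # Hypothetical helper: read the two variables and raise a clear
    # error if either one is missing or empty.
    required = ("GOOGLE_API_KEY", "CSE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_key": env["GOOGLE_API_KEY"], "cse_id": env["CSE_ID"]}


# Demonstration with an explicit mapping instead of the real environment.
creds = load_google_credentials({"GOOGLE_API_KEY": "key", "CSE_ID": "cse"})
print(sorted(creds))
```

The dictionary keys mirror the api_key and cse_id keyword arguments that GoogleSearchConfig takes in Example 2, so the result could be unpacked into that constructor.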

📦 Installation

pip install async-web-search

🛠️ Usage

Example 1: Search across multiple sources

import asyncio
from web_search import WebSearch, WebSearchConfig

async def main():
    # Query Google and arXiv in one call
    config = WebSearchConfig(sources=["google", "arxiv"])
    results = await WebSearch(config).search("quantum computing")
    print(results)

asyncio.run(main())

Example 2: Google Search

import asyncio
from web_search import GoogleSearch, GoogleSearchConfig

async def main():
    # Pass the Google credentials explicitly
    config = GoogleSearchConfig(
        api_key="your_google_api_key",
        cse_id="your_cse_id",
        max_results=5,
    )
    results = await GoogleSearch(config)._search("quantum computing")
    for result in results:
        print(result)

asyncio.run(main())

Example 3: Wikipedia Search

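import asyncio
from web_search import WikipediaSearch, BaseConfig

async def main():
    # Limit the search to 5 results and 500 preview characters
    wiki_config = BaseConfig(max_results=5, max_preview_chars=500)
    results = await WikipediaSearch(wiki_config)._search("deep learning")
    for result in results:
        print(result)

asyncio.run(main())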

Example 4: ArXiv Search

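import asyncio
from web_search import ArxivSearch, BaseConfig

async def main():
    # Limit the search to 3 results and 800 preview characters
    arxiv_config = BaseConfig(max_results=3, max_preview_chars=800)
    results = await ArxivSearch(arxiv_config)._search("neural networks")
    for result in results:
        print(result)

asyncio.run(main())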

📘 API Overview

🔧 Configuration

  • BaseConfig: Shared configuration for all sources (e.g., max_results, max_preview_chars).
  • GoogleSearchConfig: Google-specific settings (e.g., api_key, cse_id).
  • WebSearchConfig: Configuration for the overall search process (e.g., sources to query).
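
As a rough illustration of what max_results and max_preview_chars suggest, the sketch below applies both limits to a list of preview strings. The apply_limits function is hypothetical, and the library may truncate differently (for example, at word boundaries).

```python
from typing import List


def apply_limits(previews: List[str], max_results: int, max_preview_chars: int) -> List[str]:
    # Assumed interpretation: keep the first max_results items and cut
    # each preview to at most max_preview_chars characters.
    return [text[:max_preview_chars] for text in previews[:max_results]]


sample = ["alpha " * 10, "beta " * 10, "gamma " * 10]
limited = apply_limits(sample, max_results=2, max_preview_chars=12)
print(limited)
```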

📚 Classes

  • WebSearch: Entry point for performing searches across multiple sources.
  • GoogleSearch: Handles searches via Google Custom Search Engine API.
  • WikipediaSearch: Searches Wikipedia and retrieves article previews.
  • ArxivSearch: Queries arXiv for academic papers.

⚙️ Methods

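  • search(query: str): Main search method for WebSearch. It is a coroutine, so call it with await.
  • _search(query: str): Source-specific search coroutine for GoogleSearch, WikipediaSearch, and ArxivSearch. The leading underscore conventionally marks a method as internal, although the examples above call it directly.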
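
Since these methods are awaited, standard asyncio tools can wrap them. The sketch below shows one way to add a per-call timeout and keep a single failure from discarding the other results. The flaky coroutine is a hypothetical stand-in for a source-specific search; nothing here is taken from the library's own code.

```python
import asyncio


async def flaky(name: str, delay: float, fail: bool = False) -> str:
    # Hypothetical stand-in for a source-specific search coroutine.
    await asyncio.sleep(delay)
    if fail:
        raise ValueError(f"{name} failed")
    return f"{name} ok"


async def gather_with_timeout(calls, timeout: float):
    # wait_for() cancels a call that exceeds the timeout, and
    # return_exceptions=True returns errors in place of results
    # instead of aborting the whole batch.
    guarded = [asyncio.wait_for(call, timeout) for call in calls]
    return await asyncio.gather(*guarded, return_exceptions=True)


async def main():
    calls = [
        flaky("fast", 0.01),
        flaky("broken", 0.01, fail=True),
        flaky("slow", 1.0),
    ]
    return await gather_with_timeout(calls, timeout=0.2)


outcomes = asyncio.run(main())
for outcome in outcomes:
    print(repr(outcome))
```

Each entry in the returned list is either a result or an exception object, so the caller can decide per source whether to retry, log, or skip.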

🤝 Contributing

We welcome contributions! To contribute:

  • Fork the repository.
  • Create a new branch (git checkout -b feature-name).
  • Commit your changes (git commit -am "Add new feature").
  • Push to the branch (git push origin feature-name).
  • Open a pull request.

🧪 Running Tests

pytest -v

License

MIT
