WikiRaces 🏁

AI-powered Wikipedia navigation using semantic similarity. WikiRaces finds paths between Wikipedia articles by ranking each page's outgoing links by semantic similarity to the target article, rather than following links at random.

Features ✨

  • Semantic Navigation: Uses sentence transformers to understand article content and find meaningful connections
  • Smart Path Finding: Avoids dead ends and cycles while navigating toward the target
  • Real-time Progress: Progress bars show the confidence score and the current article at each step
  • Robust Error Handling: Gracefully handles missing pages, disambiguation pages, and network issues
  • Local AI Models: The embedding model runs locally, so no external AI API is required

Installation 📦

pip install wikiraces

Quick Start 🚀

from wikiraces import WikiBot

# Create a bot to navigate from Python to Artificial Intelligence
bot = WikiBot("Python (programming language)", "Artificial intelligence")

# Run the navigation
success = bot.run()

if success:
    print(f"Found path in {len(bot.path) - 1} steps!")
    print(" -> ".join(bot.path))
else:
    print("Could not find a path")

Advanced Usage 🔧

Customize Search Parameters

# Limit the number of candidate links to consider at each step
bot = WikiBot("Source Article", "Target Article", limit=20)

# Check if articles exist before starting
if bot.exists("Some Article"):
    print("Article exists!")

# Get links from any Wikipedia page
links = bot.links("Python (programming language)")
print(f"Found {len(links)} outgoing links")
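The existence check and the run can be combined so that a missing page is reported before any navigation starts. The following is a minimal sketch, assuming `exists()`, `run()`, and `path` behave as in the snippets above; `run_if_valid` is a hypothetical helper and is not part of wikiraces.

```python
from typing import List, Optional


def run_if_valid(bot, source: str, destination: str) -> Optional[List[str]]:
    """Return the path found by `bot`, or None if a page is missing or no path is found.

    `bot` is any object exposing exists(page), run(), and a `path` attribute,
    as WikiBot does in the examples above (hypothetical helper, not library API).
    """
    # Check both endpoints first so a missing title fails fast.
    for title in (source, destination):
        if not bot.exists(title):
            return None
    # run() returns True on success; copy the path so callers can modify it safely.
    return list(bot.path) if bot.run() else None
```

With the real class this would be called as `run_if_valid(WikiBot(src, dst), src, dst)`, where `src` and `dst` are article titles.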

Semantic Similarity

from wikiraces.embed import most_similar_with_scores

# Find most semantically similar articles
candidates = ["Machine Learning", "Data Science", "Web Development"]
similar = most_similar_with_scores("Artificial Intelligence", candidates)

for article, score in similar:
    print(f"{article}: {score:.3f}")
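The README does not show how the scores are computed. A common approach for sentence embeddings is cosine similarity, and the sketch below illustrates that idea only. The function names `cosine` and `rank_by_similarity` are hypothetical, and the hand-written two-dimensional vectors stand in for real sentence-transformer embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float],
    candidate_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidate_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed for illustration; real embeddings have hundreds of dimensions).
TOY_QUERY = [1.0, 0.0]
TOY_CANDIDATES = {
    "Web Development": [0.0, 1.0],
    "Data Science": [1.0, 1.0],
    "Machine Learning": [1.0, 0.0],
}

for name, score in rank_by_similarity(TOY_QUERY, TOY_CANDIDATES):
    print(f"{name}: {score:.3f}")
```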

How It Works 🧠

  1. Start at the source Wikipedia article
  2. Extract all outgoing links from the current article
  3. Filter out dead ends and previously visited pages
  4. Rank candidate links by semantic similarity to the target
  5. Rerank using article summaries for better context understanding
  6. Move to the most promising next article
  7. Repeat until reaching the target or getting stuck
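The steps above can be sketched as a greedy loop over a small in-memory link graph. This is an illustration under stated assumptions, not the package's implementation: `greedy_path`, `word_overlap`, and `TOY_GRAPH` are hypothetical names, word overlap between titles stands in for embedding similarity, and the summary-based reranking of step 5 is omitted.

```python
from typing import Callable, Dict, List, Optional

Graph = Dict[str, List[str]]
Scorer = Callable[[str, str], float]


def word_overlap(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def greedy_path(graph: Graph, source: str, target: str,
                score: Scorer = word_overlap, limit: int = 15,
                max_steps: int = 50) -> Optional[List[str]]:
    """Follow the best-scoring unvisited link until the target is reached."""
    path, visited, current = [source], {source}, source
    for _ in range(max_steps):
        if current == target:
            return path
        links = graph.get(current, [])
        if target in links:
            return path + [target]
        # Drop visited pages (cycles) and pages with no outgoing links (dead ends).
        candidates = [p for p in links if p not in visited and graph.get(p)]
        if not candidates:
            return None  # stuck: nowhere left to go
        # Rank by similarity to the target and keep at most `limit` candidates.
        ranked = sorted(candidates, key=lambda p: score(p, target), reverse=True)[:limit]
        current = ranked[0]
        visited.add(current)
        path.append(current)
    return None


# Hand-made link graph (assumed data, not real Wikipedia links).
TOY_GRAPH: Graph = {
    "Python (programming language)": ["Guido van Rossum", "Artificial neural network", "Monty Python"],
    "Guido van Rossum": ["Python (programming language)"],
    "Monty Python": [],
    "Artificial neural network": ["Artificial intelligence", "Python (programming language)"],
    "Artificial intelligence": ["Artificial neural network"],
}

print(greedy_path(TOY_GRAPH, "Python (programming language)", "Artificial intelligence"))
```

Because the loop never backtracks, a single poor choice can leave it stuck, which corresponds to the "getting stuck" outcome in step 7.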

API Reference 📚

WikiBot Class

class WikiBot:
    def __init__(self, source: str, destination: str, limit: int = 15) -> None: ...
    def run(self) -> bool: ...
    def exists(self, page: str) -> bool: ...
    def links(self, page: str) -> list[str]: ...

Parameters:

  • source: Starting Wikipedia article title
  • destination: Target Wikipedia article title
  • limit: Maximum number of candidate links to consider (default: 15)

Returns:

  • run(): True if path found, False otherwise
  • exists(): True if Wikipedia page exists
  • links(): List of outgoing links from the page

Development 🛠️

# Clone the repository
git clone https://github.com/markshteyn/wikiraces.git
cd wikiraces

# Install with Poetry
poetry install

# Run tests
poetry run pytest

# Run tests with verbose output and without capturing stdout
poetry run pytest -v -s

Requirements 📋

  • Python 3.9+
  • sentence-transformers
  • wikipedia
  • numpy
  • tqdm

License 📄

MIT License - see LICENSE file for details.

Contributing 🤝

Contributions welcome! Please feel free to submit a Pull Request.
