AI-powered Wikipedia navigation using semantic similarity

Project description

WikiRaces

WikiRaces is an AI-powered tool for navigating Wikipedia using semantic similarity. Rather than following links at random, it builds a path between two Wikipedia articles by choosing, at each step, the link whose content is semantically closest to the target.

Features

  • Semantic Navigation: Uses sentence transformers to understand article content and find meaningful connections
  • Smart Path Finding: Avoids dead ends and cycles while navigating toward the target
  • Real-time Progress: Shows progress with confidence metrics and current article information
  • Robust Error Handling: Gracefully handles missing pages, disambiguation pages, and network issues
  • Local AI Models: Embedding models run locally, so no external AI API is required (Wikipedia itself is still accessed over the network)

Installation

pip install wikiraces

Quick Start

from wikiraces import WikiBot

# Create a bot to navigate from Python to Artificial Intelligence
bot = WikiBot("Python (programming language)", "Artificial intelligence")

# Run the navigation
success = bot.run()

if success:
    print(f"Found path in {len(bot.path) - 1} steps!")
    print(" -> ".join(bot.path))
else:
    print("Could not find a path")

Advanced Usage

Customize Search Parameters

# Limit the number of candidate links to consider at each step
bot = WikiBot("Source Article", "Target Article", limit=20)

# Check if articles exist before starting
if bot.exists("Some Article"):
    print("Article exists!")

# Get links from any Wikipedia page
links = bot.links("Python (programming language)")
print(f"Found {len(links)} outgoing links")

Semantic Similarity

from wikiraces.embed import most_similar_with_scores

# Find most semantically similar articles
candidates = ["Machine Learning", "Data Science", "Web Development"]
similar = most_similar_with_scores("Artificial Intelligence", candidates)

for article, score in similar:
    print(f"{article}: {score:.3f}")
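
For intuition, ranking by semantic similarity usually comes down to comparing embedding vectors with cosine similarity. The sketch below is not the library's implementation: it substitutes hand-made three-dimensional vectors for sentence-transformer embeddings, and `cosine` and `rank_by_similarity` are hypothetical helper names chosen for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(target_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(target_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings: assumed values for illustration, not real model output.
TOY_VECTORS = {
    "Machine Learning": [0.9, 0.1, 0.0],
    "Data Science": [0.6, 0.4, 0.1],
    "Web Development": [0.1, 0.2, 0.9],
}
TARGET = [1.0, 0.0, 0.0]  # stand-in for the target article's embedding

ranked = rank_by_similarity(TARGET, TOY_VECTORS)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With real embeddings the vectors would have hundreds of dimensions, but the ranking step would look much the same.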

How It Works

  1. Start at the source Wikipedia article
  2. Extract all outgoing links from the current article
  3. Filter out dead ends and previously visited pages
  4. Rank candidate links by semantic similarity to the target
  5. Rerank using article summaries for better context understanding
  6. Move to the most promising next article
  7. Repeat until reaching the target or getting stuck
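
The steps above can be sketched as a greedy loop over a link graph. This is a minimal illustration under stated assumptions, not the package's actual code: the graph is a small in-memory dictionary instead of live Wikipedia, a "dead end" is assumed to mean a page with no outgoing links, and `navigate`, `coarse`, and `fine` are hypothetical names. The `coarse` scorer stands in for title similarity (step 4) and `fine` for summary-based reranking (step 5).

```python
def navigate(graph, source, target, coarse, fine, limit=15, max_steps=50):
    """Greedy walk over `graph` (dict: title -> list of linked titles).

    Returns the list of visited titles ending at `target`, or None if stuck.
    """
    path = [source]
    visited = {source}
    current = source
    for _ in range(max_steps):
        if current == target:
            return path
        # Steps 2-3: take outgoing links, drop visited pages and dead ends.
        candidates = [
            link for link in graph.get(current, [])
            if link not in visited and (link == target or graph.get(link))
        ]
        if not candidates:
            return None  # stuck: nowhere new to go
        # Step 4: rank by coarse similarity and keep the top `limit`.
        candidates.sort(key=lambda link: coarse(link, target), reverse=True)
        shortlist = candidates[:limit]
        # Steps 5-6: rerank the shortlist with the finer score and move.
        current = max(shortlist, key=lambda link: fine(link, target))
        visited.add(current)
        path.append(current)
    return path if current == target else None

# Toy link graph and toy scores (assumed values, for illustration only).
GRAPH = {
    "Python (programming language)": ["Guido van Rossum", "Machine learning", "Monty Python"],
    "Guido van Rossum": ["Python (programming language)"],
    "Monty Python": [],
    "Machine learning": ["Artificial intelligence", "Statistics"],
    "Statistics": ["Machine learning"],
    "Artificial intelligence": ["Machine learning"],
}
SCORES = {
    "Artificial intelligence": 1.0,
    "Machine learning": 0.8,
    "Statistics": 0.5,
    "Guido van Rossum": 0.3,
}

def toy_score(title, target):
    # Ignores `target`; a real scorer would compare embeddings.
    return SCORES.get(title, 0.0)

route = navigate(GRAPH, "Python (programming language)", "Artificial intelligence",
                 coarse=toy_score, fine=toy_score)
print(" -> ".join(route))
```

Because the walk is greedy and never revisits a page, it can get stuck even when a path exists; the `max_steps` cap is an extra safeguard added for this sketch.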

API Reference

WikiBot Class

class WikiBot:
    def __init__(self, source: str, destination: str, limit: int = 15) -> None: ...
    def run(self) -> bool: ...
    def exists(self, page: str) -> bool: ...
    def links(self, page: str) -> list[str]: ...

Parameters:

  • source: Starting Wikipedia article title
  • destination: Target Wikipedia article title
  • limit: Maximum number of candidate links to consider (default: 15)

Returns:

  • run(): True if path found, False otherwise
  • exists(): True if Wikipedia page exists
  • links(): List of outgoing links from the page
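
One way to combine these methods is to check both titles before starting a run. The helper below is a sketch, not part of the package: `find_path` and `BotLike` are hypothetical names, and `FakeBot` is an offline stand-in so the example runs without network access. It relies only on `exists()`, `run()`, and the `path` attribute shown in Quick Start.

```python
from typing import Optional, Protocol

class BotLike(Protocol):
    """The subset of the WikiBot interface this helper relies on."""
    path: list

    def run(self) -> bool: ...
    def exists(self, page: str) -> bool: ...

def find_path(bot: BotLike, source: str, destination: str) -> Optional[list]:
    """Validate both titles, then run the bot and return its path (or None)."""
    for title in (source, destination):
        if not bot.exists(title):
            raise ValueError(f"No Wikipedia page titled {title!r}")
    return list(bot.path) if bot.run() else None

class FakeBot:
    """Offline stand-in for WikiBot, used only to exercise find_path."""

    def __init__(self, known_pages, path):
        self.known_pages = set(known_pages)
        self.path = list(path)

    def exists(self, page):
        return page in self.known_pages

    def run(self):
        # Pretend the run succeeded whenever a path was supplied.
        return bool(self.path)

fake = FakeBot({"A", "B"}, ["A", "B"])
print(find_path(fake, "A", "B"))
```

With the real package, a `WikiBot` instance would presumably be passed in place of `FakeBot`, assuming it exposes the same three members.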

Development

# Clone the repository
git clone https://github.com/markshteyn/wikiraces.git
cd wikiraces

# Install with Poetry
poetry install

# Run tests
poetry run pytest

# Run tests with verbose output and without capturing stdout
poetry run pytest -v -s

Requirements

  • Python 3.9+
  • sentence-transformers
  • wikipedia
  • numpy
  • tqdm

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please feel free to submit a Pull Request.
