⚡ Atlas GMP Engine

Bayesian inference engine for geographic place guessing

The AI brain powering GuessMyPlace: it identifies any place on Earth through intelligent yes/no questions.


Python 3.11+ · License: MIT



What is Atlas GMP Engine?

Atlas GMP Engine is a standalone Python package implementing a Bayesian inference system for geographic place identification. Given a dataset of places (countries, cities, landmarks, etc.) and a bank of yes/no questions, the engine:

  1. Maintains a probability distribution across all places
  2. Selects the most informative next question using information gain + Bayesian scoring
  3. Updates probabilities after each answer using likelihood multipliers
  4. Eliminates low-probability candidates through soft filtering
  5. Returns a confident prediction with accuracy metrics

Live performance: ~94% accuracy on 115 world countries, averaging 10 questions per game.


How It Works

                    ┌──────────────────────────────────────────┐
                    │            Atlas GMP Engine               │
                    │                                          │
  User Answer ──→  │  ProbabilityManager                      │
                    │    ↓  Bayesian likelihood updates         │
                    │  BayesianNetwork                         │
                    │    ↓  Belief propagation across attrs     │
                    │  InformationGain  ←── FeatureImportance  │
                    │    ↓  Shannon entropy (NumPy + C++)       │
                    │  QuestionSelector                        │
                    │    ↓  5-factor weighted scoring           │
                    │  ConfidenceCalculator                    │
                    │    ↓  4-signal composite score (0–100%)   │
                    │  Prediction / Next Question              │
                    └──────────────────────────────────────────┘

Core Components

| Component | File | Purpose |
|---|---|---|
| InferenceEngine | inference_engine.py | Main coordinator: manages game sessions |
| ProbabilityManager | probability_manager.py | Bayesian likelihood updates + soft filtering |
| BayesianNetwork | bayesian_network.py | Belief propagation across related attributes |
| InformationGain | information_gain.py | Shannon entropy calculation (NumPy + C++) |
| QuestionSelector | question_selector.py | 5-factor question scoring + disambiguation |
| ConfidenceCalculator | confidence_calculator.py | 4-signal composite confidence score |
| FeatureImportance | feature_importance.py | ML-learned attribute weights |
| Embeddings | embeddings.py | MiniLM-L6-v2 semantic similarity |
| FAISSIndex | faiss_index.py | Fast last-mile disambiguation |

Question Selection Algorithm

Every candidate question is scored with a weighted formula:

score = (information_gain  × 0.40)   # How much entropy does this reduce?
      + (stage_bonus        × 0.35)   # continent→region→culture→specific
      + (answer_balance     × 0.10)   # prefer questions that split ~50/50
      + (bayesian_belief    × 0.10)   # prior probability of this attribute value
      + (feature_importance × 0.05)   # weight learned from real game data (ML)
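
The weighted sum above can be sketched in a few lines of Python. Only the five weights come from the formula. The function names, the assumption that every factor is already scaled to [0, 1], and the particular `answer_balance` definition are illustrative guesses, not the package's implementation.

```python
# Weights copied from the formula above; everything else is an illustrative assumption.
WEIGHTS = {
    "information_gain": 0.40,
    "stage_bonus": 0.35,
    "answer_balance": 0.10,
    "bayesian_belief": 0.10,
    "feature_importance": 0.05,
}


def answer_balance(yes_mass: float) -> float:
    """Hypothetical balance term: 1.0 for a 50/50 split, 0.0 for a one-sided split."""
    return 1.0 - abs(2.0 * yes_mass - 1.0)


def question_score(factors: dict[str, float]) -> float:
    """Weighted sum of the five factors, each assumed to lie in [0, 1]."""
    return sum(weight * factors[name] for name, weight in WEIGHTS.items())


# Made-up factor values for a broad early question and a narrow late one.
broad_question = {
    "information_gain": 0.9,
    "stage_bonus": 1.0,
    "answer_balance": answer_balance(0.5),
    "bayesian_belief": 0.5,
    "feature_importance": 0.95,
}
narrow_question = {
    "information_gain": 0.2,
    "stage_bonus": 0.1,
    "answer_balance": answer_balance(0.05),
    "bayesian_belief": 0.5,
    "feature_importance": 0.95,
}

best = max([broad_question, narrow_question], key=question_score)
```

Because `stage_bonus` carries 35% of the weight, a question from an early stage can outrank a later one even when their information gain is similar.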

Stage ordering ensures the engine always asks broad questions first:

Stage 0 → continent, type
Stage 1 → region, sub-region
Stage 2 → coast, landlocked, island, climate, mountains
Stage 3 → population, size, GDP level
Stage 4 → government, religion, drive side
Stage 5 → language, flag, colonial history, UNESCO
Stage 6 → exports, famous for, neighbors
Stage 7 → capital, currency (very specific — asked last)

Probability Updates

Each answer multiplies all place probabilities using likelihood ratios:

| Answer | Match multiplier | Mismatch multiplier |
|---|---|---|
| Yes | ×10.0 | ×0.001 |
| Probably | ×3.5 | ×0.15 |
| Don't Know | ×1.0 | ×1.0 |
| Probably Not | ×0.15 | ×3.5 |
| No | ×0.001 | ×10.0 |

After each update, probabilities are normalized and a soft filter eliminates candidates below 0.5% of the top probability (keeping at least 5).
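
A minimal NumPy sketch of this update step follows. The multipliers, the 0.5% ratio, and the minimum of 5 candidates come from the text above. The function names `update` and `soft_filter_mask` are hypothetical, and the sketch assumes the filter only returns a keep-mask; how the engine handles the removed candidates is not stated here.

```python
import numpy as np

# (match, mismatch) multipliers from the table above, keyed by the answer strings
# used in the Quick Start prompt.
MULTIPLIERS = {
    "yes": (10.0, 0.001),
    "probably": (3.5, 0.15),
    "dontknow": (1.0, 1.0),
    "probablynot": (0.15, 3.5),
    "no": (0.001, 10.0),
}


def update(probs: np.ndarray, matches: np.ndarray, answer: str) -> np.ndarray:
    """Multiply each place by its likelihood multiplier, then renormalize."""
    match_mult, mismatch_mult = MULTIPLIERS[answer]
    updated = probs * np.where(matches, match_mult, mismatch_mult)
    return updated / updated.sum()


def soft_filter_mask(probs: np.ndarray, ratio: float = 0.005, min_keep: int = 5) -> np.ndarray:
    """Boolean mask of candidates at or above ratio * top probability (at least min_keep)."""
    keep = probs >= ratio * probs.max()
    if keep.sum() < min_keep:
        keep = np.zeros(probs.shape, dtype=bool)
        keep[np.argsort(probs)[::-1][:min_keep]] = True
    return keep


# Eight equally likely places; the first two match the question's attribute value.
probs = np.full(8, 1 / 8)
matches = np.array([True, True, False, False, False, False, False, False])

posterior = update(probs, matches, "yes")
mask = soft_filter_mask(posterior)
```

In this example a single "yes" pushes the six mismatching places below the 0.5% cutoff, so the minimum-of-5 rule is what keeps three of them in play.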


Confidence Score

The confidence signal is a weighted combination of 4 measurements:

confidence = (probability_gap   × 0.40)   # gap between top-1 and top-2 probability
           + (normalized_prob   × 0.30)   # top probability / total
           + (item_count_score  × 0.20)   # fewer remaining = more confident
           + (entropy_score     × 0.10)   # inverse of distribution entropy
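
As a rough illustration, the composite could be computed as below. The four weights are taken from the formula. How each signal is normalized is an assumption made for this sketch (for example, `item_count_score` as a linear function of remaining candidates and `entropy_score` as one minus normalized entropy), and `composite_confidence` is a hypothetical name.

```python
import numpy as np


def composite_confidence(active_probs, total_places: int) -> float:
    """Return a 0-100 score from the probabilities of the still-active places."""
    p = np.asarray(active_probs, dtype=float)
    p = p / p.sum()
    ranked = np.sort(p)[::-1]

    top1 = ranked[0]
    top2 = ranked[1] if len(ranked) > 1 else 0.0
    probability_gap = top1 - top2
    normalized_prob = top1  # top probability / total, after normalization

    # Assumed scaling: 1.0 when one candidate remains, 0.0 when none were eliminated.
    item_count_score = 1.0 - (len(p) - 1) / max(total_places - 1, 1)

    # Assumed scaling: 1 - H / H_max, with entropy in bits.
    nonzero = p[p > 0]
    entropy = -(nonzero * np.log2(nonzero)).sum()
    max_entropy = np.log2(len(p)) if len(p) > 1 else 1.0
    entropy_score = 1.0 - entropy / max_entropy

    score = (
        0.40 * probability_gap
        + 0.30 * normalized_prob
        + 0.20 * item_count_score
        + 0.10 * entropy_score
    )
    return float(100.0 * score)


uniform = composite_confidence(np.ones(100), total_places=100)
peaked = composite_confidence([0.97, 0.01, 0.01, 0.01], total_places=100)
```

A flat distribution over every place scores close to zero under these assumptions, while a distribution dominated by one candidate scores high.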

The engine triggers a guess when confidence crosses a threshold that depends on how many questions have been asked:

  • Questions 1–10: requires 99%
  • Questions 11–25: requires 95%
  • Questions 26–50: requires 88%
  • Questions 51+: requires 78%
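
The schedule above amounts to a small lookup. In this sketch the function names are hypothetical, question 50 is treated as part of the 26–50 band, and "crosses" is read as "greater than or equal to"; all three are assumptions.

```python
def required_confidence(questions_asked: int) -> float:
    """Confidence (in percent) needed before guessing, per the schedule above."""
    if questions_asked <= 10:
        return 99.0
    if questions_asked <= 25:
        return 95.0
    if questions_asked <= 50:
        return 88.0
    return 78.0


def should_guess(confidence: float, questions_asked: int) -> bool:
    """True once confidence reaches the threshold for the current question count."""
    return confidence >= required_confidence(questions_asked)
```

The same 96% confidence is therefore not enough at question 5 but is enough at question 12.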

Installation

pip install atlas-gmp-engine

With C++ extensions (recommended; about 8× faster probability operations at 10,000 places, per the benchmark under C++ Extensions):

pip install "atlas-gmp-engine[cpp]"

With semantic embeddings (for FAISS disambiguation):

pip install "atlas-gmp-engine[embeddings]"

Full installation:

pip install "atlas-gmp-engine[all]"

Quick Start

from atlas_engine import InferenceEngine

# Define your places
places = [
    {
        "id": "bd",
        "name": "Bangladesh",
        "type": "country",
        "emoji": "🇧🇩",
        "description": "A South Asian nation known for the Sundarbans and the Padma River.",
        "fun_fact": "Bangladesh is home to the world's largest river delta.",
        "attributes": {
            "continent":    "asia",
            "subRegion":    "south asia",
            "landlocked":   False,
            "hasCoast":     True,
            "hasDelta":     True,
            "climate":      "tropical",
            "mainReligion": "islam",
            "language":     "Bengali",
            "population":   "verylarge",
            "driveSide":    "left",
            "famousFor":    ["Sundarbans", "Padma River", "garments industry", "rickshaws"],
        },
    },
    # ... more places
]

# Define your questions
questions = [
    {
        "id": "q1",
        "question_text": "🌏 Is it located in Asia?",
        "attribute": "continent",
        "value": "asia",
        "stage": 0,
        "base_weight": 1.0,
    },
    {
        "id": "q2",
        "question_text": "🌊 Does it have a coastline?",
        "attribute": "hasCoast",
        "value": True,
        "stage": 2,
        "base_weight": 1.2,
    },
    # ... more questions
]

# Initialize engine
engine = InferenceEngine()

# Optionally load ML-learned feature importance
engine.load_feature_importance({
    "continent":    0.95,
    "subRegion":    0.90,
    "mainReligion": 0.88,
    "famousFor":    0.85,
    "language":     0.90,
})

# Start a game session
session = engine.start_game(places, questions)

# Game loop
while True:
    question = engine.get_next_question(session)

    if question is None:
        break  # Engine is ready to guess

    print(f"\n{question['question_text']}")
    answer = input("(yes / probably / dontknow / probablynot / no): ").strip()

    result = engine.process_answer(session, answer)
    print(f"  Confidence: {result['confidence']:.1f}%")
    print(f"  Remaining:  {result['active_places_count']} places")

    if result["should_stop"]:
        break

# Get prediction
pred = engine.get_prediction(session)

if pred["prediction"]:
    p = pred["prediction"]
    print(f"\n🎯 Atlas guesses: {p['emoji']} {p['name']}")
    print(f"   Confidence: {pred['confidence']}%")
    print(f"   Questions asked: {pred['questions_asked']}")

Data Format

Place object

{
    "id":          str,              # unique identifier
    "name":        str,              # display name
    "type":        str,              # "country" | "city" | "landmark" | ...
    "emoji":       str | None,       # optional emoji flag or symbol
    "description": str | None,       # 2-3 sentence description
    "fun_fact":    str | None,       # surprising fact
    "attributes": {                  # key-value pairs matching your questions
        "continent":    str,         # "asia" | "europe" | "africa" | ...
        "subRegion":    str,         # "south asia" | "western europe" | ...
        "landlocked":   bool,
        "hasCoast":     bool,
        "hasMountains": bool,
        "climate":      str,         # "tropical" | "desert" | "temperate" | ...
        "population":   str,         # "small" | "medium" | "large" | "verylarge"
        "mainReligion": str,
        "language":     str,
        "famousFor":    list[str],   # list values supported
        "neighbors":    list[str],
        # ... any attributes your questions reference
    }
}

Question object

{
    "id":            str,     # unique identifier
    "question_text": str,     # "🌏 Is it in Asia?" — emoji prefix recommended
    "attribute":     str,     # "continent" — must match place attributes key
    "value":         any,     # "asia" — the value for which answer is YES
    "stage":         int,     # 0–7 (see stage ordering above)
    "base_weight":   float,   # 1.0 default, higher = preferred
}

Advanced Usage

Load ML-learned feature importance

engine = InferenceEngine()

# Scores from 0.0 to 1.0 — higher = more important for discrimination
engine.load_feature_importance({
    "continent":    0.95,
    "type":         0.98,
    "subRegion":    0.90,
    "mainReligion": 0.88,
    "language":     0.90,
    "famousFor":    0.85,
    "capital":      0.95,
    "landlocked":   0.80,
})

Handle user correction (feedback)

# When Atlas guesses wrong and user corrects it:
engine.apply_feedback(session, correct_place_id="bd")
# Boosts Bangladesh ×25, reduces all others ×0.04
# Engine can then continue asking and make a new prediction
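
The reweighting described in the comments can be sketched as follows. The ×25 and ×0.04 factors come from the comment above; `reweight_after_correction` is a hypothetical stand-in and does not reproduce the internals of `apply_feedback`.

```python
import numpy as np


def reweight_after_correction(probs, place_ids, correct_place_id,
                              boost=25.0, penalty=0.04):
    """Boost the corrected place, shrink every other place, then renormalize."""
    p = np.asarray(probs, dtype=float)
    is_correct = np.array([pid == correct_place_id for pid in place_ids])
    if not is_correct.any():
        raise KeyError(f"unknown place id: {correct_place_id!r}")
    p = p * np.where(is_correct, boost, penalty)
    return p / p.sum()


place_ids = ["bd", "in", "np"]
before = [0.1, 0.6, 0.3]  # the engine wrongly favored "in"
after = reweight_after_correction(before, place_ids, correct_place_id="bd")
```

With these factors the corrected place gains a factor of 625 relative to every other candidate, so it dominates the distribution even if it started far behind.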

Use semantic embeddings for disambiguation

from atlas_engine.embeddings import embed_place

# Generate embedding for a place
place_data = {"name": "Bangladesh", "description": "...", "attributes": {...}}
embedding = embed_place(place_data)   # returns numpy array (384-dim)
# Store in your vector DB (e.g. Supabase pgvector)

Build FAISS index for fast similarity search

from atlas_engine.faiss_index import build_index, load_index

# Build from places with embeddings
places_with_embeddings = [
    {"id": "bd", "name": "Bangladesh", "type": "country", "embedding": [...]},
    # ...
]
build_index(places_with_embeddings)

# Load into memory (call once at startup)
load_index()
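
For intuition about what the similarity search returns, the sketch below ranks places by cosine similarity using exhaustive NumPy search. It stands in for FAISS and is not the package's `build_index` / `load_index` code; `top_k_similar` and the random 384-dimensional vectors are made up for the example.

```python
import numpy as np


def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 3):
    """Return (row index, cosine similarity) for the k rows most similar to query."""
    q = query / np.linalg.norm(query)
    rows = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = rows @ q  # dot product of unit vectors = cosine similarity
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Random stand-ins for five 384-dim place embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 384))

# A query that is a slightly perturbed copy of row 2.
query = vectors[2] + 0.01 * rng.normal(size=384)
neighbors = top_k_similar(query, vectors, k=2)
```

Exhaustive search is fine for a few thousand places; an index becomes worthwhile when the candidate set or the query rate grows.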

C++ Extensions

For large datasets (10,000+ places), the hot-path operations are implemented in C++ via pybind11:

atlas_engine/cpp/probability_ops.cpp
  ├── normalize_probabilities()   ← called after every answer
  ├── soft_filter()               ← eliminates near-zero candidates
  ├── shannon_entropy()           ← information gain inner loop
  └── information_gain_binary()   ← runs for every candidate question
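
For readers who want to see the math behind the last two entries, here is a NumPy sketch of Shannon entropy and the information gain of a yes/no question. It assumes a hard partition of places into "match" and "mismatch" groups and uses the hypothetical names `entropy_bits` and `binary_information_gain`; the compiled functions may differ in signature and detail.

```python
import numpy as np


def entropy_bits(weights) -> float:
    """Shannon entropy (in bits) of a non-negative weight vector, after normalizing."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total <= 0:
        return 0.0
    p = w[w > 0] / total
    return float(-(p * np.log2(p)).sum())


def binary_information_gain(probs, matches) -> float:
    """Expected entropy reduction from learning whether the answer is yes or no."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()
    matches = np.asarray(matches, dtype=bool)
    p_yes = p[matches].sum()
    p_no = 1.0 - p_yes
    expected_after = (
        p_yes * entropy_bits(p[matches]) + p_no * entropy_bits(p[~matches])
    )
    return entropy_bits(p) - expected_after


uniform = np.full(4, 0.25)
even_split = binary_information_gain(uniform, [True, True, False, False])
lopsided = binary_information_gain(uniform, [True, False, False, False])
useless = binary_information_gain(uniform, [True, True, True, True])
```

A question that splits the candidates evenly yields the most information, which is the same intuition behind the `answer_balance` term in the scoring formula.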

Performance comparison:

| Dataset | Python (NumPy) | C++ (pybind11) |
|---|---|---|
| 100 places | ~3 ms | ~1 ms |
| 1,000 places | ~25 ms | ~5 ms |
| 10,000 places | ~600 ms | ~70 ms |
| 50,000 places | ~8 s | ~400 ms |

The engine automatically falls back to NumPy if C++ is not compiled.
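
A common way to implement this kind of fallback is a guarded import. The sketch below is illustrative only: the import path is inferred from the source file location listed above and may not match the compiled module's real name, and the NumPy fallback (including its uniform result for all-zero input) is an assumption.

```python
import numpy as np

try:
    # Hypothetical import path; the real compiled module name may differ.
    from atlas_engine.cpp.probability_ops import normalize_probabilities
    USING_CPP = True
except ImportError:
    USING_CPP = False

    def normalize_probabilities(probs):
        """NumPy fallback: scale non-negative weights so they sum to 1.

        Assumes a non-empty input; an all-zero input becomes uniform.
        """
        p = np.asarray(probs, dtype=float)
        total = p.sum()
        if total <= 0:
            return np.full(p.shape, 1.0 / p.size)
        return p / total


normalized = normalize_probabilities([1.0, 1.0, 2.0])
```

Callers use `normalize_probabilities` the same way in both cases, and `USING_CPP` can be logged at startup to confirm which path is active.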

Build C++ extensions manually:

cd atlas_engine/cpp
pip install pybind11
python setup.py build_ext --inplace

Performance

| Dataset Size | Avg Response | Memory | C++ Required |
|---|---|---|---|
| ≤ 1,000 | < 20 ms | ~150 MB | No |
| ≤ 10,000 | < 80 ms | ~800 MB | Recommended |
| ≤ 50,000 | < 400 ms | ~5 GB | Yes |

Requirements

Core (numpy and scipy are always required; structlog is optional):

numpy >= 1.26
scipy >= 1.13
structlog >= 24.2   (optional, falls back to stdlib logging)

Optional extras:

scikit-learn >= 1.5       (ML feature importance training)
sentence-transformers >= 3.0  (semantic embeddings)
faiss-cpu >= 1.8          (fast vector similarity search)
pybind11 >= 2.13          (C++ hot-path extensions)

Changelog

[1.0.0] — 2026

Initial release as a standalone package.

Features:

  • Bayesian inference engine with 5-factor question selection
  • Probability Manager with likelihood multipliers
  • Bayesian Network for belief propagation across attributes
  • Information Gain Calculator (NumPy + C++ pybind11)
  • Confidence Calculator (4-signal composite score)
  • FAISS semantic index for last-mile disambiguation
  • MiniLM-L6-v2 embeddings (384-dim)
  • Soft filtering with configurable thresholds
  • Stage-ordered question selection
  • Feature importance (both static and ML-learned)
  • C++ extensions for hot-path operations (8× speedup)
  • Graceful fallback to NumPy when the C++ extensions are unavailable

Used By

  • GuessMyPlace — the geography guessing game this engine was built for

License

MIT License — see LICENSE for details.


Part of the GuessMyPlace project
