Bayesian inference engine for geographic place guessing
⚡ Atlas GMP Engine
The AI brain powering GuessMyPlace: it identifies places on Earth through a series of yes/no questions.
What is Atlas GMP Engine?
Atlas GMP Engine is a standalone Python package implementing a Bayesian inference system for geographic place identification. Given a dataset of places (countries, cities, landmarks, etc.) and a bank of yes/no questions, the engine:
- Maintains a probability distribution across all places
- Selects the most informative next question using information gain + Bayesian scoring
- Updates probabilities after each answer using likelihood multipliers
- Eliminates low-probability candidates through soft filtering
- Returns a confident prediction with accuracy metrics
Live performance: ~94% accuracy on 115 world countries, averaging 10 questions per game.
How It Works
                ┌──────────────────────────────────────────┐
                │             Atlas GMP Engine             │
                │                                          │
User Answer ──→ │  ProbabilityManager                      │
                │    ↓ Bayesian likelihood updates         │
                │  BayesianNetwork                         │
                │    ↓ Belief propagation across attrs     │
                │  InformationGain ←── FeatureImportance   │
                │    ↓ Shannon entropy (NumPy + C++)       │
                │  QuestionSelector                        │
                │    ↓ 5-factor weighted scoring           │
                │  ConfidenceCalculator                    │
                │    ↓ 4-signal composite score (0–100%)   │
                │  Prediction / Next Question              │
                └──────────────────────────────────────────┘
Core Components
| Component | File | Purpose |
|---|---|---|
| InferenceEngine | inference_engine.py | Main coordinator: manages game sessions |
| ProbabilityManager | probability_manager.py | Bayesian likelihood updates + soft filtering |
| BayesianNetwork | bayesian_network.py | Belief propagation across related attributes |
| InformationGain | information_gain.py | Shannon entropy calculation (NumPy + C++) |
| QuestionSelector | question_selector.py | 5-factor question scoring + disambiguation |
| ConfidenceCalculator | confidence_calculator.py | 4-signal composite confidence score |
| FeatureImportance | feature_importance.py | ML-learned attribute weights |
| Embeddings | embeddings.py | MiniLM-L6-v2 semantic similarity |
| FAISSIndex | faiss_index.py | Fast last-mile disambiguation |
Question Selection Algorithm
Every candidate question is scored with a weighted formula:
score = (information_gain × 0.40) # How much entropy does this reduce?
+ (stage_bonus × 0.35) # continent→region→culture→specific
+ (answer_balance × 0.10) # prefer questions that split ~50/50
+ (bayesian_belief × 0.10) # prior probability of this attribute value
+ (feature_importance × 0.05) # weight learned from real game data (ML)
Stage ordering ensures the engine always asks broad questions first:
Stage 0 → continent, type
Stage 1 → region, sub-region
Stage 2 → coast, landlocked, island, climate, mountains
Stage 3 → population, size, GDP level
Stage 4 → government, religion, drive side
Stage 5 → language, flag, colonial history, UNESCO
Stage 6 → exports, famous for, neighbors
Stage 7 → capital, currency (very specific — asked last)
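The weighted formula above can be sketched in a few lines of Python. The weights are taken from the formula; everything else is an assumption made for illustration: the helper names `stage_bonus`, `answer_balance` and `score_question` are hypothetical, each factor is assumed to be scaled to the range 0 to 1, and the shape of the stage bonus (halving per stage of distance) is a guess, not the engine's actual rule.

```python
# Weights copied from the scoring formula; they sum to 1.0.
WEIGHTS = {
    "information_gain": 0.40,
    "stage_bonus": 0.35,
    "answer_balance": 0.10,
    "bayesian_belief": 0.10,
    "feature_importance": 0.05,
}


def stage_bonus(question_stage: int, current_stage: int) -> float:
    """Hypothetical bonus: 1.0 at the current stage, halved per stage of distance."""
    return 0.5 ** abs(question_stage - current_stage)


def answer_balance(p_yes: float) -> float:
    """Hypothetical balance term: 1.0 for a 50/50 split, 0.0 for a one-sided split."""
    return 1.0 - abs(2.0 * p_yes - 1.0)


def score_question(factors: dict) -> float:
    """Weighted sum of the five factors, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


# Example: a stage-0 question asked at stage 0 that splits candidates evenly.
factors = {
    "information_gain": 0.9,
    "stage_bonus": stage_bonus(question_stage=0, current_stage=0),
    "answer_balance": answer_balance(p_yes=0.5),
    "bayesian_belief": 0.5,
    "feature_importance": 0.95,
}
print(f"score = {score_question(factors):.4f}")
```

Because the stage bonus carries 35% of the weight, a well-balanced but very specific question (for example, one about the capital) would still tend to rank below a broad one early in the game under these assumptions.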
Probability Updates
Each answer multiplies all place probabilities using likelihood ratios:
| Answer | Match multiplier | Mismatch multiplier |
|---|---|---|
| Yes | ×10.0 | ×0.001 |
| Probably | ×3.5 | ×0.15 |
| Don't Know | ×1.0 | ×1.0 |
| Probably Not | ×0.15 | ×3.5 |
| No | ×0.001 | ×10.0 |
After each update, probabilities are normalized and a soft filter eliminates candidates below 0.5% of the top probability (keeping at least 5).
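A minimal sketch of this update step, in plain Python for readability. The multipliers, the 0.5% ratio and the minimum of 5 survivors come from the text above; the function names `update` and `soft_filter`, the dict-based data layout, and the renormalization after filtering are assumptions and may differ from what ProbabilityManager does internally.

```python
# (match, mismatch) multipliers from the table above, keyed by answer string.
MULTIPLIERS = {
    "yes": (10.0, 0.001),
    "probably": (3.5, 0.15),
    "dontknow": (1.0, 1.0),
    "probablynot": (0.15, 3.5),
    "no": (0.001, 10.0),
}


def update(probs: dict, matches: dict, answer: str) -> dict:
    """Scale each place by its match/mismatch multiplier, then normalize to sum 1."""
    match_mult, mismatch_mult = MULTIPLIERS[answer]
    scaled = {
        pid: p * (match_mult if matches[pid] else mismatch_mult)
        for pid, p in probs.items()
    }
    total = sum(scaled.values())
    return {pid: p / total for pid, p in scaled.items()}


def soft_filter(probs: dict, ratio: float = 0.005, min_keep: int = 5) -> dict:
    """Drop places below ratio * top probability, but keep at least min_keep."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = ranked[0][1] * ratio
    kept = [kv for kv in ranked if kv[1] >= cutoff]
    if len(kept) < min_keep:
        kept = ranked[:min_keep]
    total = sum(p for _, p in kept)  # assumption: renormalize the survivors
    return {pid: p / total for pid, p in kept}


# Eight equally likely places; three of them match the question's attribute value.
probs = {f"p{i}": 1 / 8 for i in range(8)}
matches = {f"p{i}": i < 3 for i in range(8)}
posterior = update(probs, matches, "yes")
survivors = soft_filter(posterior)
print(len(survivors), round(posterior["p0"], 4))
```

In this example a single "yes" pushes every mismatching place below the 0.5% cutoff, so only the keep-at-least-5 rule stops the candidate set from shrinking to the three matches.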
Confidence Score
The confidence signal is a weighted combination of 4 measurements:
confidence = (probability_gap × 0.40) # gap between top-1 and top-2 probability
+ (normalized_prob × 0.30) # top probability / total
+ (item_count_score × 0.20) # fewer remaining = more confident
+ (entropy_score × 0.10) # inverse of distribution entropy
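As a rough illustration, the composite could be computed as follows. Only the four weights come from the formula above. How each signal is scaled to the range 0 to 1 is an assumption: in particular, `item_count_score` is taken to be linear in the share of candidates eliminated, and `entropy_score` is taken to be one minus the normalized Shannon entropy. The function name `confidence` and its signature are hypothetical.

```python
import math


def confidence(probs: list, initial_count: int) -> float:
    """Composite confidence in percent (0-100) for the remaining candidates."""
    total = sum(probs)
    ranked = sorted((p / total for p in probs), reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0

    probability_gap = top - second   # gap between top-1 and top-2
    normalized_prob = top            # top probability / total

    # Assumption: score rises linearly as candidates are eliminated.
    remaining_share = (len(ranked) - 1) / max(initial_count - 1, 1)
    item_count_score = max(0.0, 1.0 - remaining_share)

    # Assumption: "inverse of entropy" means 1 - H / H_max.
    entropy = -sum(p * math.log2(p) for p in ranked if p > 0)
    max_entropy = math.log2(len(ranked)) if len(ranked) > 1 else 1.0
    entropy_score = 1.0 - entropy / max_entropy

    return 100.0 * (
        0.40 * probability_gap
        + 0.30 * normalized_prob
        + 0.20 * item_count_score
        + 0.10 * entropy_score
    )


# One dominant candidate among five survivors of an initial 115.
print(f"{confidence([0.90, 0.04, 0.03, 0.02, 0.01], initial_count=115):.1f}%")
```

Under these assumptions a uniform distribution over all 115 places scores close to 0%, and a single remaining candidate scores 100%.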
The engine triggers a guess when confidence crosses a threshold that loosens as more questions are asked:
- Questions 1–10: requires 99%
- Questions 11–25: requires 95%
- Questions 26–50: requires 88%
- Questions 51+: requires 78%
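The threshold schedule above amounts to a small lookup. In this sketch the function names are hypothetical, question 50 is assigned to the 88% band, and "crosses" is read as "greater than or equal to"; both of those readings are assumptions.

```python
def required_confidence(question_number: int) -> float:
    """Confidence (in percent) needed to guess after the given question number."""
    if question_number <= 10:
        return 99.0
    if question_number <= 25:
        return 95.0
    if question_number <= 50:
        return 88.0
    return 78.0


def should_guess(confidence: float, question_number: int) -> bool:
    """True once confidence reaches the threshold for this point in the game."""
    return confidence >= required_confidence(question_number)


# 96% confidence is not enough after question 5, but is enough after question 12.
print(should_guess(96.0, 5), should_guess(96.0, 12))
```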
Installation
pip install atlas-gmp-engine
With C++ extensions (recommended; roughly 8× faster probability operations at 10,000 places, per the benchmark table in the C++ Extensions section):
pip install atlas-gmp-engine[cpp]
With semantic embeddings (for FAISS disambiguation):
pip install atlas-gmp-engine[embeddings]
Full installation:
pip install atlas-gmp-engine[all]
Quick Start
from atlas_engine import InferenceEngine
# Define your places
places = [
    {
        "id": "bd",
        "name": "Bangladesh",
        "type": "country",
        "emoji": "🇧🇩",
        "description": "A South Asian nation known for the Sundarbans and the Padma River.",
        "fun_fact": "Bangladesh is home to the world's largest river delta.",
        "attributes": {
            "continent": "asia",
            "subRegion": "south asia",
            "landlocked": False,
            "hasCoast": True,
            "hasDelta": True,
            "climate": "tropical",
            "mainReligion": "islam",
            "language": "Bengali",
            "population": "verylarge",
            "driveSide": "left",
            "famousFor": ["Sundarbans", "Padma River", "garments industry", "rickshaws"],
        },
    },
    # ... more places
]
# Define your questions
questions = [
    {
        "id": "q1",
        "question_text": "🌏 Is it located in Asia?",
        "attribute": "continent",
        "value": "asia",
        "stage": 0,
        "base_weight": 1.0,
    },
    {
        "id": "q2",
        "question_text": "🌊 Does it have a coastline?",
        "attribute": "hasCoast",
        "value": True,
        "stage": 2,
        "base_weight": 1.2,
    },
    # ... more questions
]
# Initialize engine
engine = InferenceEngine()
# Optionally load ML-learned feature importance
engine.load_feature_importance({
    "continent": 0.95,
    "subRegion": 0.90,
    "mainReligion": 0.88,
    "famousFor": 0.85,
    "language": 0.90,
})
# Start a game session
session = engine.start_game(places, questions)
# Game loop
while True:
    question = engine.get_next_question(session)
    if question is None:
        break  # Engine is ready to guess
    print(f"\n{question['question_text']}")
    answer = input("(yes / probably / dontknow / probablynot / no): ").strip()
    result = engine.process_answer(session, answer)
    print(f" Confidence: {result['confidence']:.1f}%")
    print(f" Remaining: {result['active_places_count']} places")
    if result["should_stop"]:
        break
# Get prediction
pred = engine.get_prediction(session)
if pred["prediction"]:
    p = pred["prediction"]
    print(f"\n🎯 Atlas guesses: {p['emoji']} {p['name']}")
    print(f" Confidence: {pred['confidence']}%")
    print(f" Questions asked: {pred['questions_asked']}")
Data Format
Place object
{
    "id": str,                  # unique identifier
    "name": str,                # display name
    "type": str,                # "country" | "city" | "landmark" | ...
    "emoji": str | None,        # optional emoji flag or symbol
    "description": str | None,  # 2-3 sentence description
    "fun_fact": str | None,     # surprising fact
    "attributes": {             # key-value pairs matching your questions
        "continent": str,       # "asia" | "europe" | "africa" | ...
        "subRegion": str,       # "south asia" | "western europe" | ...
        "landlocked": bool,
        "hasCoast": bool,
        "hasMountains": bool,
        "climate": str,         # "tropical" | "desert" | "temperate" | ...
        "population": str,      # "small" | "medium" | "large" | "verylarge"
        "mainReligion": str,
        "language": str,
        "famousFor": list[str], # list values supported
        "neighbors": list[str],
        # ... any attributes your questions reference
    }
}
Question object
{
    "id": str,             # unique identifier
    "question_text": str,  # e.g. "🌏 Is it in Asia?" (emoji prefix recommended)
    "attribute": str,      # e.g. "continent"; must match a key in the place's attributes
    "value": any,          # e.g. "asia"; the attribute value for which the answer is YES
    "stage": int,          # 0–7 (see stage ordering above)
    "base_weight": float,  # default 1.0; higher = preferred
}
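To make the link between the two objects concrete, here is a hedged sketch of how a question could be evaluated against a place. The helper names `place_matches` and `unknown_attributes` are hypothetical, and treating a list attribute as a match when it contains the question's value is an assumption based on the "list values supported" note; the engine may compare values differently (for example, case-insensitively).

```python
def place_matches(place: dict, question: dict) -> bool:
    """True if the truthful answer to the question for this place would be YES."""
    actual = place["attributes"].get(question["attribute"])
    expected = question["value"]
    if isinstance(actual, list):
        # Assumption: a list attribute matches if it contains the value.
        return expected in actual
    return actual == expected


def unknown_attributes(places: list, questions: list) -> list:
    """IDs of questions whose attribute appears in no place at all."""
    known = set()
    for place in places:
        known.update(place["attributes"])
    return [q["id"] for q in questions if q["attribute"] not in known]


place = {
    "id": "bd",
    "name": "Bangladesh",
    "attributes": {"continent": "asia", "hasCoast": True, "famousFor": ["Sundarbans"]},
}
questions = [
    {"id": "q1", "attribute": "continent", "value": "asia"},
    {"id": "q2", "attribute": "hasCoast", "value": True},
    {"id": "q3", "attribute": "famousFor", "value": "Sundarbans"},
    {"id": "q4", "attribute": "currency", "value": "taka"},
]
print([place_matches(place, q) for q in questions])
print(unknown_attributes([place], questions))
```

A check like `unknown_attributes` can be worth running once at load time, since a question that references a missing attribute can never separate any candidates.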
Advanced Usage
Load ML-learned feature importance
engine = InferenceEngine()
# Scores from 0.0 to 1.0 — higher = more important for discrimination
engine.load_feature_importance({
    "continent": 0.95,
    "type": 0.98,
    "subRegion": 0.90,
    "mainReligion": 0.88,
    "language": 0.90,
    "famousFor": 0.85,
    "capital": 0.95,
    "landlocked": 0.80,
})
Handle user correction (feedback)
# When Atlas guesses wrong and user corrects it:
engine.apply_feedback(session, correct_place_id="bd")
# Boosts Bangladesh ×25, reduces all others ×0.04
# Engine can then continue asking and make a new prediction
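The effect of that correction on the distribution can be illustrated with a short sketch. The ×25 and ×0.04 factors are the ones stated above; the function name `boost_correct_place`, the dict layout and the final renormalization are assumptions, and the starting probabilities are made-up illustration values, not engine output.

```python
def boost_correct_place(probs: dict, correct_place_id: str,
                        boost: float = 25.0, penalty: float = 0.04) -> dict:
    """Multiply the corrected place by `boost`, all others by `penalty`, then normalize."""
    scaled = {
        pid: p * (boost if pid == correct_place_id else penalty)
        for pid, p in probs.items()
    }
    total = sum(scaled.values())
    return {pid: p / total for pid, p in scaled.items()}


# Hypothetical state in which the engine wrongly favored "in" over "bd".
before = {"in": 0.6, "bd": 0.3, "pk": 0.1}
after = boost_correct_place(before, "bd")
print({pid: round(p, 4) for pid, p in after.items()})
```

Since 25 / 0.04 = 625, the corrected place gains a 625-fold advantage relative to every other candidate in one step.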
Use semantic embeddings for disambiguation
from atlas_engine.embeddings import embed_place
# Generate embedding for a place
place_data = {"name": "Bangladesh", "description": "...", "attributes": {...}}
embedding = embed_place(place_data) # returns numpy array (384-dim)
# Store in your vector DB (e.g. Supabase pgvector)
Build FAISS index for fast similarity search
from atlas_engine.faiss_index import build_index, load_index
# Build from places with embeddings
places_with_embeddings = [
    {"id": "bd", "name": "Bangladesh", "type": "country", "embedding": [...]},
    # ...
]
build_index(places_with_embeddings)
# Load into memory (call once at startup)
load_index()
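For intuition about what a similarity index speeds up, here is a brute-force nearest-neighbor search in plain Python. It is a conceptual stand-in, not the package's FAISS code: the function names are hypothetical, cosine similarity is assumed as the metric, and the 3-dimensional vectors are toy values standing in for the 384-dimensional embeddings.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_places(query: list, places: list, k: int = 3) -> list:
    """Return the k (place_id, similarity) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, p["embedding"]), p["id"]) for p in places]
    scored.sort(reverse=True)
    return [(pid, sim) for sim, pid in scored[:k]]


# Toy embeddings for illustration only.
toy_places = [
    {"id": "bd", "embedding": [0.9, 0.1, 0.0]},
    {"id": "in", "embedding": [0.8, 0.2, 0.1]},
    {"id": "fr", "embedding": [0.0, 1.0, 0.0]},
]
top = nearest_places([1.0, 0.0, 0.0], toy_places, k=2)
print([pid for pid, _ in top])
```

This linear scan costs one comparison per place per query, which is the work a prebuilt index is meant to reduce once the dataset grows large.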
C++ Extensions
The hot-path operations are implemented in C++ via pybind11, which matters most for large datasets (10,000+ places):
atlas_engine/cpp/probability_ops.cpp
├── normalize_probabilities() ← called after every answer
├── soft_filter() ← eliminates near-zero candidates
├── shannon_entropy() ← information gain inner loop
└── information_gain_binary() ← runs for every candidate question
Performance comparison:
| Dataset | Python (NumPy) | C++ (pybind11) |
|---|---|---|
| 100 places | ~3ms | ~1ms |
| 1,000 places | ~25ms | ~5ms |
| 10,000 places | ~600ms | ~70ms |
| 50,000 places | ~8s | ~400ms |
The engine automatically falls back to NumPy if C++ is not compiled.
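As a reference for what the two entropy functions compute, here is a plain-Python sketch. The names mirror the C++ function list above, but the signatures are assumptions, and the gain is computed for an idealized hard yes/no partition, which is simpler than the soft likelihood multipliers the engine applies.

```python
import math


def shannon_entropy(probs: list) -> float:
    """Entropy in bits of a (possibly unnormalized) probability list."""
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


def information_gain_binary(probs: list, matches: list) -> float:
    """Expected entropy reduction from a question that splits places into yes/no."""
    total = sum(probs)
    yes = [p for p, m in zip(probs, matches) if m]
    no = [p for p, m in zip(probs, matches) if not m]
    gain = shannon_entropy(probs)
    for group in (yes, no):
        weight = sum(group) / total  # probability of receiving this answer
        if weight > 0:
            gain -= weight * shannon_entropy(group)
    return gain


uniform = [0.25, 0.25, 0.25, 0.25]
even_split = information_gain_binary(uniform, [True, True, False, False])
uneven_split = information_gain_binary(uniform, [True, False, False, False])
print(f"{even_split:.3f} bits vs {uneven_split:.3f} bits")
```

With four equally likely places, an even split yields a full bit of information while a 1-versus-3 split yields less, which is why balanced questions tend to score higher.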
Build C++ extensions manually:
cd atlas_engine/cpp
pip install pybind11
python setup.py build_ext --inplace
Performance
| Dataset Size | Avg Response | Memory | C++ Required |
|---|---|---|---|
| ≤ 1,000 | < 20ms | ~150MB | No |
| ≤ 10,000 | < 80ms | ~800MB | Recommended |
| ≤ 50,000 | < 400ms | ~5GB | Yes |
Requirements
Core (required, except where marked optional):
numpy >= 1.26
scipy >= 1.13
structlog >= 24.2 (optional, falls back to stdlib logging)
Optional extras:
scikit-learn >= 1.5 (ML feature importance training)
sentence-transformers >= 3.0 (semantic embeddings)
faiss-cpu >= 1.8 (fast vector similarity search)
pybind11 >= 2.13 (C++ hot-path extensions)
Changelog
[1.0.0] — 2026
Initial release as a standalone package.
Features:
- Bayesian inference engine with 5-factor question selection
- Probability Manager with likelihood multipliers
- Bayesian Network for belief propagation across attributes
- Information Gain Calculator (NumPy + C++ pybind11)
- Confidence Calculator (4-signal composite score)
- FAISS semantic index for last-mile disambiguation
- MiniLM-L6-v2 embeddings (384-dim)
- Soft filtering with configurable thresholds
- Stage-ordered question selection
- Feature importance (both static and ML-learned)
- C++ extensions for hot-path operations (8× speedup)
- Graceful fallback to NumPy when the C++ extensions are unavailable
Used By
- GuessMyPlace — the geography guessing game this engine was built for
License
MIT License — see LICENSE for details.
Part of the GuessMyPlace project
PyPI · GuessMyPlace · Docs