Skip to main content

Production-Grade Explainable AI for city travel purpose classification: temporal/seasonal awareness, city fingerprints, confidence decomposition, synthetic data generation - 100% ethical multi-source harvesting, offline-first!

Project description

TravelPurpose

PyPI version Python 3.10+ License: MIT

A production-grade Python library for classifying world cities by travel purpose using multi-source data from public travel platforms and knowledge bases.

🆕 What's New in v2.0.1 (Aralık 2024)

Production-Ready Enhancements:

  • Enhanced PyPI Description: Better discoverability with clearer value propositions
  • Offline-First: All core features work without network access (uses cached data)
  • Ethical Harvesting: 100% ToS-compliant multi-source data collection
  • Performance Optimized: Faster predictions with improved caching
  • Type Safety: Complete type hints for better IDE support

What Makes TravelPurpose Production-Grade:

# Explainable predictions - understand WHY
result = predict_purpose("Istanbul", explain=True)
print(result['explanation'])  # Human-readable reasons

# Seasonal awareness - purposes change with time
summer = predict_purpose("Antalya", season="summer")  # Beach boosted
winter = predict_purpose("St. Moritz", season="winter")  # Ski boosted

# Offline-first - works without network
result = predict_purpose("Paris", use_cache=True)  # Uses local data

Production Benefits:

  • 🚀 No API Costs: 100% local ML, no external API dependencies
  • 🔒 Privacy-Safe: Synthetic data generator for testing
  • 📊 Transparent: Full explainability with confidence breakdowns
  • Fast: Cached predictions, optimized for production

🆕 What's New in v2.0

Explainable AI - Understand WHY predictions are made:

  • explain=True: Get ambiguity scores, confidence breakdowns, and human-readable explanations
  • Temporal Awareness: Purposes change with seasons (month=7 for summer travel)
  • City Fingerprints: Unique purpose signatures for each city
  • Confidence Decomposition: See what contributes to each prediction
  • Synthetic Data Generator: Privacy-safe data for testing and research
# v2.0 NEW: Explainable predictions
result = predict_purpose("Istanbul", explain=True)
print(result['ambiguity_score'])  # 0.32 - moderate ambiguity
print(result['explanation']['reasons'])
# ['High cross-source agreement', 'UNESCO/Heritage site boost', ...]

# v2.0 NEW: Seasonal awareness
summer_purposes = predict_purpose("Antalya", month=7)  # Beach boosted in summer
winter_purposes = predict_purpose("St. Moritz", season="winter")  # Ski boosted

Features

  • Multi-Label Classification: Cities can have multiple travel purposes (e.g., Business + Culture + Transit)
  • Rich Ontology: 12 main categories and 70+ subcategories covering all travel purposes
  • Multi-Source Data: Integrates data from Wikidata, Booking.com, Agoda, Trivago, Kayak, Trip.com, and Skyscanner
  • Hybrid Classifier: Combines rule-based and embedding-based approaches with confidence scoring
  • 🆕 Explainable AI: Ambiguity scores, confidence decomposition, human-readable explanations
  • 🆕 Temporal Awareness: Seasonal purpose adjustments (e.g., beach cities in summer)
  • 🆕 City Fingerprints: Unique purpose signatures for similarity analysis
  • 🆕 Synthetic Data: Privacy-safe data generation for testing and research
  • Python API & CLI: Easy-to-use programmatic and command-line interfaces
  • Ethical Data Collection: Fully compliant with ToS, respects robots.txt, implements rate limiting
  • Production Ready: Comprehensive tests, CI/CD, type hints, logging, caching

Installation

pip install travelpurpose

Quick Start

Python API

from travelpurpose import predict_purpose, tags

# Predict travel purposes for a city
result = predict_purpose("Istanbul")
print(result)
# {
#     'main': ['Culture_Heritage', 'Transit_Gateway', 'Leisure'],
#     'sub': ['UNESCO_Site', 'Old_Town', 'Mega_Air_Hub', 'Gastronomy'],
#     'confidence': 0.86
# }

# Get raw tags from all sources
city_tags = tags("Antalya")
print(city_tags[:3])
# [
#     {'tag': 'beachfront', 'source': 'booking', 'url': '...', 'ts': '...'},
#     {'tag': 'resort', 'source': 'agoda', 'url': '...'},
#     {'tag': 'all-inclusive', 'source': 'trivago', 'url': '...'}
# ]

Command Line

# Predict purposes for a city
tpurpose predict "Paris"

# Show raw tags
tpurpose show-tags "Dubai" --limit 20

# Search for cities
tpurpose find "turkey"

# Rebuild dataset (requires network access)
tpurpose rebuild --sample 100 --verbose

Travel Purpose Ontology

Main Categories (12)

  • Business: Finance hubs, tech centers, MICE destinations
  • Leisure: City breaks, luxury, shopping, gastronomy
  • Culture_Heritage: UNESCO sites, museums, old towns, architecture
  • Beach_Resort: Beachfront, islands, diving, all-inclusive
  • Adventure_Nature: Trekking, safari, desert, extreme sports
  • Family: Theme parks, zoos, safe cities, kid-friendly
  • Medical_Health: Medical tourism, wellness, spa, rehabilitation
  • Religious_Pilgrimage: Islamic, Christian, Buddhist, Hindu pilgrimage sites
  • Winter_Snow: Ski resorts, winter sports, aurora viewing
  • Nightlife_Entertainment: Party districts, casinos, music festivals
  • Transit_Gateway: Major airport hubs, connecting destinations
  • Seaman_Crew: Crew change ports, maritime facilities

Subcategories (70+)

Each main category has 4-9 specialized subcategories. See travelpurpose/ontology/ontology.yaml for the complete taxonomy.

Data Sources

All data collection is public, ToS-compliant, and ethical:

Knowledge Bases

  • Wikidata: Canonical city data, coordinates, population, UNESCO sites
  • Wikipedia: City categories and cultural information

Travel Platforms (Public Data Only)

  • Booking.com: Public structured data (JSON-LD), meta tags, city guides
  • Agoda: Public landing pages, sitemaps, accommodation types
  • Trivago: Public city pages, district information
  • Kayak: Public city guides, travel information
  • Trip.com: Public destination pages, attractions
  • Skyscanner: Public autocomplete API for city normalization

Compliance Features

  • Respects robots.txt
  • Rate limiting (configurable, default 1.5s between requests)
  • HTTP caching (24-hour TTL)
  • Exponential backoff for retries
  • No authentication, logins, or private APIs
  • User-Agent identification
  • Graceful degradation when sources are unavailable

Architecture

Data Pipeline

python scripts/pipeline.py --min-population 100000 --sample 50

The pipeline:

  1. Loads NBD.xlsx (if available) with existing city classifications
  2. Fetches canonical city data from Wikidata (cities >100K population)
  3. Harvests public tags from all sources
  4. Normalizes tags to English, handles Unicode city names
  5. Maps tags to ontology using fuzzy matching
  6. Merges with NBD purposes (if available)
  7. Classifies using hybrid rule-based + embedding approach
  8. Calculates confidence scores
  9. Exports to travelpurpose/data/cities.{parquet,json}

Classifier Design

Hybrid Approach:

  1. Rule-Based (deterministic): Strong tags directly map to categories
  2. Tag Aggregation: Weighted voting from multiple sources
  3. Confidence Calibration: Based on data quality and agreement

Source Weights:

  • Wikidata/UNESCO: 1.5-2.0x (high authority)
  • Booking.com/Agoda: 1.0x (standard)
  • Trivago/Kayak/Trip.com: 0.9x
  • Evidence type boosts: JSON-LD (1.2x), Meta (1.0x), Headings (0.8x)

Configuration

Rate Limiting

from travelpurpose.utils.harvest import HarvestConfig

config = HarvestConfig(
    rate_limit=2.0,  # 2 seconds between requests
    timeout=15,
    max_retries=3,
    cache_ttl=86400,  # 24 hours
)

Extending the Ontology

Edit travelpurpose/ontology/ontology.yaml:

main_categories:
  - Your_New_Category

subcategories:
  Your_New_Category:
    - Subcategory_One
    - Subcategory_Two

tag_mappings:
  your_mapping:
    main: Your_New_Category
    sub: [Subcategory_One]
    keywords: ["keyword1", "keyword2"]

Development

Setup

git clone https://github.com/teyfikoz/Travel_Purpose-City_Tags.git
cd Travel_Purpose-City_Tags
pip install -e ".[dev]"

Running Tests

pytest
pytest --cov=travelpurpose --cov-report=term-missing

Linting

ruff check travelpurpose/
black travelpurpose/

Building for PyPI

python -m build
twine check dist/*
twine upload dist/*

Examples

See examples/ directory for Jupyter notebooks:

  • 01_quickstart.ipynb: Basic usage and API examples
  • 02_training_and_rules.ipynb: Advanced classification and ontology customization

Data Provenance & Ethics

Dataset Card

See DATASET_CARD.md for:

  • Data sources and collection dates
  • Sample sizes and coverage
  • Limitations and biases
  • Update frequency

Ethics & Privacy

  • No PII: We collect no personal information
  • Public Data Only: All sources are publicly accessible
  • ToS Compliance: Strict adherence to platform terms of service
  • Transparency: Full source attribution in tag metadata
  • Caching: Reduces load on source platforms
  • Rate Limiting: Prevents server overload

Citation

If you use TravelPurpose in research, please cite:

@software{travelpurpose2025,
  title = {TravelPurpose: City Travel Purpose Classification Library},
  author = {Travel Purpose Contributors},
  year = {2025},
  url = {https://github.com/teyfikoz/Travel_Purpose-City_Tags}
}

See CITATION.cff for more formats.

License

MIT License - see LICENSE file for details.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Key areas for contribution:

  • Adding new data sources (must be public and ToS-compliant)
  • Expanding the ontology with new categories
  • Improving classification accuracy
  • Adding support for more languages
  • Documentation improvements

Support

Changelog

See CHANGELOG.md for version history and updates.

Acknowledgments

  • Wikidata and Wikipedia communities for open knowledge bases
  • Travel platforms for providing public data
  • Open source community for excellent Python libraries

Made with ❤️ for the travel and data science communities

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

travelpurpose-2.0.1.tar.gz (72.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

travelpurpose-2.0.1-py3-none-any.whl (74.1 kB view details)

Uploaded Python 3

File details

Details for the file travelpurpose-2.0.1.tar.gz.

File metadata

  • Download URL: travelpurpose-2.0.1.tar.gz
  • Upload date:
  • Size: 72.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for travelpurpose-2.0.1.tar.gz
Algorithm Hash digest
SHA256 72b6beb780d3876e6c7872ef89855d8baccec0850f4201b84ae77efc0fd919ed
MD5 2300a088a47c55c029c6c5644b2ee71e
BLAKE2b-256 02912586206bb19a8cd689b9ec2016310f72c609a365f5b81d4b76b895627aab

See more details on using hashes here.

File details

Details for the file travelpurpose-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: travelpurpose-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 74.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for travelpurpose-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d1041e51a3be56e101f23ee1f9b6999d6e77773b3b7762701ef9a8685a7d237e
MD5 ca64026378f527fa4a3ff9e110bad5a4
BLAKE2b-256 5e6c36cadea01d0bb12b3518fc247d963bca0a6b635536d5e7ea022af40b32c5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page