Production-grade library for classifying world cities by travel purpose using multi-source data

TravelPurpose

Requires Python 3.10+. License: MIT.

A production-grade Python library for classifying world cities by travel purpose using multi-source data from public travel platforms and knowledge bases.

Features

  • Multi-Label Classification: Cities can have multiple travel purposes (e.g., Business + Culture + Transit)
  • Rich Ontology: 12 main categories and 70+ subcategories covering a broad range of travel purposes
  • Multi-Source Data: Integrates data from Wikidata, Booking.com, Agoda, Trivago, Kayak, Trip.com, and Skyscanner
  • Hybrid Classifier: Combines rule-based and embedding-based approaches with confidence scoring
  • Python API & CLI: Easy-to-use programmatic and command-line interfaces
  • Ethical Data Collection: Fully compliant with ToS, respects robots.txt, implements rate limiting
  • Production Ready: Comprehensive tests, CI/CD, type hints, logging, caching

Installation

pip install travelpurpose

Quick Start

Python API

from travelpurpose import predict_purpose, tags

# Predict travel purposes for a city
result = predict_purpose("Istanbul")
print(result)
# {
#     'main': ['Culture_Heritage', 'Transit_Gateway', 'Leisure'],
#     'sub': ['UNESCO_Site', 'Old_Town', 'Mega_Air_Hub', 'Gastronomy'],
#     'confidence': 0.86
# }

# Get raw tags from all sources
city_tags = tags("Antalya")
print(city_tags[:3])
# [
#     {'tag': 'beachfront', 'source': 'booking', 'url': '...', 'ts': '...'},
#     {'tag': 'resort', 'source': 'agoda', 'url': '...'},
#     {'tag': 'all-inclusive', 'source': 'trivago', 'url': '...'}
# ]
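The dictionary returned by predict_purpose can be post-processed with ordinary Python. The sketch below assumes only the result shape shown above ('main', 'sub', 'confidence'). The helper name has_purpose and the 0.7 threshold are illustrative and not part of the library, and the sample result is copied from the output above rather than fetched live.

```python
from typing import Any

# Sample result copied from the Quick Start output above (not fetched live).
sample_result: dict[str, Any] = {
    "main": ["Culture_Heritage", "Transit_Gateway", "Leisure"],
    "sub": ["UNESCO_Site", "Old_Town", "Mega_Air_Hub", "Gastronomy"],
    "confidence": 0.86,
}


def has_purpose(result: dict[str, Any], purpose: str, min_confidence: float = 0.7) -> bool:
    """Hypothetical helper: True if `purpose` is a main label and confidence is high enough."""
    in_main = purpose in result.get("main", [])
    confident = result.get("confidence", 0.0) >= min_confidence
    return in_main and confident


print(has_purpose(sample_result, "Culture_Heritage"))  # True
print(has_purpose(sample_result, "Beach_Resort"))      # False
```

Keeping the threshold as a parameter lets callers trade recall for precision per use case.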

Command Line

# Predict purposes for a city
tpurpose predict "Paris"

# Show raw tags
tpurpose show-tags "Dubai" --limit 20

# Search for cities
tpurpose find "turkey"

# Rebuild dataset (requires network access)
tpurpose rebuild --sample 100 --verbose

Travel Purpose Ontology

Main Categories (12)

  • Business: Finance hubs, tech centers, MICE destinations
  • Leisure: City breaks, luxury, shopping, gastronomy
  • Culture_Heritage: UNESCO sites, museums, old towns, architecture
  • Beach_Resort: Beachfront, islands, diving, all-inclusive
  • Adventure_Nature: Trekking, safari, desert, extreme sports
  • Family: Theme parks, zoos, safe cities, kid-friendly
  • Medical_Health: Medical tourism, wellness, spa, rehabilitation
  • Religious_Pilgrimage: Islamic, Christian, Buddhist, Hindu pilgrimage sites
  • Winter_Snow: Ski resorts, winter sports, aurora viewing
  • Nightlife_Entertainment: Party districts, casinos, music festivals
  • Transit_Gateway: Major airport hubs, connecting destinations
  • Seaman_Crew: Crew change ports, maritime facilities

Subcategories (70+)

Each main category has 4-9 specialized subcategories. See travelpurpose/ontology/ontology.yaml for the complete taxonomy.

Data Sources

All data is collected from publicly accessible sources, in line with each platform's terms of service:

Knowledge Bases

  • Wikidata: Canonical city data, coordinates, population, UNESCO sites
  • Wikipedia: City categories and cultural information

Travel Platforms (Public Data Only)

  • Booking.com: Public structured data (JSON-LD), meta tags, city guides
  • Agoda: Public landing pages, sitemaps, accommodation types
  • Trivago: Public city pages, district information
  • Kayak: Public city guides, travel information
  • Trip.com: Public destination pages, attractions
  • Skyscanner: Public autocomplete API for city normalization

Compliance Features

  • Respects robots.txt
  • Rate limiting (configurable, default 1.5s between requests)
  • HTTP caching (24-hour TTL)
  • Exponential backoff for retries
  • No authentication, logins, or private APIs
  • User-Agent identification
  • Graceful degradation when sources are unavailable
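Two of the compliance features above, the robots.txt check and the minimum delay between requests, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's own harvesting code: the USER_AGENT string, the MinIntervalLimiter class, and the sample robots.txt content are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "travelpurpose-example-bot"  # hypothetical User-Agent string


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MinIntervalLimiter:
    """Hypothetical limiter enforcing a minimum delay between consecutive calls."""

    def __init__(self, min_interval: float = 1.5) -> None:
        self.min_interval = min_interval
        self._last_call: float | None = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept


# Made-up robots.txt content for illustration.
robots = build_robots_parser("User-agent: *\nDisallow: /private/\n")
print(robots.can_fetch(USER_AGENT, "https://example.com/city/paris"))    # True
print(robots.can_fetch(USER_AGENT, "https://example.com/private/data"))  # False
```

A harvester built this way would call limiter.wait() before each request and skip any URL for which can_fetch returns False.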

Architecture

Data Pipeline

python scripts/pipeline.py --min-population 100000 --sample 50

The pipeline:

  1. Loads NBD.xlsx (if available) with existing city classifications
  2. Fetches canonical city data from Wikidata (cities >100K population)
  3. Harvests public tags from all sources
  4. Normalizes tags to English, handles Unicode city names
  5. Maps tags to ontology using fuzzy matching
  6. Merges with NBD purposes (if available)
  7. Classifies using hybrid rule-based + embedding approach
  8. Calculates confidence scores
  9. Exports to travelpurpose/data/cities.{parquet,json}

Classifier Design

Hybrid Approach:

  1. Rule-Based (deterministic): Strong tags directly map to categories
  2. Tag Aggregation: Weighted voting from multiple sources
  3. Confidence Calibration: Based on data quality and agreement

Source Weights:

  • Wikidata/UNESCO: 1.5-2.0x (high authority)
  • Booking.com/Agoda: 1.0x (standard)
  • Trivago/Kayak/Trip.com: 0.9x
  • Evidence type boosts: JSON-LD (1.2x), Meta (1.0x), Headings (0.8x)
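The weighted voting described above can be illustrated as follows. This is a minimal sketch, assuming the source weight and evidence boost are multiplied and the per-category totals are normalized to shares. The source does not state how the factors are combined, so that rule is an assumption. The dictionary keys 'wikidata', 'kayak', and 'trip', the 'evidence' field, and the choice of 1.5 from the Wikidata range are also assumptions.

```python
from collections import defaultdict

# Weights from the list above; 1.5 is an arbitrary pick from the 1.5-2.0 range.
SOURCE_WEIGHTS = {
    "wikidata": 1.5,
    "booking": 1.0,
    "agoda": 1.0,
    "trivago": 0.9,
    "kayak": 0.9,
    "trip": 0.9,
}
EVIDENCE_BOOSTS = {"json-ld": 1.2, "meta": 1.0, "heading": 0.8}


def vote(evidence: list[dict[str, str]]) -> dict[str, float]:
    """Sum (source weight * evidence boost) per category, then normalize to shares."""
    scores: dict[str, float] = defaultdict(float)
    for item in evidence:
        source_weight = SOURCE_WEIGHTS.get(item["source"], 1.0)
        boost = EVIDENCE_BOOSTS.get(item["evidence"], 1.0)
        scores[item["category"]] += source_weight * boost
    total = sum(scores.values())
    return {cat: score / total for cat, score in scores.items()} if total else {}


# Made-up evidence items for illustration.
shares = vote([
    {"source": "wikidata", "evidence": "json-ld", "category": "Culture_Heritage"},
    {"source": "booking", "evidence": "meta", "category": "Beach_Resort"},
    {"source": "trivago", "evidence": "heading", "category": "Beach_Resort"},
])
print(max(shares, key=shares.get))  # Culture_Heritage
```

Here one high-authority Wikidata item (1.5 × 1.2 = 1.8) narrowly outweighs two platform tags (1.0 + 0.72 = 1.72).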

Configuration

Rate Limiting

from travelpurpose.utils.harvest import HarvestConfig

config = HarvestConfig(
    rate_limit=2.0,  # 2 seconds between requests
    timeout=15,
    max_retries=3,
    cache_ttl=86400,  # 24 hours
)

Extending the Ontology

Edit travelpurpose/ontology/ontology.yaml:

main_categories:
  - Your_New_Category

subcategories:
  Your_New_Category:
    - Subcategory_One
    - Subcategory_Two

tag_mappings:
  your_mapping:
    main: Your_New_Category
    sub: [Subcategory_One]
    keywords: ["keyword1", "keyword2"]
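After editing the ontology, a quick consistency check can catch typos before a rebuild. The sketch below works on a plain dictionary with the same three top-level keys as the YAML above. validate_ontology is a hypothetical helper, not a function the library provides, and loading the YAML file into a dictionary is left to the reader.

```python
from typing import Any


def validate_ontology(ontology: dict[str, Any]) -> list[str]:
    """Hypothetical check: return a list of inconsistencies (empty if none found)."""
    errors: list[str] = []
    main = set(ontology.get("main_categories", []))
    subs: dict[str, list[str]] = ontology.get("subcategories", {})

    # Every subcategory group must belong to a declared main category.
    for category in subs:
        if category not in main:
            errors.append(f"subcategories: unknown main category {category!r}")

    # Every mapping must point at a declared main category and its subcategories.
    for name, mapping in ontology.get("tag_mappings", {}).items():
        category = mapping.get("main")
        if category not in main:
            errors.append(f"{name}: unknown main category {category!r}")
        for sub in mapping.get("sub", []):
            if sub not in subs.get(category, []):
                errors.append(f"{name}: {sub!r} is not a subcategory of {category!r}")
    return errors


# Dictionary mirroring the YAML example above.
ontology = {
    "main_categories": ["Your_New_Category"],
    "subcategories": {"Your_New_Category": ["Subcategory_One", "Subcategory_Two"]},
    "tag_mappings": {
        "your_mapping": {
            "main": "Your_New_Category",
            "sub": ["Subcategory_One"],
            "keywords": ["keyword1", "keyword2"],
        }
    },
}
print(validate_ontology(ontology))  # []
```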

Development

Setup

git clone https://github.com/teyfikoz/Travel_Purpose-City_Tags.git
cd Travel_Purpose-City_Tags
pip install -e ".[dev]"

Running Tests

pytest
pytest --cov=travelpurpose --cov-report=term-missing

Linting

ruff check travelpurpose/
black travelpurpose/

Building for PyPI

python -m build
twine check dist/*
twine upload dist/*

Examples

See the examples/ directory for Jupyter notebooks:

  • 01_quickstart.ipynb: Basic usage and API examples
  • 02_training_and_rules.ipynb: Advanced classification and ontology customization

Data Provenance & Ethics

Dataset Card

See DATASET_CARD.md for:

  • Data sources and collection dates
  • Sample sizes and coverage
  • Limitations and biases
  • Update frequency

Ethics & Privacy

  • No PII: We collect no personal information
  • Public Data Only: All sources are publicly accessible
  • ToS Compliance: Strict adherence to platform terms of service
  • Transparency: Full source attribution in tag metadata
  • Caching: Reduces load on source platforms
  • Rate Limiting: Prevents server overload

Citation

If you use TravelPurpose in research, please cite:

@software{travelpurpose2025,
  title = {TravelPurpose: City Travel Purpose Classification Library},
  author = {Travel Purpose Contributors},
  year = {2025},
  url = {https://github.com/teyfikoz/Travel_Purpose-City_Tags}
}

See CITATION.cff for more formats.

License

MIT License. See the LICENSE file for details.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Key areas for contribution:

  • Adding new data sources (must be public and ToS-compliant)
  • Expanding the ontology with new categories
  • Improving classification accuracy
  • Adding support for more languages
  • Documentation improvements

Changelog

See CHANGELOG.md for version history and updates.

Acknowledgments

  • Wikidata and Wikipedia communities for open knowledge bases
  • Travel platforms for providing public data
  • Open source community for excellent Python libraries

Made with ❤️ for the travel and data science communities
