A Python library for scraping ITA Matrix travel website using Playwright
Project description
ITA Scrapper
A powerful Python library for scraping ITA Matrix flight data using Playwright. Get flight prices, schedules, and travel information programmatically with a clean, async API.
✨ Features
- 🛫 Flight Search: Search flights between any airports worldwide
- 📅 Flexible Dates: Support for one-way, round-trip, and multi-city searches
- 💰 Price Parsing: Parse and normalize flight prices from various formats
- ⏱️ Duration Handling: Parse flight durations and format them consistently
- 🌍 Airport Codes: Validate and normalize IATA/ICAO airport codes
- 🎯 Type Safety: Full Pydantic model support with type hints
- ⚡ Async Support: Built with async/await for high performance
- ✅ Tested: Comprehensive test suite with 95%+ coverage
- 🖥️ CLI Interface: Command-line tool for quick searches
- 🔧 MCP Server: Model Context Protocol server for AI integration
📦 Installation
```bash
pip install ita-scrapper
```
For development with all extras (quoted so the extras syntax works in shells like zsh):
```bash
pip install "ita-scrapper[dev,mcp]"
```
Install Playwright browsers:
```bash
playwright install chromium
```
🚀 Quick Start
Python API
```python
import asyncio
from datetime import date, timedelta

from ita_scrapper import ITAScrapper, CabinClass

async def search_flights():
    async with ITAScrapper(headless=True) as scrapper:
        # Search for flights
        results = await scrapper.search_flights(
            origin="JFK",
            destination="LAX",
            departure_date=date.today() + timedelta(days=30),
            return_date=date.today() + timedelta(days=37),
            adults=2,
            cabin_class=CabinClass.BUSINESS,
        )

        # Print results
        for i, flight in enumerate(results.flights, 1):
            print(f"Flight {i}:")
            print(f"  Price: ${flight.price}")
            print(f"  Duration: {flight.duration}")
            print(f"  Stops: {flight.stops}")
            print(f"  Airline: {flight.airline}")
            print()

# Run the search
asyncio.run(search_flights())
```
Command Line Interface
```bash
# Search for flights
ita-scrapper search --origin JFK --destination LAX \
  --departure-date 2024-08-15 --return-date 2024-08-22 \
  --adults 2 --cabin-class BUSINESS

# Parse flight data (single quotes keep the shell from expanding "$1")
ita-scrapper parse "2h 30m" --type duration
ita-scrapper parse '$1,234.56' --type price
ita-scrapper parse "14:30" --type time --reference-date 2024-08-15

# Get help
ita-scrapper --help
```
📚 Documentation
Quick Links
- 📖 API Documentation - Complete API reference with examples
- 🔧 Developer Guide - Architecture and extension guide
- 🚨 Troubleshooting - Common issues and solutions
- 📊 Project Summary - High-level project overview
API Documentation
Comprehensive API documentation is available in the docs/api.md file, covering:
- Core Classes: ITAScrapper, ITAMatrixParser
- Data Models: Flight, SearchParams, FlightResult
- Utility Functions: Price parsing, duration formatting, validation
- Exception Handling: Complete error handling strategies
- Best Practices: Recommended usage patterns
Developer Guide
For developers wanting to extend or contribute to ITA Scrapper, see docs/developer-guide.md:
- Architecture Overview: Component design and data flow
- Parser Architecture: Multi-strategy parsing system
- Browser Automation: Playwright integration and anti-detection
- Extension Points: Adding new parsers and data models
- Debugging Guide: Tools and techniques for troubleshooting
- Performance Optimization: Memory and speed optimization
Troubleshooting
Having issues? Check docs/troubleshooting.md for solutions to:
- Installation Issues: Dependencies and browser setup
- Website Access: Blocking, CAPTCHAs, and rate limiting
- Parsing Problems: Data extraction and validation issues
- Performance: Memory usage and speed optimization
- Development Setup: Environment configuration and debugging
📖 API Overview
Core Classes
ITAScrapper
Main scraper class for flight searches.
```python
class ITAScrapper:
    def __init__(self, headless: bool = True, timeout: int = 30000):
        """Initialize the scrapper."""

    async def search_flights(
        self,
        origin: str,
        destination: str,
        departure_date: date,
        return_date: Optional[date] = None,
        adults: int = 1,
        children: int = 0,
        infants: int = 0,
        cabin_class: CabinClass = CabinClass.ECONOMY,
    ) -> FlightResult:
        """Search for flights."""
```
Models
```python
from ita_scrapper import (
    Flight,        # Individual flight details
    FlightResult,  # Search results container
    SearchParams,  # Search parameters
    CabinClass,    # Enum for cabin classes
    TripType,      # Enum for trip types
    Airport,       # Airport information
)
```
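The real models are Pydantic classes with validation. Purely as an illustration of the shape implied by the quick-start example (only `price`, `duration`, `stops`, and `airline` appear there; everything else here is an assumption), a plain-dataclass sketch:

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Illustrative sketch only — the actual Flight/FlightResult are Pydantic
# models with more fields and validation than shown here.
@dataclass
class FlightSketch:
    price: Decimal   # e.g. Decimal("1234.56")
    duration: str    # e.g. "5h 30m"
    stops: int       # 0 for nonstop
    airline: str     # e.g. "Delta"

@dataclass
class FlightResultSketch:
    flights: list[FlightSketch] = field(default_factory=list)

result = FlightResultSketch(flights=[
    FlightSketch(Decimal("1234.56"), "5h 30m", 0, "Delta"),
])
print(result.flights[0].airline)  # Delta
```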
Utility Functions
```python
from ita_scrapper import (
    parse_price,            # Parse price strings
    parse_duration,         # Parse duration strings
    parse_time,             # Parse time strings
    validate_airport_code,  # Validate airport codes
    format_duration,        # Format durations
    is_valid_date_range,    # Validate date ranges
)

# Examples
price = parse_price("$1,234.56")      # Returns Decimal('1234.56')
duration = parse_duration("2h 30m")   # Returns 150 (minutes)
code = validate_airport_code("jfk")   # Returns "JFK"
```
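To show the kind of normalization these helpers perform — this is a minimal stdlib sketch for illustration, not the library's actual implementation, which handles many more formats:

```python
import re
from decimal import Decimal

def parse_price_sketch(text: str) -> Decimal:
    # Strip currency symbols and thousands separators: "$1,234.56" -> 1234.56
    return Decimal(re.sub(r"[^\d.]", "", text))

def parse_duration_sketch(text: str) -> int:
    # "2h 30m" -> total minutes
    m = re.match(r"(?:(\d+)h)?\s*(?:(\d+)m)?", text.strip())
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2) or 0)
    return hours * 60 + minutes

print(parse_price_sketch("$1,234.56"))   # 1234.56
print(parse_duration_sketch("2h 30m"))   # 150
```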
🎯 Advanced Usage
Context Manager
```python
# Recommended: use as a context manager
async with ITAScrapper(headless=True) as scrapper:
    results = await scrapper.search_flights(...)

# Manual management
scrapper = ITAScrapper()
await scrapper.start()
try:
    results = await scrapper.search_flights(...)
finally:
    await scrapper.close()
```
Error Handling
```python
from ita_scrapper import ITAScrapperError, NavigationError, TimeoutError

try:
    async with ITAScrapper() as scrapper:
        results = await scrapper.search_flights(...)
except NavigationError:
    print("Failed to navigate to search page")
except TimeoutError:
    print("Search timed out")
except ITAScrapperError as e:
    print(f"General error: {e}")
```
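Transient failures such as timeouts or navigation hiccups are often worth retrying with backoff. A generic stdlib sketch, not a library feature — `flaky_search` below stands in for any awaitable operation such as `scrapper.search_flights(...)`:

```python
import asyncio

async def retry(op, *, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Run async op(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demo with a flaky operation that fails twice, then succeeds
calls = {"n": 0}

async def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("search timed out")
    return "results"

print(asyncio.run(retry(flaky_search, base_delay=0.01)))  # results
```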
Custom Configuration
```python
scrapper = ITAScrapper(
    headless=False,  # Show browser window
    timeout=60000,   # 60-second timeout
)
```
🧪 Testing
Run the test suite:
```bash
# All tests
pytest

# Unit tests only (fast)
pytest -m "not slow"

# Integration tests (slow, requires a browser)
pytest -m slow

# With coverage
pytest --cov=src/ita_scrapper --cov-report=html
```
🔧 MCP Server
Use ITA Scrapper as a Model Context Protocol server:
```bash
# Install MCP support
pip install "ita-scrapper[mcp]"
```
```python
# Create an MCP server (see examples/mcp_integration.py)
from ita_scrapper.mcp import create_mcp_server

server = create_mcp_server()
```
Configure in Claude Desktop:
```json
{
  "mcpServers": {
    "ita-scrapper": {
      "command": "python",
      "args": ["/path/to/ita_scrapper_mcp_server.py"]
    }
  }
}
```
🌟 Examples
Check out the /examples directory for more usage examples:
- basic_usage.py - Simple flight search
- demo_usage.py - Interactive demo
- matrix_examples.py - Advanced search patterns
- mcp_integration.py - MCP server setup
- test_real_sites.py - Real-world testing
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
```bash
# Clone the repository
git clone https://github.com/yourusername/ita-scrapper.git
cd ita-scrapper

# Install with uv (recommended)
uv sync --all-extras

# Install Playwright browsers
uv run playwright install

# Run tests
uv run pytest

# Run linting and formatting
uv run ruff check .
uv run ruff format .
```
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
⚠️ Disclaimer
This tool is for educational and research purposes only. Please respect the terms of service of any websites you scrape and be mindful of rate limits. The authors are not responsible for any misuse of this software.
📊 Stats
- Language: Python 3.10+
- Framework: Playwright + Pydantic
- Test Coverage: 95%+
- Dependencies: Minimal, well-maintained
- Performance: Async/await optimized
Made with ❤️ for travel enthusiasts and developers!
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file ita_scrapper-0.1.3.tar.gz.
File metadata
- Download URL: ita_scrapper-0.1.3.tar.gz
- Upload date:
- Size: 374.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb216f9160892f7c2b45bb3385a3b820bbc918bfa1217fd7f62280a327211c26 |
| MD5 | c2235ea8474d7b22287b5ad03f4c50af |
| BLAKE2b-256 | 5e4ceb98e8072fcd3b897a71c16f584ca869ad71cbce2be446eef73a5a0708bb |
Provenance
The following attestation bundles were made for ita_scrapper-0.1.3.tar.gz:

Publisher: publish.yml on problemxl/ita-scrapper

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ita_scrapper-0.1.3.tar.gz
- Subject digest: fb216f9160892f7c2b45bb3385a3b820bbc918bfa1217fd7f62280a327211c26
- Sigstore transparency entry: 399122728
- Sigstore integration time:
- Permalink: problemxl/ita-scrapper@6a3d308960a695d013be3f317d9630b18ce3b5e0
- Branch / Tag: refs/heads/main
- Owner: https://github.com/problemxl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6a3d308960a695d013be3f317d9630b18ce3b5e0
- Trigger Event: workflow_dispatch
File details
Details for the file ita_scrapper-0.1.3-py3-none-any.whl.
File metadata
- Download URL: ita_scrapper-0.1.3-py3-none-any.whl
- Upload date:
- Size: 273.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ec1b98f21a204de34f2321116b2e2fdc1905f48e26fcdffa8cfced7b2629563 |
| MD5 | 55131afae1c3469080a044e3bd549e0f |
| BLAKE2b-256 | b28837104cd59bbc52348a76d041ca586f4c398730fb27340469a121343eba90 |
Provenance
The following attestation bundles were made for ita_scrapper-0.1.3-py3-none-any.whl:

Publisher: publish.yml on problemxl/ita-scrapper

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ita_scrapper-0.1.3-py3-none-any.whl
- Subject digest: 7ec1b98f21a204de34f2321116b2e2fdc1905f48e26fcdffa8cfced7b2629563
- Sigstore transparency entry: 399122760
- Sigstore integration time:
- Permalink: problemxl/ita-scrapper@6a3d308960a695d013be3f317d9630b18ce3b5e0
- Branch / Tag: refs/heads/main
- Owner: https://github.com/problemxl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6a3d308960a695d013be3f317d9630b18ce3b5e0
- Trigger Event: workflow_dispatch