
A Python application to scrape and manage odds data from the OddsPortal website.


OddsHarvester

Scrape sports betting odds from OddsPortal.com with ease

Extract upcoming & historical odds across 8 sports, 100+ leagues, and dozens of betting markets.
Powered by Playwright browser automation. Output to JSON, CSV, or S3.




Quick Start

# Install
pip install oddsharvester

# Or clone & setup with uv
git clone https://github.com/jordantete/OddsHarvester.git && cd OddsHarvester
pip install uv && uv sync

# Scrape upcoming football matches
oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# Scrape historical Premier League odds
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2 --headless

Features

|  | Feature | Description |
| --- | --- | --- |
| Upcoming | Scrape upcoming matches | Fetch odds and event details for upcoming sports matches by date or league |
| Historic | Scrape historical odds | Retrieve past odds and match results for any season |
| Multi-market | Advanced parsing | Structured data: dates, teams, scores, venues, and per-bookmaker odds |
| Storage | Flexible output | JSON, CSV (local), or direct upload to AWS S3 |
| Docker | Container-ready | Run seamlessly in Docker with environment variable configuration |
| Proxy | Proxy support | Route through SOCKS/HTTP proxies for geolocation and anti-blocking |

Supported Sports & Markets

| Sport | Markets |
| --- | --- |
| ⚽ Football | 1x2, btts, double_chance, draw_no_bet, over/under, european_handicap, asian_handicap |
| 🎾 Tennis | match_winner, total_sets_over/under, total_games_over/under, asian_handicap, exact_score |
| 🏀 Basketball | 1x2, moneyline, asian_handicap, over/under |
| 🏉 Rugby League | 1x2, home_away, double_chance, draw_no_bet, over/under, handicap |
| 🏉 Rugby Union | 1x2, home_away, double_chance, draw_no_bet, over/under, handicap |
| 🏒 Ice Hockey | 1x2, home_away, double_chance, draw_no_bet, btts, over/under |
| ⚾ Baseball | moneyline, over/under |
| 🏈 American Football | 1x2, moneyline, over/under, asian_handicap |

100+ leagues supported across all sports — Premier League, La Liga, Serie A, NBA, NFL, MLB, NHL, ATP/WTA Grand Slams, and many more.


CLI Usage

OddsHarvester has two main commands: upcoming and historic. They share most options, with a few command-specific ones.

oddsharvester upcoming

Scrape odds for upcoming matches — by date, by league, or by specific match URL.

# By date
oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# By league (scrapes all upcoming matches for that league)
oddsharvester upcoming -s football -l england-premier-league -m 1x2,btts --headless

# Multiple leagues
oddsharvester upcoming -s football -l england-premier-league,spain-laliga -m 1x2 --headless

# Specific match URLs
oddsharvester upcoming -s football --match-link "https://www.oddsportal.com/football/..." -m 1x2

# Preview mode (faster — average odds only, no individual bookmakers)
oddsharvester upcoming -s football -d 20250301 -m over_under --preview-only --headless
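
When these commands are launched from a Python script rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch that only uses flags shown above; the helper name `build_upcoming_command` is hypothetical and is not part of OddsHarvester.

```python
import shlex


def build_upcoming_command(sport, markets, date=None, leagues=None,
                           headless=True, preview_only=False):
    """Assemble an `oddsharvester upcoming` argv list (hypothetical helper)."""
    argv = ["oddsharvester", "upcoming", "-s", sport]
    if date:
        argv += ["-d", date]                 # YYYYMMDD, as documented
    if leagues:
        argv += ["-l", ",".join(leagues)]    # comma-separated league slugs
    argv += ["-m", ",".join(markets)]        # comma-separated markets
    if preview_only:
        argv.append("--preview-only")
    if headless:
        argv.append("--headless")
    return argv


cmd = build_upcoming_command(
    "football", ["1x2", "btts"], leagues=["england-premier-league"]
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs or league slugs.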

oddsharvester historic

Scrape historical odds and results for past seasons.

# Single league & season
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --headless

# Current season
oddsharvester historic -s football -l england-premier-league --season current -m 1x2 --headless

# Limit pagination
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --max-pages 3 --headless

# Output as CSV
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2 -f csv -o premier_league_odds --headless
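
To collect several seasons, one option is to generate one `historic` command per season and run them in sequence. This is a rough sketch built only from the flags above; the function name `historic_commands` and the `<league>_<season>` output naming are assumptions made for illustration.

```python
def historic_commands(league, seasons, sport="football", market="1x2"):
    """Yield one `oddsharvester historic` argv list per season (hypothetical helper)."""
    for season in seasons:
        yield [
            "oddsharvester", "historic",
            "-s", sport,
            "-l", league,
            "--season", season,
            "-m", market,
            "-f", "csv",
            "-o", f"{league}_{season}",  # assumed naming scheme, one file per season
            "--headless",
        ]


commands = list(
    historic_commands("england-premier-league", ["2022-2023", "2023-2024"])
)
for argv in commands:
    print(" ".join(argv))
# Each list can then be executed with subprocess.run(argv, check=True).
```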

CLI Options Reference

Core Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --sport | -s | Sport to scrape (football, tennis, basketball, etc.) | required |
| --date | -d | Target date in YYYYMMDD format | |
| --league | -l | Comma-separated league slugs (e.g. england-premier-league) | |
| --market | -m | Comma-separated markets (e.g. 1x2,btts) | |
| --match-link | | Specific match URL (repeatable). Overrides --sport, --date, --league | |

upcoming only: --date is required unless --league or --match-link is provided. --date and --league can be combined to filter the league's upcoming matches down to a specific calendar day. When combining both, the reference timezone for resolving the date is --timezone if provided, otherwise UTC.
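
The rule above can be checked before launching a scrape. The sketch below is a hypothetical pre-flight check, not the tool's own validation: it requires at least one of date, league, or match link, and verifies that a date is a real calendar day in YYYYMMDD form.

```python
from datetime import datetime


def check_upcoming_target(date=None, leagues=None, match_links=None):
    """Raise ValueError if the `upcoming` arguments break the documented rule."""
    if not (date or leagues or match_links):
        raise ValueError("provide --date, --league, or --match-link")
    if date is not None:
        # strptime alone would accept short forms such as "2025031",
        # so enforce exactly eight ASCII digits first.
        if len(date) != 8 or not (date.isascii() and date.isdigit()):
            raise ValueError(f"date must be YYYYMMDD, got {date!r}")
        datetime.strptime(date, "%Y%m%d")  # rejects days like 20251301


check_upcoming_target(date="20250301")                     # date only
check_upcoming_target(leagues=["england-premier-league"])  # league only
check_upcoming_target(date="20250301", leagues=["england-premier-league"])
```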

historic only:

| Option | Description | Default |
| --- | --- | --- |
| --season | Season: YYYY, YYYY-YYYY, or current | required |
| --max-pages | Max number of result pages to scrape | unlimited |
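
As an illustration of the three accepted `--season` shapes, a small check such as the one below can catch typos before a long scrape starts. It only tests the shape of the string; the name `is_valid_season` is hypothetical, and the tool may apply stricter rules (for example, on which years are available).

```python
import re

# YYYY, YYYY-YYYY, or the literal word "current".
SEASON_PATTERN = re.compile(r"\d{4}(-\d{4})?|current")


def is_valid_season(value):
    """Return True if `value` has one of the documented season shapes."""
    return SEASON_PATTERN.fullmatch(value) is not None


for candidate in ["2023", "2022-2023", "current", "22-23"]:
    print(candidate, is_valid_season(candidate))
```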

Output Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --storage | | local or remote (S3) | local |
| --format | -f | json or csv | json |
| --output | -o | Output file path | scraped_data |

Browser & Scraping Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --headless | | Run browser in headless mode | False |
| --concurrency | -c | Concurrent scraping tasks | 3 |
| --request-delay | | Delay (sec) between match requests | 1.0 |
| --user-agent | | Custom browser user agent | |
| --locale | | Browser locale (e.g. fr-BE) | |
| --timezone | | Browser timezone (e.g. Europe/Brussels) | |

Proxy Options

| Option | Description |
| --- | --- |
| --proxy-url | Proxy URL (http://... or socks5://...) |
| --proxy-user | Proxy username |
| --proxy-pass | Proxy password |

Tip: For best results, match --locale and --timezone to your proxy's region.
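
One way to follow this tip in a script is to keep the proxy settings and the matching locale and timezone together. The sketch below builds the extra arguments from the options documented above; `proxy_args` is a hypothetical helper, and the proxy host and port are placeholders.

```python
def proxy_args(url, user=None, password=None, locale=None, timezone=None):
    """Return the proxy-related CLI arguments as a list (hypothetical helper)."""
    argv = ["--proxy-url", url]
    if user:
        argv += ["--proxy-user", user]
    if password:
        argv += ["--proxy-pass", password]
    if locale:
        argv += ["--locale", locale]        # e.g. fr-BE for a Belgian proxy
    if timezone:
        argv += ["--timezone", timezone]    # e.g. Europe/Brussels
    return argv


base = ["oddsharvester", "upcoming", "-s", "football",
        "-d", "20250301", "-m", "1x2", "--headless"]
full_command = base + proxy_args(
    "socks5://proxy.example.com:1080",      # placeholder proxy address
    locale="fr-BE",
    timezone="Europe/Brussels",
)
print(" ".join(full_command))
```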

Advanced Options

| Option | Description | Default |
| --- | --- | --- |
| --target-bookmaker | Filter odds for a specific bookmaker | |
| --odds-history | Include historical odds movement per match | False |
| --odds-format | Odds display format | Decimal Odds |
| --preview-only | Fast mode: average odds only, no bookmaker details | False |
| --bookies-filter | Bookmaker filter: all, classic, or crypto | all |
| --period | Match period (sport-specific: full-time, halves, etc.) | sport default |

Preview Mode vs Full Mode

| Aspect | Full Mode | Preview Mode |
| --- | --- | --- |
| Speed | Slower (interactive) | Faster (passive) |
| Data | All submarkets + bookmakers | Visible submarkets + avg odds |
| Bookmakers | Individual bookmaker odds | Average odds only |
| Odds History | Available | Not available |
| Structure | By bookmaker | By submarket (avg odds) |

Preview mode (--preview-only) is useful for quick exploration, testing data format, or light monitoring with reduced resource usage.


Environment Variables

The CLI options listed below can also be set through environment variables, which is useful for Docker or CI/CD.

| Variable | CLI Option | Description |
| --- | --- | --- |
| OH_SPORT | --sport | Sport to scrape |
| OH_LEAGUES | --league | Comma-separated leagues |
| OH_MARKETS | --market | Comma-separated markets |
| OH_STORAGE | --storage | Storage type (local/remote) |
| OH_FORMAT | --format | Output format (json/csv) |
| OH_FILE_PATH | --output | Output file path |
| OH_HEADLESS | --headless | Run in headless mode |
| OH_CONCURRENCY | --concurrency | Number of concurrent tasks |
| OH_REQUEST_DELAY | --request-delay | Delay between requests (sec) |
| OH_PROXY_URL | --proxy-url | Proxy server URL |
| OH_PROXY_USER | --proxy-user | Proxy username |
| OH_PROXY_PASS | --proxy-pass | Proxy password |
| OH_USER_AGENT | --user-agent | Custom browser user agent |
| OH_LOCALE | --locale | Browser locale |
| OH_TIMEZONE | --timezone | Browser timezone ID |

export OH_SPORT=football
export OH_HEADLESS=true
export OH_PROXY_URL=http://proxy.example.com:8080

oddsharvester upcoming -d 20250301 -m 1x2
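
The same configuration can be prepared from Python when the scraper is started as a subprocess. The sketch below maps a settings dictionary onto some of the variable names from the table; `to_env` and the dictionary keys are hypothetical, and while `true` appears in the example above, the spelling `false` for a disabled flag is an assumption.

```python
import os

# Subset of the documented variables, keyed by illustrative setting names.
ENV_NAMES = {
    "sport": "OH_SPORT",
    "leagues": "OH_LEAGUES",
    "markets": "OH_MARKETS",
    "headless": "OH_HEADLESS",
    "concurrency": "OH_CONCURRENCY",
    "proxy_url": "OH_PROXY_URL",
}


def to_env(settings):
    """Convert a settings dict into OH_* environment variables (hypothetical helper)."""
    env = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"   # "false" is an assumed spelling
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)                # comma-separated, as in the table
        env[ENV_NAMES[key]] = str(value)
    return env


overrides = to_env({"sport": "football", "headless": True,
                    "proxy_url": "http://proxy.example.com:8080"})
child_env = {**os.environ, **overrides}  # keep PATH etc., add OH_* values
print(overrides)
# subprocess.run(["oddsharvester", "upcoming", "-d", "20250301", "-m", "1x2"],
#                env=child_env, check=True)
```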

Installation

With pip (from PyPI)

pip install oddsharvester

From source (with uv)

git clone https://github.com/jordantete/OddsHarvester.git
cd OddsHarvester
pip install uv
uv sync

Manual setup (venv + pip or poetry)

python3 -m venv .venv
source .venv/bin/activate    # Unix/macOS
# .venv\Scripts\activate     # Windows

pip install . --use-pep517
# or: poetry install

Verify installation:

oddsharvester --help

Docker

# Build
docker build -t odds-harvester:local --target local-dev .

# Run
docker run --rm odds-harvester:local \
  python3 -m oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# Or with environment variables
docker run --rm \
  -e OH_SPORT=football \
  -e OH_HEADLESS=true \
  odds-harvester:local python3 -m oddsharvester upcoming -d 20250301 -m 1x2
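
For scheduled or scripted runs, the `docker run` invocation above can be generated from a dictionary of environment variables. This is a small sketch under the assumption that the image was built with the tag shown above; `docker_run_command` is a hypothetical helper.

```python
def docker_run_command(image, env, app_args):
    """Build a `docker run` argv list with one -e flag per variable (hypothetical helper)."""
    argv = ["docker", "run", "--rm"]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]
    # The container entry command mirrors the README example.
    argv += [image, "python3", "-m", "oddsharvester", *app_args]
    return argv


docker_cmd = docker_run_command(
    "odds-harvester:local",
    {"OH_SPORT": "football", "OH_HEADLESS": "true"},
    ["upcoming", "-d", "20250301", "-m", "1x2"],
)
print(" ".join(docker_cmd))
# Execute with subprocess.run(docker_cmd, check=True) once the image exists.
```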

Cloud Deployment (AWS Lambda + Serverless)

OddsHarvester can be deployed on AWS Lambda using the Serverless Framework with a Docker container image, because Playwright and its browser binaries exceed Lambda's 50 MB limit for zipped deployment packages.

Setup:

  1. Build the Docker image and push to ECR
  2. Configure serverless.yaml at the project root:
    • Set your AWS region, S3 bucket ARN, and IAM permissions
    • Default function: scanAndStoreOddsPortalDataV2 (2048MB, 360s timeout)
    • Triggers via EventBridge every 2 hours by default
  3. Deploy:
     sls deploy

Refer to the Serverless Framework docs for detailed setup instructions.


Contributing

Contributions are welcome! Submit an issue or pull request. Please follow the project's coding standards and include clear descriptions for any changes.

License

MIT License

Disclaimer

This package is intended for educational purposes only. The author is not affiliated with or endorsed by oddsportal.com. Use responsibly and ensure compliance with their terms of service and applicable laws.
