A Python application to scrape and manage odds data from the OddsPortal website.

OddsHarvester

OddsHarvester is an application designed to scrape and process sports betting odds and match data from the oddsportal.com website.

✨ Features

📅 Scrape Upcoming Matches: Fetch odds and event details for upcoming sports matches.
📊 Scrape Historical Odds: Retrieve historical odds and match results for analytical purposes.
🔍 Advanced Parsing: Extract structured data, including match dates, team names, scores, and venue details.
💾 Flexible Storage: Store scraped data in JSON or CSV locally, or upload it directly to a remote S3 bucket.
🐳 Docker Compatibility: Designed to work seamlessly inside Docker containers with minimal setup.
🕵️ Proxy Support: Route web requests through SOCKS/HTTP proxies for enhanced anonymity, geolocation bypass, and anti-blocking measures.

📚 Current Support

OddsHarvester supports a growing number of sports and their associated betting markets. All configurations are managed via dedicated enum and mapping files in the codebase.

✅ Supported Sports & Markets

| 🏅 Sport | 🛒 Supported Markets |
|---|---|
| ⚽ Football | `1x2`, `btts`, `double_chance`, `draw_no_bet`, `over/under`, `european_handicap`, `asian_handicap` |
| 🎾 Tennis | `match_winner`, `total_sets_over/under`, `total_games_over/under`, `asian_handicap`, `exact_score` |
| 🏀 Basketball | `1x2`, `moneyline`, `asian_handicap`, `over/under` |
| 🏉 Rugby League | `1x2`, `home_away`, `double_chance`, `draw_no_bet`, `over/under`, `handicap` |
| 🏉 Rugby Union | `1x2`, `home_away`, `double_chance`, `draw_no_bet`, `over/under`, `handicap` |
| 🏒 Ice Hockey | `1x2`, `home_away`, `double_chance`, `draw_no_bet`, `btts`, `over/under` |
| ⚾ Baseball | `moneyline`, `over/under` |
| 🏈 American Football | `1x2`, `moneyline`, `over/under`, `asian_handicap` |

⚙️ Note: Each sport and its markets are declared in enums inside sport_market_constants.py.
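
The actual contents of `sport_market_constants.py` are not reproduced here. As a rough sketch of the enum-plus-mapping pattern described above, using hypothetical names (`Sport`, `SUPPORTED_MARKETS`, `validate_markets`) and only two sports, a sport-to-markets mapping and a check on a comma-separated `--market` value could look like this:

```python
from __future__ import annotations

from enum import Enum


class Sport(Enum):
    """Hypothetical sport enum; values mirror the sport names used by --sport."""
    FOOTBALL = "football"
    BASEBALL = "baseball"


# Hypothetical mapping from each sport to the market keys it accepts.
SUPPORTED_MARKETS: dict[Sport, frozenset[str]] = {
    Sport.FOOTBALL: frozenset(
        {"1x2", "btts", "double_chance", "draw_no_bet", "over/under",
         "european_handicap", "asian_handicap"}
    ),
    Sport.BASEBALL: frozenset({"moneyline", "over/under"}),
}


def validate_markets(sport_name: str, markets_csv: str) -> list[str]:
    """Split a comma-separated market list and reject markets the sport lacks."""
    sport = Sport(sport_name)  # raises ValueError for an unknown sport
    markets = [m.strip() for m in markets_csv.split(",") if m.strip()]
    unknown = [m for m in markets if m not in SUPPORTED_MARKETS[sport]]
    if unknown:
        raise ValueError(f"Unsupported market(s) for {sport.value}: {unknown}")
    return markets
```

Keeping the mapping in one place means a new sport or market only needs an enum member and a set entry, and validation can fail fast before any browser is launched.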

🗺️ Leagues & Competitions

Leagues and tournaments are mapped per sport in `sport_league_constants.py`.

You'll find support for:

🏆 Top Football leagues (Premier League, La Liga, Serie A, etc.)
🎾 Major Tennis tournaments (ATP, WTA, Grand Slams, etc.)
🏀 Global Basketball leagues (NBA, EuroLeague, ACB, etc.)
🏉 Major Rugby League competitions (NRL, Super League, etc.)
🏉 Major Rugby Union competitions (Six Nations, Rugby Championship, Top 14, etc.)
🏒 Major Ice Hockey leagues (NHL, KHL, SHL, Liiga, etc.)
⚾ Major Baseball leagues (MLB, NPB, KBO, etc.)
🏈 American Football leagues (NFL, NCAA, etc.)

🛠️ Local Installation

Clone the repository: Navigate to your desired folder and clone the repository. Then, move into the project directory:
```
git clone https://github.com/jordantete/OddsHarvester.git
cd OddsHarvester
```
Quick Setup with uv:

Use uv, a lightweight package manager, to simplify the setup process. First, install uv with pip, then run the setup:
```
pip install uv
uv sync
```
Manual Setup (Optional):

If you prefer to set up manually, follow these steps:
- Create a virtual environment with Python's venv module (or virtualenv) to isolate the project: `python3 -m venv .venv`
- Activate it according to your operating system:
  - On Unix/macOS: `source .venv/bin/activate`
  - On Windows: `.venv\Scripts\activate`
- Install dependencies with pip, using the `--use-pep517` flag to install directly from the `pyproject.toml` file: `pip install . --use-pep517`
- Or, if you prefer Poetry for dependency management: `poetry install`

Verify Installation:

Ensure all dependencies are installed and Playwright is set up by running the following command:
```
oddsharvester --help
```
Or using the module directly:
```
python -m oddsharvester --help
```

By following these steps, you should have OddsHarvester set up and ready to use.
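
To run a similar check from a Python script (for example in CI), a minimal standard-library sketch follows. It assumes the package provides an importable module named `oddsharvester` (as `python -m oddsharvester` implies) and depends on `playwright`; `check_installation` is a hypothetical helper. It only confirms that the packages are importable and the CLI is on `PATH`, not that Playwright's browser binaries have been downloaded.

```python
import importlib.util
import shutil


def check_installation(modules=("oddsharvester", "playwright"), cli="oddsharvester"):
    """Report which top-level modules are importable and whether the CLI is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report[f"{cli} (CLI)"] = shutil.which(cli) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_installation().items():
        print(f"{'OK     ' if ok else 'MISSING'} {item}")
```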

⚡ Usage

🔧 CLI Commands

OddsHarvester provides a Command-Line Interface (CLI) to scrape sports betting data from oddsportal.com. Use it to retrieve upcoming match odds, analyze historical data, or store results for further processing.

Quick Reference

```
# Scrape upcoming matches
oddsharvester upcoming -s football -d 20250301 -m 1x2

# Scrape historical data
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2

# Show help
oddsharvester --help
oddsharvester upcoming --help
oddsharvester historic --help
```
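
To drive the CLI from Python (for instance from a scheduler), one option is to assemble the argument list and hand it to `subprocess.run` without a shell. This is a sketch, not part of OddsHarvester: `build_command` and `run` are hypothetical helpers, and only flags documented in the option tables below are used.

```python
from __future__ import annotations

import subprocess


def build_command(mode: str, sport: str, markets: list[str], *, date: str | None = None,
                  leagues: list[str] | None = None, season: str | None = None,
                  headless: bool = True) -> list[str]:
    """Assemble an oddsharvester argument list from documented options."""
    if mode not in ("upcoming", "historic"):
        raise ValueError("mode must be 'upcoming' or 'historic'")
    cmd = ["oddsharvester", mode, "-s", sport, "-m", ",".join(markets)]
    if date:
        cmd += ["-d", date]
    if leagues:
        cmd += ["-l", ",".join(leagues)]  # leagues are comma-separated
    if season:
        cmd += ["--season", season]
    if headless:
        cmd.append("--headless")
    return cmd


def run(cmd: list[str]) -> int:
    """Run the command without a shell and return its exit code."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `run(build_command("historic", "football", ["1x2"], leagues=["england-premier-league"], season="2024-2025"))` mirrors the second Quick Reference command. Passing a list instead of a shell string avoids quoting problems with match URLs.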

1. Scrape Upcoming Matches

Retrieve odds and event details for upcoming sports matches.

```
oddsharvester upcoming [OPTIONS]
```

Options:

| Option | Short | Description | Required | Default |
|---|---|---|---|---|
| `--sport` | `-s` | Sport to scrape (e.g., `football`, `tennis`, `basketball`) | Yes | None |
| `--date` | `-d` | Date for matches in `YYYYMMDD` format | Yes (unless `--league` or `--match-link`) | None |
| `--league` | `-l` | Comma-separated leagues (e.g., `england-premier-league`) | No | None |
| `--market` | `-m` | Comma-separated betting markets (e.g., `1x2,btts`) | No | None |
| `--storage` | | Storage type: `local` or `remote` | No | `local` |
| `--format` | `-f` | Output format: `json` or `csv` | No | `json` |
| `--output` | `-o` | Output file path | No | `scraped_data` |
| `--headless` | | Run browser in headless mode | No | `False` |
| `--concurrency` | `-c` | Number of concurrent scraping tasks | No | `3` |
| `--request-delay` | | Delay in seconds between match requests (with jitter) | No | `1.0` |
| `--proxy-url` | | Proxy URL (e.g., `http://proxy:8080` or `socks5://proxy:1080`) | No | None |
| `--proxy-user` | | Proxy username | No | None |
| `--proxy-pass` | | Proxy password | No | None |
| `--user-agent` | | Custom browser user agent | No | None |
| `--locale` | | Browser locale (e.g., `fr-BE`) | No | None |
| `--timezone` | | Browser timezone ID (e.g., `Europe/Brussels`) | No | None |
| `--match-link` | | Specific match URL(s) to scrape (can be repeated) | No | None |
| `--target-bookmaker` | | Filter for a specific bookmaker | No | None |
| `--odds-history` | | Scrape historical odds movement | No | `False` |
| `--odds-format` | | Odds display format | No | `Decimal Odds` |
| `--preview-only` | | Only scrape visible submarkets (faster, limited data) | No | `False` |
| `--bookies-filter` | | Bookmaker filter: `all`, `classic`, or `crypto` | No | `all` |
| `--period` | | Match period to scrape (sport-specific) | No | Sport default |
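
The `--concurrency` and `--request-delay` options suggest a common throttling pattern: cap the number of in-flight tasks and wait a randomized interval around the base delay before each request. OddsHarvester's actual implementation is not shown here; the asyncio sketch below only illustrates the idea, and both the ±50% jitter range and the `fetch` callable are assumptions.

```python
import asyncio
import random


async def scrape_all(links, fetch, concurrency=3, request_delay=1.0):
    """Run fetch(link) for every link with bounded concurrency and jittered delays."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(link):
        async with semaphore:  # at most `concurrency` workers run this block at once
            # Assumed jitter: uniform between 50% and 150% of the base delay.
            await asyncio.sleep(request_delay * random.uniform(0.5, 1.5))
            return await fetch(link)

    # gather returns results in the same order as the input links.
    return await asyncio.gather(*(worker(link) for link in links))


if __name__ == "__main__":
    async def fake_fetch(link):  # stand-in for a real page scrape
        return {"link": link}

    print(asyncio.run(scrape_all(["match-1", "match-2"], fake_fetch, request_delay=0.1)))
```

Randomizing the delay avoids a perfectly regular request rhythm, while the semaphore keeps the number of open browser pages bounded.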

Important Notes:

- If both `--league` and `--date` are provided, `--date` is ignored and all upcoming matches for the given leagues are scraped.
- If `--match-link` is provided, it overrides `--sport`, `--date`, and `--league`.
- When using `--match-link`, all match links must belong to the same sport.
- For best results, make sure the proxy's region matches the `--locale` and `--timezone` settings.
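
The precedence rules in these notes can be summarized as a small decision function. This is an illustrative reading of the documented behavior, not OddsHarvester's code; `resolve_upcoming_target` and its return values are hypothetical.

```python
def resolve_upcoming_target(sport=None, date=None, leagues=None, match_links=None):
    """Decide what an `upcoming` run scrapes, following the documented precedence."""
    if match_links:
        # --match-link overrides --sport, --date and --league.
        return ("match_links", list(match_links))
    if not sport:
        raise ValueError("--sport is required unless --match-link is given")
    if leagues:
        # With leagues given, --date is ignored: all upcoming league matches are scraped.
        return ("leagues", list(leagues))
    if date:
        return ("date", date)
    raise ValueError("--date is required unless --league or --match-link is given")
```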

Example Usage:

```
# Retrieve upcoming football matches for a specific date
oddsharvester upcoming -s football -m 1x2 -d 20250301 --headless

# Scrape English Premier League matches
oddsharvester upcoming -s football -l england-premier-league -m 1x2,btts --headless

# Scrape multiple leagues at once
oddsharvester upcoming -s football -l england-premier-league,spain-laliga -m 1x2 --headless

# Scrape with a proxy
oddsharvester upcoming -s football -d 20250301 -m 1x2 --proxy-url http://proxy:8080 --proxy-user myuser --proxy-pass mypass --headless

# Scrape in preview mode (faster, average odds only)
oddsharvester upcoming -s football -d 20250301 -m over_under --preview-only --headless

# Scrape specific matches using match links
oddsharvester upcoming -s football --match-link "https://www.oddsportal.com/football/..." --match-link "https://www.oddsportal.com/football/..." -m 1x2
```
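
Because `--date` takes a single day in `YYYYMMDD` format, covering several days means one run per date. A small standard-library sketch for generating those values (`date_args` is a hypothetical helper):

```python
from __future__ import annotations

from datetime import date, timedelta


def date_args(start: date, days: int) -> list[str]:
    """Return `days` consecutive dates starting at `start`, formatted as YYYYMMDD."""
    return [(start + timedelta(days=offset)).strftime("%Y%m%d") for offset in range(days)]


if __name__ == "__main__":
    for d in date_args(date.today(), 3):
        print(f"oddsharvester upcoming -s football -d {d} -m 1x2 --headless")
```

Using `timedelta` rather than incrementing the day number handles month and year boundaries correctly.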

2. Scrape Historical Odds

Retrieve historical odds and results for analytical purposes.

```
oddsharvester historic [OPTIONS]
```

Options:

| Option | Short | Description | Required | Default |
|---|---|---|---|---|
| `--sport` | `-s` | Sport to scrape (e.g., `football`, `tennis`, `basketball`) | Yes | None |
| `--season` | | Season: `YYYY`, `YYYY-YYYY`, or `current` | Yes | None |
| `--league` | `-l` | Comma-separated leagues (e.g., `england-premier-league`) | No | None |
| `--market` | `-m` | Comma-separated betting markets (e.g., `1x2,btts`) | No | None |
| `--max-pages` | | Maximum number of pages to scrape | No | None |
| `--storage` | | Storage type: `local` or `remote` | No | `local` |
| `--format` | `-f` | Output format: `json` or `csv` | No | `json` |
| `--output` | `-o` | Output file path | No | `scraped_data` |
| `--headless` | | Run browser in headless mode | No | `False` |
| `--concurrency` | `-c` | Number of concurrent scraping tasks | No | `3` |
| `--request-delay` | | Delay in seconds between match requests (with jitter) | No | `1.0` |
| `--proxy-url` | | Proxy URL (e.g., `http://proxy:8080` or `socks5://proxy:1080`) | No | None |
| `--proxy-user` | | Proxy username | No | None |
| `--proxy-pass` | | Proxy password | No | None |
| `--user-agent` | | Custom browser user agent | No | None |
| `--locale` | | Browser locale (e.g., `fr-BE`) | No | None |
| `--timezone` | | Browser timezone ID (e.g., `Europe/Brussels`) | No | None |
| `--match-link` | | Specific match URL(s) to scrape (can be repeated) | No | None |
| `--target-bookmaker` | | Filter for a specific bookmaker | No | None |
| `--odds-history` | | Scrape historical odds movement | No | `False` |
| `--odds-format` | | Odds display format | No | `Decimal Odds` |
| `--preview-only` | | Only scrape visible submarkets (faster, limited data) | No | `False` |
| `--bookies-filter` | | Bookmaker filter: `all`, `classic`, or `crypto` | No | `all` |
| `--period` | | Match period to scrape (sport-specific) | No | Sport default |
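
`--season` accepts three shapes. A sketch of how such a value could be validated before launching a long scrape follows; `parse_season` is hypothetical, and the rule that a `YYYY-YYYY` season must span consecutive years is an assumption not stated above.

```python
import re

# Four digits, optionally followed by a hyphen and four more digits.
_SEASON_RE = re.compile(r"(\d{4})(?:-(\d{4}))?")


def parse_season(value: str) -> str:
    """Validate a --season value: 'current', 'YYYY', or 'YYYY-YYYY'."""
    value = value.strip().lower()
    if value == "current":
        return value
    match = _SEASON_RE.fullmatch(value)
    if not match:
        raise ValueError(f"Invalid season {value!r}: expected YYYY, YYYY-YYYY or 'current'")
    start, end = match.group(1), match.group(2)
    if end is not None and int(end) != int(start) + 1:
        # Assumption: split-year seasons cover two consecutive years.
        raise ValueError(f"Invalid season {value!r}: years must be consecutive")
    return value
```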

Example Usage:

```
# Retrieve historical odds for the Premier League 2022-2023 season
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --headless

# Retrieve historical odds for multiple leagues
oddsharvester historic -s football -l england-premier-league,spain-laliga --season 2022-2023 -m 1x2 --headless

# Retrieve historical odds for the current season
oddsharvester historic -s football -l england-premier-league --season current -m 1x2 --headless

# Retrieve historical MLB 2022 season data
oddsharvester historic -s baseball -l usa-mlb --season 2022 -m moneyline --headless

# Scrape only 3 pages of historical data
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --max-pages 3 --headless

# Save output to CSV format
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2 -f csv -o premier_league_odds --headless
```

Preview Mode

The `--preview-only` flag enables a faster scraping mode that extracts only average odds from visible submarkets, without loading individual bookmaker details. This mode is useful for:

- Quick exploration of available submarkets and their average odds
- Testing data structure and format
- Light monitoring with reduced resource usage

Preview Mode vs Full Mode:

| Aspect | Full Mode | Preview Mode |
|---|---|---|
| Speed | Slower (interactive) | Faster (passive) |
| Data | All submarkets + bookmakers | Visible submarkets + avg odds |
| Bookmakers | Individual bookmaker odds | Average odds only |
| Odds History | Available | Not available |
| Structure | By bookmaker | By submarket (avg odds) |
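
To make the "Structure" row concrete: full mode yields odds per bookmaker, while preview mode yields one average per submarket. The record layouts below are hypothetical (the real output schema may differ), and OddsPortal computes the averages it displays itself; this sketch simply derives a plain mean from full-mode-style records to show the difference in shape.

```python
from statistics import mean

# Hypothetical full-mode records: one entry per bookmaker and submarket.
full_mode = [
    {"submarket": "Over/Under +2.5", "bookmaker": "BookA", "over": 1.90, "under": 1.95},
    {"submarket": "Over/Under +2.5", "bookmaker": "BookB", "over": 1.86, "under": 2.00},
]


def to_preview_shape(records):
    """Collapse per-bookmaker odds into one averaged entry per submarket."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["submarket"], []).append(rec)
    return {
        submarket: {
            "avg_over": round(mean(r["over"] for r in rows), 3),
            "avg_under": round(mean(r["under"] for r in rows), 3),
        }
        for submarket, rows in grouped.items()
    }
```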

🌐 Environment Variables

Many CLI options can also be configured via environment variables, as listed below. This is useful for Docker deployments or CI/CD pipelines.

| Environment Variable | CLI Option | Description |
|---|---|---|
| `OH_SPORT` | `--sport` | Sport to scrape |
| `OH_LEAGUES` | `--league` | Comma-separated leagues |
| `OH_MARKETS` | `--market` | Comma-separated markets |
| `OH_STORAGE` | `--storage` | Storage type (local/remote) |
| `OH_FORMAT` | `--format` | Output format (json/csv) |
| `OH_FILE_PATH` | `--output` | Output file path |
| `OH_HEADLESS` | `--headless` | Run in headless mode |
| `OH_CONCURRENCY` | `--concurrency` | Number of concurrent tasks |
| `OH_REQUEST_DELAY` | `--request-delay` | Delay between match requests (sec) |
| `OH_PROXY_URL` | `--proxy-url` | Proxy server URL |
| `OH_PROXY_USER` | `--proxy-user` | Proxy username |
| `OH_PROXY_PASS` | `--proxy-pass` | Proxy password |
| `OH_USER_AGENT` | `--user-agent` | Custom browser user agent |
| `OH_LOCALE` | `--locale` | Browser locale |
| `OH_TIMEZONE` | `--timezone` | Browser timezone ID |

Example:

```
export OH_SPORT=football
export OH_HEADLESS=true
export OH_PROXY_URL=http://proxy.example.com:8080

oddsharvester upcoming -d 20250301 -m 1x2
```
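
How the CLI merges these variables with explicit flags is not spelled out here. A common convention, and the one assumed in this sketch, is that an explicit flag wins over the environment variable, which wins over the default. Boolean and comma-separated values also need parsing; `resolve_option` and the accepted truthy strings are assumptions.

```python
import os

# Assumed set of strings treated as "true" for boolean variables such as OH_HEADLESS.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_option(cli_value, env_name, default=None, kind=str):
    """Return the CLI value if given, else the parsed env var, else the default."""
    if cli_value is not None:
        return cli_value
    raw = os.environ.get(env_name)
    if raw is None or raw == "":
        return default
    if kind is bool:
        return raw.strip().lower() in _TRUTHY
    if kind is list:
        # Comma-separated values such as OH_LEAGUES or OH_MARKETS.
        return [item.strip() for item in raw.split(",") if item.strip()]
    return kind(raw)  # e.g. int for OH_CONCURRENCY, float for OH_REQUEST_DELAY
```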

🐳 Running Inside a Docker Container

OddsHarvester is compatible with Docker, allowing you to run the application seamlessly in a containerized environment.

Steps to Run with Docker:

Ensure Docker is Installed: Make sure Docker is installed and running on your system. Visit Docker's official website for installation instructions specific to your operating system.
Build the Docker Image: Navigate to the project's root directory, where the Dockerfile is located, and build the image, giving it a name such as `odds-harvester`: `docker build -t odds-harvester:local --target local-dev .`

Run the Container: Start a Docker container from the built image. Map any necessary ports, specify volumes to persist data, and pass CLI arguments as part of the `docker run` command:

```
docker run --rm odds-harvester:local python3 -m oddsharvester upcoming -s football -d 20250301 -m 1x2 -o output.json --headless
```

Or using environment variables:

```
docker run --rm \
  -e OH_SPORT=football \
  -e OH_HEADLESS=true \
  odds-harvester:local python3 -m oddsharvester upcoming -d 20250301 -m 1x2
```

Interactive Mode for Debugging: If you need to debug or run commands interactively, open a shell in the image you built: `docker run --rm -it odds-harvester:local /bin/bash`

Tips:

Volume Mapping: Use volume mapping to store logs or output data on the host machine.
Container Reusability: Assign a unique container name to avoid conflicts when running multiple instances.
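
As an example of the volume-mapping tip, the sketch below assembles a `docker run` invocation that mounts a host directory into the container, so an output file written there survives `--rm`. The in-container path `/app/output` is an assumption (check the Dockerfile's working directory), and `docker_run_args` is a hypothetical helper; `-v`, `-e`, `--name` and `--rm` are standard `docker run` flags.

```python
from pathlib import Path


def docker_run_args(host_dir, cli_args, env=None, name=None,
                    image="odds-harvester:local", container_dir="/app/output"):
    """Build `docker run` arguments that mount host_dir at container_dir."""
    host = Path(host_dir).resolve()  # Docker bind mounts need an absolute host path
    args = ["docker", "run", "--rm", "-v", f"{host}:{container_dir}"]
    if name:
        args += ["--name", name]  # a unique name avoids clashes between instances
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    return args + [image, "python3", "-m", "oddsharvester", *cli_args]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, with the CLI's `-o` pointing inside the mounted directory (for example `/app/output/output.json`).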

☁️ Cloud Deployment

OddsHarvester can also be deployed to the cloud using the Serverless Framework, packaged as a Docker image to ensure compatibility with AWS Lambda (the Dockerfile will need adjustments if you want to deploy to a different cloud provider).

Why Use a Docker Image?

AWS Lambda's Deployment Size Limit: AWS Lambda has a hard limit of 50 MB (zipped) for directly uploaded deployment packages, which include code, dependencies, and assets. Playwright and its browser dependencies far exceed this limit.
Playwright's Incompatibility with Lambda Layers: Playwright cannot be installed as an AWS Lambda layer because:
- Its browser dependencies require system libraries that are unavailable in Lambda's standard runtime environment.
- Packaging these libraries within Lambda layers would exceed the layer size limit.
Solution: Using a Docker image solves these limitations by bundling the entire runtime environment, including Playwright, its browsers, and all required libraries, into a single package. This ensures a consistent and compatible execution environment.

Serverless Framework Setup:

Serverless Configuration: The application includes a serverless.yaml file located at the root of the project. This file defines the deployment configuration for a serverless environment. Users can customize the configuration as needed, including:
- Provider: Specify the cloud provider (e.g., AWS).
- Region: Set the desired deployment region (e.g., eu-west-3).
- Resources: Update the S3 bucket details or permissions as required.
Docker Integration: The app uses a Docker image (playwright_python_arm64) to ensure compatibility with the serverless architecture. The Dockerfile is already included in the project and configured in serverless.yaml. You'll need to build the image locally (see section above) and push the Docker image to ECR.
Permissions: By default, the app is configured with IAM roles to:
- Upload (PutObject), retrieve (GetObject), and delete (DeleteObject) files from an S3 bucket. Update the Resource field in serverless.yaml with the ARN of your S3 bucket.
Function Details:
- Function Name: scanAndStoreOddsPortalDataV2
- Memory Size: 2048 MB
- Timeout: 360 seconds
- Event Trigger: Runs automatically every 2 hours (rate(2 hours)) via EventBridge.
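
For reference, the S3 permissions listed above correspond to an IAM policy statement like the one built below. In `serverless.yaml` the statement is written in YAML; this sketch builds the equivalent JSON document in Python so it can be inspected. The bucket name is a placeholder. These object-level actions apply to object ARNs, hence the `/*` suffix on the resource.

```python
import json


def s3_object_policy(bucket: str) -> dict:
    """IAM policy allowing PutObject, GetObject and DeleteObject on all objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_object_policy("my-odds-bucket"), indent=2))
```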

Customizing Your Configuration: To tailor the serverless deployment for your needs:

Open the serverless.yaml file in the root directory.
Update the relevant fields:
- S3 bucket ARN in the IAM policy.
- Scheduling rate for the EventBridge trigger.
- Resource limits (e.g., memory size or timeout).
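
When changing the schedule, keep in mind that EventBridge `rate()` expressions require a singular unit for a value of 1 (`rate(1 hour)`) and a plural unit otherwise (`rate(2 hours)`). A small hypothetical helper that produces a valid expression:

```python
def rate_expression(value: int, unit: str) -> str:
    """Build an EventBridge rate() expression with the correct unit plurality."""
    units = {"minute", "hour", "day"}
    base = unit.lower().rstrip("s")  # accept either "hour" or "hours"
    if base not in units:
        raise ValueError(f"unit must be one of {sorted(units)}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    return f"rate({value} {base if value == 1 else base + 's'})"
```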

Deploying to your preferred cloud provider:

Install the Serverless Framework:
- Follow the installation guide on the Serverless Framework website.
Deploy the application:
- Use the `sls deploy` command to deploy the app to your cloud provider.
Verify the deployment:
- Confirm that the function is scheduled correctly, and check the logs or S3 outputs.

🤝 Contributing

Contributions are welcome! If you have ideas, improvements, or bug fixes, feel free to submit an issue or a pull request. Please ensure that your contributions follow the project's coding standards and include clear descriptions for any changes.

☕ Donations

If you find this project useful and would like to support its development, consider buying me a coffee! Your support helps keep this project maintained and improved.

📜 License

This project is licensed under the MIT License - see the LICENSE file for more details.

💬 Feedback

Have any questions or feedback? Feel free to reach out via the issues tab on GitHub. We'd love to hear from you!

❗ Disclaimer

This package is intended for educational purposes only and not for commercial use of any kind. The author is not affiliated with or endorsed by the oddsportal.com website. Use this application responsibly and ensure compliance with the terms of service of oddsportal.com and any applicable laws in your jurisdiction.

Project details

Maintainers

jordantete

Release history

- 0.3.0 (May 20, 2026)
- 0.2.1 (May 15, 2026)
- 0.2.0 (Mar 10, 2026), this version
- 0.1.0 (Feb 3, 2026)

Download files

Source Distribution

oddsharvester-0.2.0.tar.gz (81.8 kB)

Uploaded Mar 10, 2026 Source

Built Distribution

oddsharvester-0.2.0-py3-none-any.whl (85.1 kB)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file oddsharvester-0.2.0.tar.gz.

File metadata

Download URL: oddsharvester-0.2.0.tar.gz
Upload date: Mar 10, 2026
Size: 81.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oddsharvester-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a8ce86605399959db59636edf67be95a924c755ee5df468afc3caff7062c5675` |
| MD5 | `560ed7b1267b2ddf2ed0fda637819777` |
| BLAKE2b-256 | `cf24f31d94dabb49198ae897cdd1e8e5d9a62ddfebfd5eca8f13ed58c59ccd06` |
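
To check a downloaded distribution against the SHA256 digest listed above, Python's standard `hashlib` module is sufficient. A minimal sketch (the function name `sha256_matches` is illustrative); the file is read in chunks so large archives need not fit in memory:

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Stream a file through SHA-256 and compare it with the published hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())
```

For example: `sha256_matches("oddsharvester-0.2.0.tar.gz", "a8ce86605399959db59636edf67be95a924c755ee5df468afc3caff7062c5675")`. For installs, pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check automatically.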

Provenance

The following attestation bundles were made for oddsharvester-0.2.0.tar.gz:

Publisher: release.yml on jordantete/OddsHarvester

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oddsharvester-0.2.0.tar.gz
- Subject digest: a8ce86605399959db59636edf67be95a924c755ee5df468afc3caff7062c5675
- Sigstore transparency entry: 1075725094
- Sigstore integration time: Mar 10, 2026
Source repository:
- Permalink: jordantete/OddsHarvester@96c0816ae78aad297ae7f6bd956d85758eddf816
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jordantete
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@96c0816ae78aad297ae7f6bd956d85758eddf816
- Trigger Event: push

File details

Details for the file oddsharvester-0.2.0-py3-none-any.whl.

File metadata

Download URL: oddsharvester-0.2.0-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 85.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oddsharvester-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f215a6941f870ae25cd937374bb0b108ea2b7673859ba3d92a448c7650c52846` |
| MD5 | `141ff4948f9b008d93ce52d0513b8268` |
| BLAKE2b-256 | `64e37aaf3e0821e6afc999f6de02753eda93f1994a365577aa01d3098b1bf305` |

Provenance

The following attestation bundles were made for oddsharvester-0.2.0-py3-none-any.whl:

Publisher: release.yml on jordantete/OddsHarvester

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oddsharvester-0.2.0-py3-none-any.whl
- Subject digest: f215a6941f870ae25cd937374bb0b108ea2b7673859ba3d92a448c7650c52846
- Sigstore transparency entry: 1075725131
- Sigstore integration time: Mar 10, 2026
Source repository:
- Permalink: jordantete/OddsHarvester@96c0816ae78aad297ae7f6bd956d85758eddf816
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/jordantete
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@96c0816ae78aad297ae7f6bd956d85758eddf816
- Trigger Event: push
