Python package for Basketball Reference that gathers data by scraping the website



basketball-reference-webscrapper

basketball-reference-webscrapper is a Python package for fetching NBA game data from two sources:

Basketball Reference website (web scraping)
NBA Stats API (official API via nba_api package)

Features

✅ Scrapes NBA gamelogs, schedules, and player attributes from Basketball Reference
✅ Fetches data directly from the official NBA Stats API (faster, but local-only)
✅ Validates user inputs to ensure data accuracy
✅ Handles team-specific data filtering (single team, multiple teams, or all teams)
✅ Returns data as pandas DataFrames
✅ Consistent interface across both data sources

Installation

pip install basketball-reference-webscrapper

Dependencies:

pandas, beautifulsoup4, requests - for web scraping
nba-api - for NBA Stats API access (installed automatically)

Usage

Option 1: Basketball Reference (Web Scraping)

Best for: Production environments, cloud deployments, historical data (1947-present)

from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrapping_basketball_reference import WebScrapBasketballReference

# Create feature object
feature = FeatureIn(
    data_type='gamelog',  # 'gamelog', 'schedule', or 'player_attributes'
    season=2023,
    team='BOS'  # 'all', 'BOS', or ['BOS', 'LAL']
)

# Fetch data
scraper = WebScrapBasketballReference(feature_object=feature)
data = scraper.webscrappe_nba_games_data()
print(data.head())

Option 2: NBA Stats API

Best for: Local development, faster data retrieval, recent seasons (2000-present)

from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.web_scrap_nba_api import WebScrapNBAApi

# Create feature object
feature = FeatureIn(
    data_type='gamelog',  # 'gamelog' or 'schedule'
    season=2023,
    team='BOS'  # 'all', 'BOS', or ['BOS', 'LAL']
)

# Fetch data
scraper = WebScrapNBAApi(feature_object=feature)
data = scraper.fetch_nba_api_data()
print(data.head())

⚠️ Important: The NBA Stats API blocks requests from cloud providers (AWS, Heroku, GCP, etc.). Use this option locally only.

Comparison: Which Data Source to Use?

Feature	NBA API	Basketball Reference
Speed	⚡ Fast (~1-2s/team)	🐌 Slow (~20-30s/team)
Cloud-friendly	❌ No (blocks cloud IPs)	✅ Yes
Historical data	2000-present	1947-present
Opponent stats	❌ Not included	✅ Complete
Player attributes	❌ Not supported	✅ Supported
Reliability	High (official API)	Medium (web scraping)

Recommendation:

Development/Analysis (local): Use the NBA API for speed
Production/Cloud: Use Basketball Reference (the NBA API blocks cloud IPs)
Historical research (pre-2000 seasons): Use Basketball Reference
Need opponent stats: Use Basketball Reference
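The recommendations above can be condensed into a small decision helper. This is a hypothetical sketch, not part of the package: `choose_source` and its return labels are illustrative names, and the only rules it encodes are the ones stated in the comparison table and the supported data types.

```python
from typing import Literal

Source = Literal["nba_api", "basketball_reference"]

# Data types each source supports, as listed under "Supported Data Types"
SUPPORTED = {
    "nba_api": {"gamelog", "schedule"},
    "basketball_reference": {"gamelog", "schedule", "player_attributes"},
}


def choose_source(season: int, data_type: str, *, in_cloud: bool,
                  need_opponent_stats: bool = False) -> Source:
    """Hypothetical helper: pick a data source using the comparison-table rules."""
    if data_type not in SUPPORTED["basketball_reference"]:
        raise ValueError(f"unknown data_type: {data_type!r}")
    if season < 1947:
        raise ValueError("no data before the 1947 season")
    needs_reference = (
        in_cloud                                   # NBA API blocks cloud IPs
        or season < 2000                           # NBA API covers 2000-present
        or data_type not in SUPPORTED["nba_api"]   # e.g. player_attributes
        or need_opponent_stats                     # NBA API omits opponent stats
    )
    return "basketball_reference" if needs_reference else "nba_api"
```

For example, `choose_source(2023, "gamelog", in_cloud=False)` returns `"nba_api"`, while the same call with `in_cloud=True` returns `"basketball_reference"`.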

Supported Data Types

Basketball Reference

gamelog - Game-by-game team statistics
schedule - Team schedule and results
player_attributes - Player roster information

NBA API

gamelog - Game-by-game team statistics (no opponent stats)
schedule - Team schedule and results (no pts_opp)

Input Validation

Both scrapers validate inputs:

Data Type: Must be one of the data types supported by the chosen scraper
Season: Must be an integer ≥ 2000 for the NBA API and ≥ 1947 for Basketball Reference
Team: 'all', valid team abbreviation (e.g., 'BOS'), or list of abbreviations (e.g., ['BOS', 'LAL'])

Valid Team Abbreviations

ATL, BOS, BRK, CHA, CHI, CLE, DAL, DEN, DET, GSW, HOU, IND,
LAC, LAL, MEM, MIA, MIL, MIN, NOP, NYK, OKC, ORL, PHI, PHO,
POR, SAC, SAS, TOR, UTA, WAS
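The validation rules above can be expressed as a short standalone function. This is a hypothetical sketch of the documented rules, not the package's actual `FeatureIn` validation: `validate_inputs`, `VALID_TEAMS`, `DATA_TYPES`, `MIN_SEASON`, and the source labels are illustrative names. It also shows how the three team forms ('all', a single abbreviation, or a list) can be normalized into one list of teams to fetch.

```python
from typing import List, Union

# Team abbreviations from the list above
VALID_TEAMS = frozenset(
    """ATL BOS BRK CHA CHI CLE DAL DEN DET GSW HOU IND
    LAC LAL MEM MIA MIL MIN NOP NYK OKC ORL PHI PHO
    POR SAC SAS TOR UTA WAS""".split()
)
DATA_TYPES = {
    "basketball_reference": {"gamelog", "schedule", "player_attributes"},
    "nba_api": {"gamelog", "schedule"},
}
MIN_SEASON = {"basketball_reference": 1947, "nba_api": 2000}


def validate_inputs(source: str, data_type: str, season: int,
                    team: Union[str, List[str]]) -> List[str]:
    """Check inputs against the documented rules; return the teams to fetch."""
    if source not in DATA_TYPES:
        raise ValueError(f"unknown source: {source!r}")
    if data_type not in DATA_TYPES[source]:
        raise ValueError(f"{data_type!r} is not supported by {source}")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(season, bool) or not isinstance(season, int) or season < MIN_SEASON[source]:
        raise ValueError(f"season must be an integer >= {MIN_SEASON[source]} for {source}")
    if team == "all":
        return sorted(VALID_TEAMS)
    # Normalize a single abbreviation into a one-element list
    teams = [team] if isinstance(team, str) else list(team)
    invalid = [t for t in teams if t not in VALID_TEAMS]
    if not teams or invalid:
        raise ValueError(f"invalid team selection: {team!r}")
    return teams
```

Returning a normalized list keeps the downstream fetch loop identical whether the caller asked for one team, several, or all of them.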

Examples

Example 1: Fetch Single Team Gamelog (Basketball Reference)

from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrapping_basketball_reference import WebScrapBasketballReference

feature = FeatureIn(data_type='gamelog', season=2023, team='BOS')
scraper = WebScrapBasketballReference(feature_object=feature)
data = scraper.webscrappe_nba_games_data()

print(f"Fetched {len(data)} games for Boston Celtics")
print(data[['game_date', 'opp', 'results', 'pts_tm', 'pts_opp']].head())

Example 2: Fetch Multiple Teams Schedule (NBA API)

from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.web_scrap_nba_api import WebScrapNBAApi

feature = FeatureIn(data_type='schedule', season=2023, team=['LAL', 'GSW'])
scraper = WebScrapNBAApi(feature_object=feature)
data = scraper.fetch_nba_api_data()

print(f"Teams: {data['tm'].unique()}")
print(data[['game_date', 'opponent', 'w_l', 'pts_tm']].head())

Example 3: Fetch All Teams (use with caution)

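from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrapping_basketball_reference import WebScrapBasketballReference

# Fetching all 30 teams makes 30+ requests: roughly 10-15 minutes with
# Basketball Reference at ~20-30s per team
feature = FeatureIn(data_type='gamelog', season=2023, team='all')

# Choose your scraper based on environment; Basketball Reference works anywhere
scraper = WebScrapBasketballReference(feature_object=feature)
data = scraper.webscrappe_nba_games_data()

# Local-only alternative (note the different fetch method name):
# from basketball_reference_webscrapper.web_scrap_nba_api import WebScrapNBAApi
# scraper = WebScrapNBAApi(feature_object=feature)
# data = scraper.fetch_nba_api_data()

print(f"Fetched data for {data['tm'].nunique()} teams")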

Data Engineering Use

This package is designed for data engineering pipelines:

Clean data structure: Only returns actual data from source (no empty placeholders)
Consistent schema: Same column names across data sources where applicable
Flexible filtering: Easy to fetch specific teams or all teams
Error handling: Comprehensive logging and error messages

Note: The NBA API scraper excludes opponent statistics, because fetching them would require 82+ additional API calls per team. If you need them, handle the opponent data join in your ETL pipeline.
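One way to do that join is a pandas self-merge on a shared game key, pairing each team's row with its opponent's row from the same game. This is a sketch under assumptions: it presumes both teams' rows are present (for example, fetched with team='all' or with both teams in the list) and that the output has a per-game identifier. The column names `game_id`, `tm`, and `pts_tm` and the function `add_opponent_stats` are hypothetical; adapt them to the actual columns you receive.

```python
from typing import List

import pandas as pd


def add_opponent_stats(df: pd.DataFrame, stat_cols: List[str],
                       game_col: str = "game_id", team_col: str = "tm") -> pd.DataFrame:
    """Attach each opponent's stats as opp_* columns (hypothetical column names)."""
    # One row per (game, team) holding that team's stats, renamed as opponent columns
    opp = df[[game_col, team_col] + stat_cols].rename(
        columns={team_col: f"opp_{team_col}", **{c: f"opp_{c}" for c in stat_cols}}
    )
    # Self-join on the game key: each row matches both teams in that game
    merged = df.merge(opp, on=game_col, how="inner")
    # Keep only the pairing where the opponent is the other team
    mask = merged[team_col] != merged[f"opp_{team_col}"]
    return merged[mask].reset_index(drop=True)
```

Games where only one side was fetched simply produce no output row, so a row-count check after the join is a cheap way to spot missing teams.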

Configuration

The package uses params.yaml for configuration. Both scrapers share the same team reference data in constants/team_city_refdata.csv.

Troubleshooting

NBA API: JSONDecodeError

Error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Cause: You are running in a cloud environment (AWS, Heroku, GCP, etc.). The NBA API blocks datacenter IPs, so the response body is empty or not JSON and cannot be decoded.

Solution:

Run locally for development
Use Basketball Reference scraper for production/cloud deployments
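If the same code must run both locally and in the cloud, a fallback wrapper can try the NBA API first and switch to Basketball Reference on this error. This is a hypothetical sketch (`fetch_with_fallback` is not part of the package), and it assumes the package lets `json.JSONDecodeError` propagate, as the error message above suggests; if the error is wrapped in another exception type, adjust the `except` clause.

```python
import json
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def fetch_with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Call `primary`; if it fails with a JSON decoding error, call `fallback` instead."""
    try:
        return primary()
    except json.JSONDecodeError as exc:
        # Blocked (datacenter) IPs get an empty or non-JSON body from the NBA API
        logger.warning("Primary source failed to decode JSON (%s); using fallback", exc)
        return fallback()


# Possible wiring with the package's scrapers (bound methods are zero-argument callables):
# data = fetch_with_fallback(
#     WebScrapNBAApi(feature_object=feature).fetch_nba_api_data,
#     WebScrapBasketballReference(feature_object=feature).webscrappe_nba_games_data,
# )
```

Only the decoding error triggers the fallback; any other exception still propagates, so genuine bugs are not silently masked.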

Rate Limiting

NBA API: ~100 requests/minute (built-in 0.4s delay between requests)
Basketball Reference: Built-in delays between requests to respect the site's limits (~20s per team)
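A fixed delay like the NBA API's 0.4s gap can be sketched as a minimum-interval throttle. This is an illustrative class (`Throttle` is a hypothetical name), not the package's internal implementation. Note that a 0.4s minimum gap caps throughput at 60 / 0.4 = 150 requests per minute; real throughput is lower because each request also takes time to complete.

```python
import time
from typing import Optional


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # monotonic time of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Create one `Throttle(0.4)` per session and call `wait()` immediately before each request; the first call returns without sleeping.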

Testing

# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_web_scrap_nba_api.py -v
poetry run pytest tests/test_webscrapping_basketball_reference.py -v

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

License

See LICENSE file for details.

Contact

For questions or feedback: yannick.flores1992@gmail.com

Changelog

v0.5.4

✨ Added NBA Stats API support via WebScrapNBAApi class
✨ Added nba-api package integration
📝 Comprehensive test coverage for both scrapers
🔧 Removed opponent statistics from NBA API output (data integrity)
⚡ Optimized rate limiting (0.4s between NBA API requests)
📚 Updated documentation with comparison guide

v0.5.3

🐛 Fixed Basketball Reference scraper headers for better reliability
🔧 Improved error handling and logging

Acknowledgments

Basketball Reference for providing comprehensive NBA statistics
NBA.com for the official stats API
nba_api package maintainers for the excellent Python wrapper

Release history

0.8.4 (this version): Jun 29, 2026
0.8.3: Jun 26, 2026
0.8.2: Mar 4, 2026
0.8.1: Mar 4, 2026
0.8.0: Feb 25, 2026
0.7.4: Dec 8, 2025
0.7.3: Nov 30, 2025
0.7.2: Nov 29, 2025
0.7.1: Nov 28, 2025
0.7.0: Nov 28, 2025
0.6.4: Nov 19, 2025
0.6.3: Nov 14, 2025
0.6.2: Nov 13, 2025
0.6.1: Nov 12, 2025
0.6.0: Nov 12, 2025
0.5.3: Nov 10, 2025
0.5.2: Nov 7, 2025
0.5.1: Nov 7, 2025
0.5.0: Feb 26, 2025
0.4.2: Jul 22, 2024
0.4.1: Jun 2, 2024
0.4.0: Jun 1, 2024
0.3.0: May 29, 2024
0.2.0: May 29, 2024
0.1.7: May 28, 2024
0.1.6: May 28, 2024
0.1.5: May 27, 2024
0.1.3: May 22, 2024

Download files

Source Distribution: basketball_reference_webscrapper-0.8.4.tar.gz

Upload date: Jun 29, 2026
Size: 26.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

Algorithm	Hash digest
SHA256	`6fe081f77e88c03ece9605bd9e8c8a18617f86e6228d024f4e2c5d6f44b4ed96`
MD5	`f673b19ce694e46fd7272f1db80d4969`
BLAKE2b-256	`07e088d1b8c573df274574a08226c4e59779b5472a4e58dcd381e9032c1eecbb`

Built Distribution: basketball_reference_webscrapper-0.8.4-py3-none-any.whl

Upload date: Jun 29, 2026
Size: 32.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

Algorithm	Hash digest
SHA256	`45cf8a64d85890a88170e5501d566c747795b7864d68e995cca012d26c3d9ae3`
MD5	`25ae811a0c24a7d037f18cd62dcb0c26`
BLAKE2b-256	`a6f175fb547ebb890d28e32371734ec7b67e90dd9e4ced1f8fa9f9d7dcf42b90`
