nba_analytics

A package for collecting and analyzing NBA player data.

The primary goal of this project is to compile a report, located in the REPORT.md file. The project organizes data collection, data processing, and machine learning tasks related to NBA player statistics, specifically to identify valuable players on the Detroit Pistons. It also reproduces the same local functionality through cloud architecture (Azure SQL, Azure Blob Storage, Databricks, Azure Synapse, and Power BI).


Architecture Comparison of Cloud vs. Local Data Engineering/Analytics:

(Figure: architecture of the cloud and local variants.)

Usage

To use this project, clone the repository and set up the necessary dependencies. Create an environment (Ctrl+Shift+P in VS Code) using requirements.txt. You can then run the scripts through main_ipynb.ipynb for easy use, or directly in the src directory for data collection, processing, and machine learning tasks.
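As a rough illustration, a workflow might look like the following Python sketch. The module paths follow the directory structure below, but the function names are assumptions, not the package's confirmed API:

```python
# Hypothetical workflow sketch; function names are assumptions based on
# the module layout described under Directory Structure.
from nba_analytics.dataset import creation, processing
from nba_analytics.machine_learning import train_models

creation.create_dataset()       # assumed: pull raw player data via the scraper
processing.process_dataset()    # assumed: clean and reshape into the 5-year format
train_models.train()            # assumed: fit the models in machine_learning/models/
```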

Directory Structure

The project directory is organized as follows:

  • data/: Contains datasets used in the project.
    • datasets/:
      • nba_players.csv: Dataset containing information about NBA players.
      • nba_player_stats_5years_overlap.csv: Dataset containing every five consecutive years of NBA player statistics (derived from nba_player_stats_5years.csv).
      • nba_player_stats_5years_tensor_ready.csv: PyTorch-importable version of nba_player_stats_5years.csv.
      • nba_player_stats_5years.csv: Dataset containing the first five years of NBA player statistics.
      • nba_player_stats_5years.json: JSON version of nba_player_stats_5years.csv.
      • nba_players_advanced.csv: Dataset containing advanced NBA player statistics.
      • nba_players_basic.csv: Dataset containing basic NBA player statistics.
      • nba_player_stats.csv: Dataset containing combined NBA player statistics.
    • graphs/: Contains data analytics graphs from analytics/.
    • models/: Contains machine learning models from machine_learning/.
    • reports/: Location for Power BI and locally created PDF reports from src/utils/reporting.py.
  • logs/: Contains log files generated during the project.
    • nba_player_stats.log: Log file for NBA player statistics data processing.
  • src/: Contains the source code for data collection, data processing, and machine learning tasks.
    • dataset/: Contains scripts for processing and cleaning data.
      • creation.py: Module for creating datasets from the NBA API using basketball_reference_web_scraper.
      • processing.py: Module for processing datasets into a useful dataset.
      • torch.py: Module for preparing datasets for PyTorch/machine learning evaluation.
      • filtering.py: Module for further dataset processing (possibly to be used by processing.py).
    • machine_learning/: Contains scripts for machine learning tasks.
      • models/: Contains models to be used for the machine learning tasks.
        • arima.py: ARIMA model (TODO: better step evaluation).
        • lstm.py: LSTM neural networks (custom and PyTorch built-in) for many-to-many prediction (see the sketch after this list).
        • neuralnet.py: Basic neural net for one-to-one prediction.
      • train_models.py: Module for directly training the models in models/.
      • use_models.py: Module for directly using the models in models/.
    • utils/: Contains utility scripts used across the project.
      • logger.py: Utility script for logging messages.
      • config.py: Utility for settings shared among files.
  • generate_requirements.bat: Batch file to generate the requirements.txt file.
  • requirements.txt: File containing project dependencies.
  • reference/: Any other files related to the project, used for reference.
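For context on the many-to-many setup in lstm.py, here is a minimal PyTorch sketch. The layer sizes and the per-time-step linear head are illustrative assumptions, not the project's exact architecture:

```python
# Minimal many-to-many LSTM sketch (illustrative; not the exact lstm.py code).
import torch
import torch.nn as nn

class ManyToManyLSTM(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        # batch_first=True -> inputs shaped (batch, seq_len, n_features)
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # Project every time step's hidden state back to the feature space,
        # yielding one prediction per input season (many-to-many).
        self.head = nn.Linear(hidden_size, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)    # (batch, seq_len, hidden_size)
        return self.head(out)    # (batch, seq_len, n_features)

# Example: a batch of 8 players, 5 seasons each, 10 stats per season.
model = ManyToManyLSTM(n_features=10)
preds = model(torch.randn(8, 5, 10))   # -> torch.Size([8, 5, 10])
```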

Work Schedule

Week of 7/29

| Day | Task | Status |
| --- | --- | --- |
| Monday | Completed data format refactoring (separation of base, filtered, continuous, and continuous_first data formats). | |
| Tuesday | Re-worked Databricks code to fix the new package setup, refitted SQL, and completed Azure Synapse integration. Restart reports. | |
| Wednesday | Complete data changes to include team and other excluded columns. Continue creating the report about Pistons players. | |
| Thursday | -- | |
| Friday | -- | |
| Saturday | N/A: No progress on Saturdays. | --- |
| Sunday | -- | |

TODOs

| Task | Description |
| --- | --- |
| Set up linked services and define ETL pipelines. | Critical for data transformation. |
| Create an Azure Machine Learning workspace. | Foundation for machine learning projects. |
| Set up the machine learning environment and upload datasets. | Necessary for model training. |
| Train models using Azure Machine Learning. | Key for predictive analytics. |
| Deploy models as web services. | For model accessibility. |
| Integrate with Azure Blob Storage for data storage. | For data persistence. |
| Update scripts to use the Azure Blob Storage SDK. | To leverage Azure storage capabilities. |
| Automate the workflow using Azure Logic Apps or Azure DevOps. | For streamlined operations. |
| Finish setting up the Data Factory and integrate with Databricks. | For enhanced data processing and analytics; added for Sunday. |
| Before the Azure Machine Learning tasks: refactor dataset processing to use numpy savez for saving with a dictionary or label row (see the sketch below). | |

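A rough sketch of that numpy savez idea, with illustrative array names and shapes (not the project's actual schema):

```python
# Illustrative numpy savez sketch; array names and shapes are assumptions.
import numpy as np

stats = np.random.rand(100, 5, 3)                 # (players, seasons, features)
feature_names = np.array(["pts", "ast", "reb"])   # hypothetical label row

# Save the 3-D tensor together with its labels in one .npz archive.
np.savez("nba_player_stats_5years.npz", stats=stats, feature_names=feature_names)

# Loading restores a dict-like mapping of names to arrays.
with np.load("nba_player_stats_5years.npz") as archive:
    stats, feature_names = archive["stats"], archive["feature_names"]
```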
Week of 7/22

| Day | Task | Status |
| --- | --- | --- |
| Monday | Completed the bronze_to_silver_to_gold pipeline (see the sketch below). | |
| Tuesday | Rethinking the pipeline. | |
| Wednesday | Start Synapse integration to create gold_db. | TODO: Needs refactoring of the tables input to the SQL database. |
| Thursday | Start refactoring the input data. Start refactoring the output gold data to integrate dictionaries for 3D data. Start Power BI integration (using the gold output instead of gold_db for now). | |
| Friday | Finished refactoring the data. | |
| Saturday | N/A: No progress on Saturdays. | --- |
| Sunday | Integrate the new SQL into the pipeline. | |

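For readers unfamiliar with the medallion pattern, here is a hedged PySpark sketch of the bronze-to-silver-to-gold idea. Paths, storage format, and column names are assumptions for illustration, not the project's actual pipeline:

```python
# Hedged bronze -> silver -> gold sketch; paths and columns are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nba_medallion").getOrCreate()

# Bronze: raw ingested player stats, stored as-is.
bronze = spark.read.parquet("/mnt/bronze/nba_player_stats")

# Silver: cleaned and de-duplicated records.
silver = bronze.dropDuplicates(["player", "season"]).filter(F.col("games_played") > 0)
silver.write.mode("overwrite").parquet("/mnt/silver/nba_player_stats")

# Gold: aggregated, analytics-ready view for reporting.
gold = silver.groupBy("player").agg(F.avg("points").alias("avg_points"))
gold.write.mode("overwrite").parquet("/mnt/gold/nba_player_stats")
```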
Week of 7/15

| Day | Task | Status |
| --- | --- | --- |
| Monday | Continued package integration and SQL setup. | |
| Tuesday | Complete package integration. | |
| Wednesday | Complete the SQL -> Data Factory pipeline. | |
| Thursday | Begin Databricks/Spark integration with bronze_to_silver and silver_to_gold. | |
| Friday | Continue working on Databricks/Spark data engineering. | |
| Saturday | N/A: No progress on Saturdays. | --- |
| Sunday | Create a working version or prototype of the bronze_to_silver_to_gold pipeline. | |

Week of 7/8

| Day | Task | Status |
| --- | --- | --- |
| Monday | Set up settings.cloud. | |
| Tuesday | Research ways to implement ETL. | |
| Wednesday | Reconfigure the project into a module for Azure Functions functionality. | |
| Thursday | Modify nba_pistons into the 'nba_analytics' package structure. | |
| Friday | Continue package modifications. | |
| Saturday | N/A: No progress on Saturdays. | --- |
| Sunday | Set up Azure, SQL, and Data Factory. | |

Week of 7/1

| Task | Result | Status |
| --- | --- | --- |
| Explore Power BI, Azure, and Fabric | Decided on adapting the project into an Azure workflow with analytics in Fabric | |

Week of 6/24

| Day | Task | Status |
| --- | --- | --- |
| Monday | Complete lstm. Look into REPORT.md automation. | |
| Tuesday | Complete automation of reports. | |
| Wednesday | Look into Databricks implementation. Begin Power BI testing. | |
| Thursday | Modify use_model() in use_models.py for model prediction output. | |
| Friday | Complete prediction graphs and create an average-prediction bar graph in analytics. Look into Power BI use cases over the weekend and plan the report. | |
| Saturday | N/A: No progress on Saturdays. | --- |
| Sunday | Begin including Azure/Fabric/Power BI for data organization, engineering, and reports. | |

Week of 6/17

| Day | Task | Status |
| --- | --- | --- |
| Monday | Look into ARIMA and complete the LSTM. | |
| Tuesday | Perform analytics for tasks and update REPORT.md. | |
| Wednesday | Complete dataset expansion for players with any five-year span. | |
| Thursday | Complete torch_overlap to merge the custom dataset. | |
| Friday | Create many(4)-to-one and one-to-one neural networks. | |
| Saturday | No progress on Saturdays. Meanwhile: re-think dataset names. | --- |
| Sunday | Re-check and complete the neural networks and start ARIMA preparation in use_models (see the sketch below). Perform analytics for tasks and update REPORT.md. | |
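As a point of reference for the ARIMA direction, here is a minimal statsmodels sketch. The series and model order are illustrative assumptions:

```python
# Minimal ARIMA sketch with statsmodels; data and order are illustrative.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical points-per-game across a player's first five seasons.
points_per_season = np.array([12.4, 14.1, 15.8, 16.2, 17.0])

# Fit a simple ARIMA(1, 1, 0) and forecast the next season.
results = ARIMA(points_per_season, order=(1, 1, 0)).fit()
forecast = results.forecast(steps=1)
print(forecast)   # predicted points for season 6
```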

Contributing

Feel free to contribute to this project by submitting pull requests or opening issues.

Download files

Source Distribution: nba_analytics-0.2.19.tar.gz (44.7 kB)

Built Distribution: nba_analytics-0.2.19-py3-none-any.whl (55.6 kB, Python 3)
