YouTube trending video analysis package — merges Kaggle data with YouTube API for EDA and predictive modeling

# YouTube Performance Analysis
**STAT 386 Final Project — Summer Price & Jane Gustafson**

This package analyzes YouTube trending video data by merging the Kaggle YouTube Trending Videos dataset with live data from the YouTube Data API. The result is a custom longitudinal dataset that tracks how videos that trended between November 2017 and June 2018 have grown since then, enabling exploratory analysis and predictive modeling of video performance.
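
As a rough illustration of the merge described above, the sketch below joins trending-era counts with current counts on a shared video ID and derives a growth figure. The field names (`video_id`, `views`, `current_views`, `view_growth`, `growth_ratio`) and the helper `merge_and_growth` are assumptions made for this example; the package's actual cleaning pipeline and column names may differ.

```python
def merge_and_growth(kaggle_rows, api_stats):
    """Inner-join Kaggle rows with current API stats and add view growth.

    kaggle_rows: list of dicts with 'video_id' and 'views' (count when trending).
    api_stats:   dict mapping video_id -> current view count.
    All field names are hypothetical, not the package's real schema.
    """
    merged = []
    for row in kaggle_rows:
        current = api_stats.get(row["video_id"])
        if current is None:
            # No current stats for this ID (e.g. the video is no longer public).
            continue
        merged.append({
            **row,
            "current_views": current,
            "view_growth": current - row["views"],
            # Ratio of current to original views; None when original views are 0.
            "growth_ratio": current / row["views"] if row["views"] else None,
        })
    return merged


# Tiny made-up inputs, only to show the shape of the data.
kaggle_rows = [
    {"video_id": "abc", "views": 1_000},
    {"video_id": "def", "views": 500},
    {"video_id": "gone", "views": 200},
]
api_stats = {"abc": 4_000, "def": 500}

for record in merge_and_growth(kaggle_rows, api_stats):
    print(record["video_id"], record["view_growth"], record["growth_ratio"])
```

With pandas, the same idea corresponds to an inner `DataFrame.merge` on the ID column followed by a column subtraction.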

## What This Package Does

- Downloads the Kaggle YouTube Trending Videos dataset (US, Nov 2017 - Jun 2018)
- Fetches current view, like, and comment counts for each video via the YouTube Data API
- Merges both sources into a single cleaned dataset
- Runs exploratory data analysis across 5 dimensions (growth, trending patterns, categories, engagement, time to trend)
- Trains 3 Random Forest models to predict current views, time to trend, and view growth
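
The fetch step in the list above could look roughly like the following. The YouTube Data API `videos.list` endpoint takes a comma-separated list of video IDs, and callers commonly batch these at 50 IDs per request, so the sketch chunks the IDs before building each request URL. The helper names `chunked` and `build_stats_url` are hypothetical and not part of this package, the batch size is an assumption, and the code only builds URLs without sending any requests.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"
MAX_IDS_PER_REQUEST = 50  # assumed batch size for the `id` parameter


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_stats_url(video_ids, api_key):
    """Build a videos.list URL requesting the statistics part for one batch."""
    params = {
        "part": "statistics",
        "id": ",".join(video_ids),
        "key": api_key,
    }
    return f"{API_URL}?{urlencode(params)}"


video_ids = [f"video{i:03d}" for i in range(120)]  # placeholder IDs
urls = [
    build_stats_url(batch, "your_youtube_api_key")
    for batch in chunked(video_ids, MAX_IDS_PER_REQUEST)
]
print(len(urls), "requests needed")
```

Each item in a response carries a `statistics` object whose `viewCount`, `likeCount`, and `commentCount` values arrive as strings, so they would need converting to integers before merging.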

## Quick Start

```bash
git clone https://github.com/summeraskey/final_project386.git
cd final_project386
uv venv
source .venv/bin/activate
uv sync
```

Create a `.env` file in the project root:

```
YOUTUBE_API_KEY=your_youtube_api_key
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_KEY=your_kaggle_api_key
```

## Usage

```python
from final_project_demo import run_cleaning_pipeline, run_analysis_pipeline

df = run_cleaning_pipeline()
run_analysis_pipeline(df)
```

## Streamlit App

An interactive model predictor is hosted at: https://finalproject386-qpuktjzfa562fbmkaycd9v.streamlit.app/

To run locally:

```bash
streamlit run src/final_project_demo/streamlit_app.py
```

## GitHub Pages Site

Full documentation, tutorial, and technical report are hosted at: https://summeraskey.github.io/final_project386/

## Project Structure

```
final_project386/
├── src/final_project_demo/
│   ├── cleaning.py        # Data loading and cleaning pipeline
│   ├── analysis.py        # EDA and predictive modeling
│   └── streamlit_app.py   # Interactive Streamlit app
├── docs/                  # Generated Quarto site
├── index.qmd              # Home page
├── Documentation.qmd      # Function reference
├── Tutorial.qmd           # Usage tutorial
├── TechnicalReport.qmd    # Full technical report
└── _quarto.yml            # Quarto configuration
```

## Rebuild the Site

```bash
quarto render
```

Serve locally with:

```bash
quarto preview
```
