NPS data acquisition and analysis package

National Parks: An Agglomerate

This project uses the National Park Service (NPS) API to build a curated dataset of U.S. National Park Service sites. The goal is to combine multiple API endpoints into a single, clean dataset that supports analysis and visualization.


Live App

Explore the interactive Streamlit app here:
https://national-parks-agglomerategit-fx4mzpepe8eaqgridzjnzr.streamlit.app/


Project Goal

This project investigates how park amenities (the number of activities and campgrounds) relate to operational complexity, using the number of alerts per park as a proxy.

Rather than using a pre-existing dataset, this project builds one from scratch by:

  • Collecting data from the NPS API
  • Cleaning and transforming raw data
  • Engineering useful features
  • Merging multiple data sources into one dataset
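
The merge and feature-engineering steps above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the project's actual code: the raw record layout (an `activities` list on each park, a `parkCode` key on every record) and the helper name `build_rows` are assumptions, and the sample records are toy data. The output field names follow the Variables list.

```python
from collections import Counter


def build_rows(parks, alerts, campgrounds):
    """Combine raw park, alert, and campground records into one row per park.

    Assumes every record is a dict carrying a "parkCode" key (hypothetical layout).
    """
    # Count how many alert and campground records reference each park.
    alert_counts = Counter(a["parkCode"] for a in alerts)
    camp_counts = Counter(c["parkCode"] for c in campgrounds)

    rows = []
    for park in parks:
        code = park["parkCode"]
        rows.append({
            "parkCode": code,
            "fullName": park["fullName"],
            # Engineered features: simple per-park counts.
            "num_activities": len(park.get("activities", [])),
            "num_alerts": alert_counts[code],      # 0 when no alert matches
            "num_campgrounds": camp_counts[code],  # 0 when no campground matches
        })
    return rows


# Toy records for illustration only.
parks = [
    {"parkCode": "acad", "fullName": "Acadia National Park",
     "activities": [{"name": "Hiking"}, {"name": "Camping"}]},
    {"parkCode": "abli",
     "fullName": "Abraham Lincoln Birthplace National Historical Park",
     "activities": []},
]
alerts = [{"parkCode": "acad"}, {"parkCode": "acad"}]
campgrounds = [{"parkCode": "acad"}]

rows = build_rows(parks, alerts, campgrounds)
```

Counting first and joining on `parkCode` afterwards keeps one row per park even when a park has no alerts or campgrounds, which is the same effect a left join would give in pandas.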

Data Sources

This project uses the official National Park Service API:

  • Parks endpoint
  • Alerts endpoint
  • Campgrounds endpoint

To re-run the data collection process, you will need a free API key from: https://www.nps.gov/subjects/developer/get-started.htm

Create a .env file in the repository root containing your key:

NPS_API_KEY=your_key_here
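
For readers who want to see how the key might be picked up, here is a rough sketch using only the standard library. It is not the project's own loader: the helper names (`parse_env`, `load_api_key`, `build_url`) are hypothetical, and the base URL and the `api_key` and `limit` query parameters are assumptions about the NPS API that should be checked against its documentation.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://developer.nps.gov/api/v1"  # assumed base URL


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(path=".env"):
    """Prefer an exported environment variable, then fall back to the .env file."""
    key = os.environ.get("NPS_API_KEY")
    if key:
        return key
    with open(path, encoding="utf-8") as fh:
        return parse_env(fh.read())["NPS_API_KEY"]


def build_url(endpoint, api_key, **params):
    """Build a request URL for an endpoint such as "parks" or "alerts"."""
    query = urlencode({**params, "api_key": api_key})
    return f"{BASE_URL}/{endpoint}?{query}"


# Placeholder key; in practice pass load_api_key() instead.
url = build_url("parks", "your_key_here", limit=50)
```

The resulting URL can be fetched with `urllib.request.urlopen` or any HTTP client. The `python-dotenv` package and its `load_dotenv` function are a common alternative to the hand-written parser.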

Installation

Clone the repository and install the package:

git clone https://github.com/rylion9-lgtm/national-parks-agglomerate
cd national-parks-agglomerate
python -m pip install -e .

To run the Streamlit app, install Streamlit:

python -m pip install streamlit

Running the App

streamlit run app.py

Example Usage (Package)

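import pandas as pd
from national_parks import summarize_parks

# Load the processed dataset (run from the repository root so the path resolves)
df = pd.read_csv("data/processed/parks_final.csv")
# Summarize the dataset with the package helper
summary = summarize_parks(df)

print(summary)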

Final Dataset

The final dataset is located at:

data/processed/parks_final.csv

It contains:

  • 474 rows
  • 9 columns

Variables

  • fullName: Full name of the park unit
  • parkCode: Unique park identifier
  • states: State abbreviation(s)
  • latitude: Latitude
  • longitude: Longitude
  • description_length: Length of park description
  • num_activities: Count of activities
  • num_alerts: Number of alerts
  • num_campgrounds: Number of campgrounds
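
A quick way to confirm that a copy of the dataset matches the nine variables listed above is to compare its header row against the expected names. The sketch below is a suggestion rather than part of the package; `check_columns` is a hypothetical helper, and it assumes the CSV header uses exactly the names shown in the list.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "fullName", "parkCode", "states", "latitude", "longitude",
    "description_length", "num_activities", "num_alerts", "num_campgrounds",
]


def check_columns(csv_file):
    """Return (missing, unexpected) column names for an open CSV file."""
    header = next(csv.reader(csv_file))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# In-memory stand-in for the real file so the example runs anywhere.
# The last column is deliberately left out to show a failing check.
sample = io.StringIO(
    "fullName,parkCode,states,latitude,longitude,"
    "description_length,num_activities,num_alerts\n"
)
missing, unexpected = check_columns(sample)
print(missing)     # prints ['num_campgrounds']
print(unexpected)  # prints []
```

To check the real file, open it with `open("data/processed/parks_final.csv", newline="", encoding="utf-8")` and pass the file object to `check_columns`.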

Key Insight

Most parks have relatively few alerts regardless of how many activities they offer, which suggests only a weak relationship between amenities and alerts. Parks with more activities do tend to show slightly higher alert counts, which is consistent with somewhat greater operational complexity.
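
One way to put a number on the strength of this relationship is a correlation coefficient between `num_activities` and `num_alerts`. The sketch below computes a Pearson coefficient by hand so that it runs without extra dependencies. The input values are toy numbers for illustration only, not figures from parks_final.csv, and nothing here is claimed to be the method behind the insight above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values, not taken from the real dataset.
num_activities = [2, 5, 8, 12, 20]
num_alerts = [1, 0, 2, 1, 3]

r = pearson(num_activities, num_alerts)
print(round(r, 2))
```

With the dataset loaded into pandas, `df["num_activities"].corr(df["num_alerts"])` gives the same statistic. Because alert counts are small integers with many ties, a rank-based coefficient may be worth comparing as well.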


Project Structure

national-parks-agglomerate/
├── README.md
├── requirements.txt
├── pyproject.toml
├── .gitignore
├── app.py
│
├── data/
│   ├── raw/
│   └── processed/
│
├── src/
│   ├── get_parks.py
│   ├── clean_parks.py
│   ├── get_alerts.py
│   ├── merge_alerts.py
│   ├── get_campgrounds.py
│   └── merge_campgrounds.py
│
└── national_parks/
    ├── __init__.py
    ├── data.py
    ├── clean.py
    └── analyze.py

Notes and Limitations

  • Data represents a snapshot in time (not live-updating)
  • Alert and campground requests were capped at 500 records, so num_alerts and num_campgrounds may be undercounts
  • num_activities is an engineered approximation
  • Park units vary widely in size and type
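
The 500-record cap noted above could in principle be lifted by requesting results page by page. The following is a sketch under stated assumptions, not the project's collection code: it assumes the API accepts `start` and `limit` query parameters and reports a `total` alongside a `data` list, and `fetch_page` is a hypothetical callable that performs one request. A fake fetcher stands in for the network so the example runs offline.

```python
def fetch_all(fetch_page, page_size=50):
    """Collect every record by requesting pages until the reported total is reached.

    fetch_page(start, limit) is a hypothetical callable returning one parsed
    JSON response shaped like {"total": "...", "data": [...]}.
    """
    records = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        batch = page["data"]
        records.extend(batch)
        start += len(batch)
        # Stop on an empty page or once the reported total has been collected.
        if not batch or start >= int(page["total"]):
            break
    return records


# Stand-in for the real API: 120 fake alert records served in slices.
FAKE_ALERTS = [{"id": i} for i in range(120)]


def fake_fetch(start, limit):
    return {"total": str(len(FAKE_ALERTS)),
            "data": FAKE_ALERTS[start:start + limit]}


all_alerts = fetch_all(fake_fetch, page_size=50)
print(len(all_alerts))  # prints 120
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP client, so the loop can be tested without an API key.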

Why This Project Matters

This project demonstrates:

  • API data collection
  • Data cleaning and transformation
  • Feature engineering
  • Multi-source data integration
  • Building an installable Python package
  • Deploying an interactive Streamlit app

It reflects a real-world data science workflow from raw data to deployed application.
