NPS data acquisition and analysis package

Project description

National Parks: An Agglomerate

This project uses the National Park Service (NPS) API to build a curated dataset of U.S. National Park Service sites. The goal is to combine multiple API endpoints into a single, clean dataset that supports analysis and visualization.

Live App

Explore the interactive Streamlit app here:
https://national-parks-agglomerategit-fx4mzpepe8eaqgridzjnzr.streamlit.app/

Project Goal

This project investigates how park amenities (activities and campgrounds) relate to operational complexity, using the number of alerts per park as a proxy.

Rather than using a pre-existing dataset, this project builds one from scratch by:

Collecting data from the NPS API
Cleaning and transforming raw data
Engineering useful features
Merging multiple data sources into one dataset
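The feature engineering and merge steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name build_park_rows is hypothetical, the sample records are made up, and the definitions of description_length (character count) and num_activities (length of the activities list) are assumptions.

```python
from collections import Counter


def build_park_rows(parks, alerts, campgrounds):
    """Combine raw park, alert, and campground records into flat rows."""
    # Count alerts and campgrounds per park code.
    alert_counts = Counter(a["parkCode"] for a in alerts)
    campground_counts = Counter(c["parkCode"] for c in campgrounds)

    rows = []
    for park in parks:
        code = park["parkCode"]
        rows.append({
            "fullName": park["fullName"],
            "parkCode": code,
            # Assumed definitions of the engineered features.
            "description_length": len(park.get("description", "")),
            "num_activities": len(park.get("activities", [])),
            # A Counter returns 0 for parks with no matching records.
            "num_alerts": alert_counts[code],
            "num_campgrounds": campground_counts[code],
        })
    return rows


# Made-up records that only mimic the shape of the raw data.
parks = [
    {"fullName": "Acadia National Park", "parkCode": "acad",
     "description": "Rocky coast.", "activities": [{"name": "Hiking"}]},
    {"fullName": "Zion National Park", "parkCode": "zion",
     "description": "Canyons.", "activities": []},
]
alerts = [{"parkCode": "acad"}, {"parkCode": "acad"}]
campgrounds = [{"parkCode": "zion"}]

for row in build_park_rows(parks, alerts, campgrounds):
    print(row)
```

Counting first and joining on parkCode keeps one row per park, so parks without alerts or campgrounds end up with zeros rather than being dropped.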

Data Sources

This project uses the official National Park Service API:

Parks endpoint
Alerts endpoint
Campgrounds endpoint

To re-run the data collection process, you will need a free API key from: https://www.nps.gov/subjects/developer/get-started.htm

Create a .env file in the repository root containing your key:

NPS_API_KEY=your_key_here
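As a rough illustration of how the key might be picked up and used, the sketch below parses a .env file with the standard library and builds a request URL. The helper names (read_env_text, load_api_key, build_url) are hypothetical and are not part of this package; the base URL and the api_key, limit, and start query parameters follow the public NPS API, but how the project's own scripts build requests is an assumption.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://developer.nps.gov/api/v1"


def read_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path=".env"):
    """Return NPS_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("NPS_API_KEY")
    if key:
        return key
    with open(env_path, encoding="utf-8") as handle:
        return read_env_text(handle.read())["NPS_API_KEY"]


def build_url(endpoint, api_key, limit=50, start=0):
    """Build a request URL for an endpoint such as 'parks' or 'alerts'."""
    query = urlencode({"limit": limit, "start": start, "api_key": api_key})
    return f"{BASE_URL}/{endpoint}?{query}"


print(build_url("parks", "your_key_here"))
```

Keeping the key in .env (and out of version control) means the collection scripts can be re-run without editing any source files.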

Installation

Clone the repository and install the package:

git clone https://github.com/rylion9-lgtm/national-parks-agglomerate
cd national-parks-agglomerate
python -m pip install -e .

To run the Streamlit app, install Streamlit:

python -m pip install streamlit

Running the App

streamlit run app.py

Example Usage (Package)

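import pandas as pd
from national_parks import summarize_parks

# Load the processed dataset (path relative to the repository root)
df = pd.read_csv("data/processed/parks_final.csv")

# Run the package's summary function on the dataset
summary = summarize_parks(df)

print(summary)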

Final Dataset

The final dataset is located at:

data/processed/parks_final.csv

It contains:

474 rows
9 columns

Variables

fullName: Full name of the park unit
parkCode: Unique park identifier
states: State abbreviation(s)
latitude: Latitude
longitude: Longitude
description_length: Length of park description
num_activities: Count of activities
num_alerts: Number of alerts
num_campgrounds: Number of campgrounds
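A quick sanity check of the file against the variable list above might look like the sketch below. It uses only the standard library; the function name check_dataset and the sample row are made up for illustration, and the assumption that the four engineered columns hold non-negative numbers is mine rather than something the project states.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "fullName", "parkCode", "states", "latitude", "longitude",
    "description_length", "num_activities", "num_alerts", "num_campgrounds",
]
# Engineered columns assumed to hold non-negative numbers.
COUNT_COLUMNS = EXPECTED_COLUMNS[5:]


def check_dataset(handle):
    """Return (row_count, problems) for a CSV file-like object."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        return 0, [f"missing columns: {missing}"]

    problems = []
    row_count = 0
    for row_count, row in enumerate(reader, start=1):
        for name in COUNT_COLUMNS:
            try:
                value = float(row[name])
            except (TypeError, ValueError):
                problems.append(f"row {row_count}: {name} is not numeric")
                continue
            if value < 0:
                problems.append(f"row {row_count}: {name} is negative")
    return row_count, problems


# Made-up sample with the documented columns.
SAMPLE_CSV = (
    "fullName,parkCode,states,latitude,longitude,"
    "description_length,num_activities,num_alerts,num_campgrounds\n"
    "Example Park,exmp,ME,44.0,-68.0,120,10,2,1\n"
)

print(check_dataset(io.StringIO(SAMPLE_CSV)))
```

To check the real file, open data/processed/parks_final.csv with newline="" and pass the handle to check_dataset; the row count should then match the 474 rows listed above.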

Key Insight

Most parks have relatively few alerts regardless of how many activities they offer, which suggests only a weak relationship between amenities and alerts. Parks with more activities do tend to show slightly higher alert counts, which is consistent with somewhat greater operational complexity.
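One simple way to put a number on the strength of this relationship is a Pearson correlation between num_activities and num_alerts. The sketch below is only an illustration: the pearson helper is hypothetical, the values are toy numbers rather than figures from the dataset, and the README does not say which statistic, if any, the project itself uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, not taken from parks_final.csv.
num_activities = [0, 5, 10, 20, 40]
num_alerts = [0, 1, 0, 2, 3]

print(round(pearson(num_activities, num_alerts), 3))
```

A coefficient close to 0 would match the "weak relationship" reading, while a small positive value would match the slight upward tendency. Because alert counts are skewed toward zero, a rank-based measure may be worth comparing as well.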

Project Structure

national-parks-agglomerate/
├── README.md
├── requirements.txt
├── pyproject.toml
├── .gitignore
├── app.py
│
├── data/
│   ├── raw/
│   └── processed/
│
├── src/
│   ├── get_parks.py
│   ├── clean_parks.py
│   ├── get_alerts.py
│   ├── merge_alerts.py
│   ├── get_campgrounds.py
│   └── merge_campgrounds.py
│
└── national_parks/
    ├── __init__.py
    ├── data.py
    ├── clean.py
    └── analyze.py

Notes and Limitations

Data represents a snapshot in time (not live-updating)
Alerts and campgrounds were limited to 500 records, so num_alerts and num_campgrounds may undercount for some parks
num_activities is an engineered approximation
Park units vary widely in size and type
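If the 500-record cap ever needs lifting, one possible approach is to page through an endpoint using the API's start and limit parameters until the reported total is reached. The sketch below is an assumption about how that could be done, not a description of the project's scripts: fetch_all and make_nps_fetcher are hypothetical names, and the loop relies on the response carrying "total" and "data" fields. The demo at the end uses a fake page function so it runs without a network connection or key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_all(fetch_page, page_size=50, max_records=None):
    """Collect records by advancing `start` until the reported total is reached."""
    records = []
    start = 0
    while True:
        payload = fetch_page(start, page_size)
        batch = payload.get("data", [])
        records.extend(batch)
        # The total may arrive as a string, so convert it explicitly.
        total = int(payload.get("total", len(records)))
        start += len(batch)
        if not batch or start >= total:
            break
        if max_records is not None and len(records) >= max_records:
            break
    return records if max_records is None else records[:max_records]


def make_nps_fetcher(endpoint, api_key):
    """Return a page function for a real endpoint (not called in this demo)."""
    def fetch_page(start, limit):
        query = urlencode({"start": start, "limit": limit, "api_key": api_key})
        url = f"https://developer.nps.gov/api/v1/{endpoint}?{query}"
        with urlopen(url, timeout=30) as response:
            return json.load(response)
    return fetch_page


def fake_fetch_page(start, limit):
    """Stand-in for the API that serves 7 made-up records."""
    data = [{"id": i} for i in range(start, min(start + limit, 7))]
    return {"total": "7", "start": str(start), "limit": str(limit), "data": data}


all_records = fetch_all(fake_fetch_page, page_size=3)
print(len(all_records))
```

Passing the page function in as an argument keeps the paging loop testable offline; with a real key, fetch_all(make_nps_fetcher("alerts", key)) would be the equivalent call.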

Why This Project Matters

This project demonstrates:

API data collection
Data cleaning and transformation
Feature engineering
Multi-source data integration
Building an installable Python package
Deploying an interactive Streamlit app

It reflects a real-world data science workflow from raw data to deployed application.

Project details

Release history

This version

0.1.0

Apr 23, 2026

Download files

Source Distribution

national_parks_agglomerate-0.1.0.tar.gz (3.6 kB)

Uploaded Apr 23, 2026 Source

Built Distribution

national_parks_agglomerate-0.1.0-py3-none-any.whl (4.4 kB)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file national_parks_agglomerate-0.1.0.tar.gz.

File metadata

Download URL: national_parks_agglomerate-0.1.0.tar.gz
Upload date: Apr 23, 2026
Size: 3.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for national_parks_agglomerate-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e979274550a3848efcb59fac0ac46a35ad9c6630d5bd89309222c5d6ea239175`
MD5	`25af0a3d6fd1592e528afa1f24429d86`
BLAKE2b-256	`9a4eb7e4e8a3e5b98bf897c1ddb61113e0fc11cc83ed0ea05abe1cf92813ecbe`

File details

Details for the file national_parks_agglomerate-0.1.0-py3-none-any.whl.

File metadata

Download URL: national_parks_agglomerate-0.1.0-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 4.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for national_parks_agglomerate-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3dd0f67cddb135256f0e1702281bbe27653412f8a16dd8222e5919c49f0c9eec`
MD5	`90e6f826258ff7a18ca86d1b76b8d53c`
BLAKE2b-256	`ceeb177e32cad0d6743b2f469a91d30fae3aaaf08dae3ca3b08eba3640384f13`

national-parks-agglomerate 0.1.0
