NPS data acquisition and analysis package
National Parks: An Agglomerate
This project uses the National Park Service (NPS) API to build a curated dataset of U.S. National Park Service sites. The goal is to combine multiple API endpoints into a single, clean dataset that supports analysis and visualization.
Live App
Explore the interactive Streamlit app here:
https://national-parks-agglomerategit-fx4mzpepe8eaqgridzjnzr.streamlit.app/
Project Goal
This project investigates how park amenities (activities and campgrounds) relate to operational complexity, using each park's alert count as a proxy.
Rather than using a pre-existing dataset, this project builds one from scratch by:
- Collecting data from the NPS API
- Cleaning and transforming raw data
- Engineering useful features
- Merging multiple data sources into one dataset
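The merge step above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/merge_alerts.py` or `src/merge_campgrounds.py`: the helper names `count_by_park` and `merge_counts` are hypothetical, and it assumes each alert and campground record carries a `parkCode` field that matches the park it belongs to.

```python
from collections import Counter


def count_by_park(records):
    """Count how many records reference each parkCode (records without one are skipped)."""
    return Counter(r["parkCode"] for r in records if r.get("parkCode"))


def merge_counts(parks, alerts, campgrounds):
    """Left-join alert and campground counts onto the park rows.

    Hypothetical helper: parks with no matching records get a count of 0,
    so every park stays in the output.
    """
    alert_counts = count_by_park(alerts)
    camp_counts = count_by_park(campgrounds)
    merged = []
    for park in parks:
        row = dict(park)  # copy so the input rows are not mutated
        code = park["parkCode"]
        row["num_alerts"] = alert_counts.get(code, 0)
        row["num_campgrounds"] = camp_counts.get(code, 0)
        merged.append(row)
    return merged
```

Keeping the parks table on the left side of the join means a park never disappears from the dataset just because it has no alerts or campgrounds.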
Data Sources
This project uses the official National Park Service API:
- Parks endpoint
- Alerts endpoint
- Campgrounds endpoint
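As a rough sketch of what a request to one of these endpoints might look like, the snippet below builds a URL and fetches one page using only the standard library. The base URL and the `limit`, `start`, and `api_key` query parameters reflect the public NPS API as I understand it, so treat them as assumptions and check the NPS developer documentation. The function names `build_url` and `fetch_page` are hypothetical and are not part of this package.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public NPS API (verify against the NPS developer docs).
BASE_URL = "https://developer.nps.gov/api/v1"


def build_url(endpoint, api_key, limit=50, start=0):
    """Build a request URL for an endpoint such as 'parks', 'alerts', or 'campgrounds'."""
    query = urlencode({"limit": limit, "start": start, "api_key": api_key})
    return f"{BASE_URL}/{endpoint}?{query}"


def fetch_page(endpoint, api_key, limit=50, start=0, timeout=30):
    """Fetch one page of results and return the decoded JSON payload."""
    with urlopen(build_url(endpoint, api_key, limit, start), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_page("parks", api_key, limit=10)` would request the first ten park records.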
To re-run the data collection process, you will need a free API key from: https://www.nps.gov/subjects/developer/get-started.htm
Create a .env file in the root directory:
NPS_API_KEY=your_key_here
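One way the key could be read back at run time is sketched below. The project may well use a library such as python-dotenv for this; the version here is a standard-library stand-in, and the names `read_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file, ignoring blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Return NPS_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("NPS_API_KEY") or read_env_file(path).get("NPS_API_KEY")
    if not key:
        raise RuntimeError("NPS_API_KEY is not set; add it to .env or the environment")
    return key
```

Checking the process environment first lets a deployment supply the key without a `.env` file on disk.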
Installation
Clone the repository and install the package:
git clone https://github.com/rylion9-lgtm/national-parks-agglomerate
cd national-parks-agglomerate
python -m pip install -e .
To run the Streamlit app, install Streamlit:
python -m pip install streamlit
Running the App
streamlit run app.py
Example Usage (Package)
import pandas as pd
from national_parks import summarize_parks
df = pd.read_csv("data/processed/parks_final.csv")
summary = summarize_parks(df)
print(summary)
Final Dataset
The final dataset is located at:
data/processed/parks_final.csv
It contains:
- 474 rows
- 9 columns
Variables
- fullName: Full name of the park unit
- parkCode: Unique park identifier
- states: State abbreviation(s)
- latitude: Latitude
- longitude: Longitude
- description_length: Length of park description
- num_activities: Count of activities
- num_alerts: Number of alerts
- num_campgrounds: Number of campgrounds
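To make the engineered columns concrete, here is a hedged sketch of how `description_length` and `num_activities` could be derived from one raw park record. It assumes the raw record has a `description` string and an `activities` list, that length is measured in characters, and that coordinates arrive as strings. The project's actual definitions may differ, and `engineer_features` and `to_float` are hypothetical names.

```python
def to_float(value):
    """Convert a coordinate to float, returning None when it is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def engineer_features(park):
    """Flatten one raw park record into the park-level columns of the final dataset."""
    description = park.get("description") or ""
    activities = park.get("activities") or []
    return {
        "fullName": park.get("fullName"),
        "parkCode": park.get("parkCode"),
        "states": park.get("states"),
        "latitude": to_float(park.get("latitude")),
        "longitude": to_float(park.get("longitude")),
        # Assumption: length is counted in characters, not words.
        "description_length": len(description),
        # Assumption: one list entry per activity.
        "num_activities": len(activities),
    }
```

The `num_alerts` and `num_campgrounds` columns are not produced here because they come from the other two endpoints.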
Key Insight
Most parks have relatively few alerts regardless of how many activities they offer, which suggests only a weak relationship between amenities and alerts. Parks with more activities do tend to show slightly higher alert counts, which is consistent with somewhat greater operational complexity.
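One way to put a number on this relationship is a Pearson correlation between `num_activities` and `num_alerts`. The sketch below is an illustration rather than the analysis behind the statement above. The function names are hypothetical, and the default path is taken from the Final Dataset section.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def activity_alert_correlation(csv_path="data/processed/parks_final.csv"):
    """Correlate num_activities with num_alerts across all rows of the final dataset."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    xs = [float(r["num_activities"]) for r in rows]
    ys = [float(r["num_alerts"]) for r in rows]
    return pearson(xs, ys)
```

A coefficient near zero would match the "weak relationship" reading, while a small positive value would match the slight upward trend.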
Project Structure
national-parks-agglomerate/
├── README.md
├── requirements.txt
├── pyproject.toml
├── .gitignore
├── app.py
│
├── data/
│ ├── raw/
│ └── processed/
│
├── src/
│ ├── get_parks.py
│ ├── clean_parks.py
│ ├── get_alerts.py
│ ├── merge_alerts.py
│ ├── get_campgrounds.py
│ └── merge_campgrounds.py
│
└── national_parks/
├── __init__.py
├── data.py
├── clean.py
└── analyze.py
Notes and Limitations
- Data represents a snapshot in time (not live-updating)
- Alerts and campgrounds were limited to 500 records
- num_activities is an engineered approximation
- Park units vary widely in size and type
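If the 500-record cap ever needs lifting, a paginated fetch along these lines is one option. It is a sketch under assumptions: the page payload is taken to be a dict with a `data` list and a `total` count (possibly given as a string), and `fetch_all` is a hypothetical helper that accepts any callable returning such a page, so it is not tied to a particular HTTP client.

```python
def fetch_all(fetch_page, page_size=50, max_records=None):
    """Collect records across pages until the reported total or max_records is reached.

    fetch_page(start, limit) must return a dict with a 'data' list and a 'total' count.
    """
    records = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        data = page.get("data", [])
        records.extend(data)
        total = int(page.get("total", 0))
        start += page_size
        if not data or start >= total:
            break  # no more pages to request
        if max_records is not None and len(records) >= max_records:
            break  # caller-imposed cap reached
    # Slicing with None keeps everything; otherwise trim to the cap.
    return records[:max_records]
```

Passing `max_records=500` reproduces a cap like the one described above, while leaving it as `None` keeps requesting pages until the reported total is covered.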
Why This Project Matters
This project demonstrates:
- API data collection
- Data cleaning and transformation
- Feature engineering
- Multi-source data integration
- Building an installable Python package
- Deploying an interactive Streamlit app
It reflects a real-world data science workflow from raw data to deployed application.
File details
Details for the file national_parks_agglomerate-0.1.0.tar.gz.
File metadata
- Download URL: national_parks_agglomerate-0.1.0.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e979274550a3848efcb59fac0ac46a35ad9c6630d5bd89309222c5d6ea239175 |
| MD5 | 25af0a3d6fd1592e528afa1f24429d86 |
| BLAKE2b-256 | 9a4eb7e4e8a3e5b98bf897c1ddb61113e0fc11cc83ed0ea05abe1cf92813ecbe |
File details
Details for the file national_parks_agglomerate-0.1.0-py3-none-any.whl.
File metadata
- Download URL: national_parks_agglomerate-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dd0f67cddb135256f0e1702281bbe27653412f8a16dd8222e5919c49f0c9eec |
| MD5 | 90e6f826258ff7a18ca86d1b76b8d53c |
| BLAKE2b-256 | ceeb177e32cad0d6743b2f469a91d30fae3aaaf08dae3ca3b08eba3640384f13 |