High-performance Python library for Forest Inventory Analysis (FIA) data analysis
pyFIA
A high-performance Python library for analyzing USDA Forest Inventory and Analysis (FIA) data using modern data science tools.
Overview
pyFIA provides a programmatic API for working with Forest Inventory and Analysis (FIA) data. It leverages modern Python data science tools like Polars and DuckDB for efficient processing of large-scale national forest inventory datasets with statistically valid estimation methods.
Features
Core Estimation Functions
- ✅ Trees per acre (`tpa()`) - Live and dead tree abundance
- ✅ Biomass (`biomass()`) - Above/belowground biomass and carbon
- ✅ Volume (`volume()`) - Merchantable volume (cubic feet)
- ✅ Forest area (`area()`) - Forest land area by category
- ✅ Area change (`area_change()`) - Annual forest land transitions
- ✅ Mortality (`mortality()`) - Annual mortality rates
- ✅ Growth (`growth()`) - Net growth estimation
Statistical Methods
- Design-based estimation following Bechtold & Patterson (2005)
- Post-stratified estimation with proper variance calculation
- Temporally indifferent (TI) estimation, matching the EVALIDator default
- EVALID-based filtering for statistically valid estimates
- Ratio-of-means estimators for per-acre values
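As a rough illustration of the ratio-of-means idea for per-acre values, the sketch below computes a post-stratified total for a numerator (trees) and a denominator (forest area) and divides one by the other. It is not pyFIA's implementation: the function names, the plot dictionaries, the stratum weights, and the 1,000-acre total are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict


def stratified_total(plots, total_area, weights, key):
    """Post-stratified total: total_area * sum over strata of W_h * mean_h."""
    values_by_stratum = defaultdict(list)
    for plot in plots:
        values_by_stratum[plot["stratum"]].append(plot[key])
    return total_area * sum(
        weights[h] * sum(vals) / len(vals)
        for h, vals in values_by_stratum.items()
    )


def ratio_of_means(plots, total_area, weights, num_key, den_key):
    """Per-acre estimate as the ratio of two post-stratified totals."""
    numerator = stratified_total(plots, total_area, weights, num_key)
    denominator = stratified_total(plots, total_area, weights, den_key)
    return numerator / denominator


# Hypothetical plot-level values: trees per acre and forested proportion.
plots = [
    {"stratum": 1, "trees": 100.0, "forest_prop": 1.0},
    {"stratum": 1, "trees": 50.0, "forest_prop": 0.5},
    {"stratum": 2, "trees": 0.0, "forest_prop": 0.0},
    {"stratum": 2, "trees": 20.0, "forest_prop": 0.5},
]
weights = {1: 0.6, 2: 0.4}  # hypothetical stratum weights, summing to 1

trees_per_forest_acre = ratio_of_means(
    plots, 1000.0, weights, "trees", "forest_prop"
)
print(round(trees_per_forest_acre, 2))
```

Dividing the two totals, rather than averaging per-plot ratios, keeps partially forested plots from being over-weighted.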
Performance Features
- DuckDB backend for efficient large-scale data processing
- Polars DataFrames for fast in-memory operations
- Lazy evaluation for memory-efficient workflows
- Parallel processing support
Spatial Capabilities
- Polygon clipping - Filter plots by custom boundaries
- Polygon attribute joins - Group estimates by polygon attributes (e.g., by county)
- Multiple formats - Shapefile, GeoJSON, GeoPackage, GeoParquet
- DuckDB spatial - Powered by DuckDB's spatial extension
Installation
# Basic installation
pip install pyfia
# With spatial analysis support (quoted so shells such as zsh do not expand the brackets)
pip install "pyfia[spatial]"
# For development
pip install -e ".[dev]"
Quick Start
from pyfia import FIA, biomass, tpa, volume, area, area_change
# Load FIA data and filter to a state
with FIA("path/to/FIA_database.duckdb") as db:
    # Filter to state (required before estimation)
    db.clip_by_state(37)  # North Carolina
    db.clip_most_recent(eval_type="EXPVOL")

    # Get trees per acre (live trees on forestland)
    tpa_results = tpa(db, tree_domain="STATUSCD == 1")

    # Get biomass estimates
    biomass_results = biomass(db, land_type="forest")

    # Get forest area
    area_results = area(db, land_type="forest")

    # Get volume estimates
    volume_results = volume(db, land_type="forest")

    # Get annual forest area change (net gain/loss)
    change_results = area_change(db, land_type="forest")
Domain Filtering and Grouping
pyFIA supports flexible domain filtering and grouping:
# Tree-level filtering (snake_case parameters)
tpa_live = tpa(db, tree_domain="STATUSCD == 1")
# Group by species
biomass_by_species = biomass(db, by_species=True)
# Area domain filtering
area_timberland = area(db, land_type="timber")
# Group by custom column
volume_by_owner = volume(db, grp_by="OWNGRPCD")
# Area change: net, gross gain, or gross loss
net_change = area_change(db, change_type="net") # Gains - Losses
forest_gain = area_change(db, change_type="gross_gain") # Non-forest → Forest
forest_loss = area_change(db, change_type="gross_loss") # Forest → Non-forest
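One way to think about a tree domain such as `STATUSCD == 1` is as a zero-one indicator applied to each tree record: trees outside the domain contribute nothing, but their plots stay in the sample with a total of zero. The sketch below illustrates that idea in plain Python. It does not use pyFIA; the `plot_totals` helper, the `PLOT` key, and the tree values are hypothetical, while `STATUSCD`, `DIA`, and `TPA_UNADJ` mirror FIA column names.

```python
def plot_totals(trees, plot_ids, in_domain):
    """Sum per-tree expansion factors by plot, counting only in-domain trees."""
    # Start every plot at zero so plots with no qualifying trees are kept.
    totals = {pid: 0.0 for pid in plot_ids}
    for tree in trees:
        if in_domain(tree):
            totals[tree["PLOT"]] += tree["TPA_UNADJ"]
    return totals


# Hypothetical tree records (values invented for illustration).
trees = [
    {"PLOT": 1, "STATUSCD": 1, "DIA": 12.4, "TPA_UNADJ": 6.0},
    {"PLOT": 1, "STATUSCD": 2, "DIA": 8.1, "TPA_UNADJ": 6.0},
    {"PLOT": 2, "STATUSCD": 1, "DIA": 6.3, "TPA_UNADJ": 6.0},
    {"PLOT": 3, "STATUSCD": 2, "DIA": 15.0, "TPA_UNADJ": 6.0},
]
plot_ids = [1, 2, 3]

# Live trees only.
live = plot_totals(trees, plot_ids, lambda t: t["STATUSCD"] == 1)

# Live trees at least 10 inches in diameter.
large_live = plot_totals(
    trees, plot_ids, lambda t: t["STATUSCD"] == 1 and t["DIA"] >= 10.0
)

print(live)
print(large_live)
```

Plot 3 has no live trees but still appears with a total of zero, which is what keeps the sample size, and therefore the variance estimate, honest.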
Spatial Filtering
Filter plots by polygon boundaries using DuckDB's spatial extension:
from pyfia import FIA, tpa
with FIA("southeast.duckdb") as db:
    db.clip_by_state(37)  # North Carolina

    # Filter to custom region using any spatial file format
    db.clip_by_polygon("my_region.geojson")  # GeoJSON
    # db.clip_by_polygon("counties.shp")  # Shapefile
    # db.clip_by_polygon("boundary.gpkg")  # GeoPackage

    # Run estimation on filtered plots
    result = tpa(db, tree_domain="STATUSCD == 1")
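Conceptually, polygon clipping keeps only the plots whose coordinates fall inside the boundary. pyFIA delegates this to DuckDB's spatial extension; the standalone sketch below only illustrates the underlying test, using the classic ray-casting algorithm. The `point_in_polygon` helper and the coordinates are hypothetical and not part of pyFIA.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a rightward ray crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square region and plot coordinates.
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
plots = {"A": (2.0, 2.0), "B": (5.0, 2.0), "C": (1.0, 3.5)}

kept = sorted(
    name for name, (x, y) in plots.items() if point_in_polygon(x, y, region)
)
print(kept)
```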
Grouping by Polygon Attributes
Join polygon attributes to plots and use them for grouping:
with FIA("southeast.duckdb") as db:
    db.clip_by_state(37)

    # Join county attributes to plots
    db.intersect_polygons("counties.shp", attributes=["NAME", "FIPS"])

    # Group estimates by county
    result = tpa(db, grp_by=["NAME"])
Supported formats: Shapefile, GeoJSON, GeoPackage, GeoParquet, and any other GDAL-supported format.
Note: FIA public coordinates are fuzzed up to 1 mile for privacy protection, so spatial precision below ~1 mile is not meaningful.
Data Organization
pyFIA follows FIA's evaluation-based data structure:
- EVALID: 6-digit codes identifying statistically valid plot groupings
- Evaluation types: EXPALL (area), EXPVOL (volume), EXPMORT (mortality), EXPGROW (growth)
- EVALID management: Use `db.clip_most_recent(eval_type="EXPVOL")` for latest evaluations
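To make the EVALID structure concrete, the sketch below splits a code into its parts, assuming the conventional FIA layout of a two-digit state FIPS code, a two-digit inventory year, and a two-digit evaluation type code. The `split_evalid` helper and the `EvalidParts` type are hypothetical and not part of pyFIA, and the example EVALIDs are illustrative.

```python
from typing import NamedTuple


class EvalidParts(NamedTuple):
    state_fips: int  # e.g. 37 for North Carolina
    year2: int       # two-digit inventory year
    type_code: int   # two-digit evaluation type code


def split_evalid(evalid: int) -> EvalidParts:
    """Split an EVALID into state, year, and type, zero-padding to 6 digits."""
    if evalid < 0:
        raise ValueError(f"EVALID must be non-negative, got {evalid}")
    text = f"{evalid:06d}"
    if len(text) != 6:
        raise ValueError(f"EVALID must have at most 6 digits, got {evalid}")
    return EvalidParts(int(text[:2]), int(text[2:4]), int(text[4:]))


parts = split_evalid(372301)
print(parts)

# States with single-digit FIPS codes appear as 5-digit integers.
print(split_evalid(12301))
```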
Advanced Usage
from pyfia import FIA, area, biomass, tpa, volume

# Context manager for automatic connection handling
with FIA("path/to/FIA_database.duckdb") as db:
    # Filter to state and most recent evaluation
    db.clip_by_state(37)  # North Carolina
    db.clip_most_recent(eval_type="EXPVOL")

    # Biomass by species
    results = biomass(db, by_species=True)

    # Multiple estimations with same connection
    tpa_results = tpa(db, tree_domain="STATUSCD == 1")
    volume_results = volume(db, tree_domain="DIA >= 10.0")
    area_results = area(db, land_type="timber")
Documentation
Full documentation available at https://mihiarc.github.io/pyfia/
Performance
pyFIA relies on DuckDB and Polars for performance. The project reports:
- 10-100x faster for large-scale queries using DuckDB columnar storage
- 2-5x faster for in-memory operations using Polars DataFrames
- Statistically valid estimates following FIA methodology
Citation
If you use pyFIA in your research, please cite:
@software{pyfia2024,
title = {pyFIA: A Python Library for Forest Inventory Analysis},
author = {Mihiar, Chris},
year = {2024},
url = {https://github.com/mihiarc/pyfia}
}
License
MIT License - see LICENSE file for details.
Acknowledgments
- Uses USDA Forest Service FIA data
- Statistical methods from Bechtold & Patterson (2005):
  - Bechtold, W.A.; Patterson, P.L., eds. 2005. The Enhanced Forest Inventory and Analysis Program - National Sampling Design and Estimation Procedures. Gen. Tech. Rep. SRS-80. Asheville, NC: U.S. Department of Agriculture, Forest Service, Southern Research Station. 85 p. https://doi.org/10.2737/SRS-GTR-80
  - Key equations: Chapter 4 (pp. 53-77) - see Eq. 4.1 (domain indicator), Eq. 4.2 (adjustment factor), Eq. 4.8 (tree attributes), Section 4.2 (variance estimation)
- Inspired by various FIA analysis tools and methodologies in the forestry community