
High-performance Python library for analyzing Forest Inventory and Analysis (FIA) data

pyFIA

A high-performance Python library for analyzing USDA Forest Inventory and Analysis (FIA) data using modern data science tools.

Overview

pyFIA provides a programmatic API for working with Forest Inventory and Analysis (FIA) data. It uses Polars and DuckDB to process large, national-scale forest inventory datasets efficiently, and its estimators follow FIA's design-based methods so that results are statistically valid.

Features

Core Estimation Functions

  • Trees per acre (tpa()) - Live and dead tree abundance
  • Biomass (biomass()) - Above/belowground biomass and carbon
  • Volume (volume()) - Merchantable volume (cubic feet)
  • Forest area (area()) - Forest land area by category
  • Area change (area_change()) - Annual forest land transitions
  • Mortality (mortality()) - Annual mortality rates
  • Growth (growth()) - Net growth estimation
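Several of these estimators start from plot-level sums restricted to a domain, such as live trees only. The sketch below illustrates that idea with a 0/1 domain indicator (compare Eq. 4.1 in Bechtold & Patterson, 2005) applied to hypothetical tree records. The record layout and the `domain_indicator` and `plot_tpa_by_species` helpers are illustrative assumptions, not pyFIA's API; the TPA_UNADJ values are the usual FIA subplot and microplot values, used here only as sample data.

```python
from collections import defaultdict

# Hypothetical tree records: (plot id, SPCD, STATUSCD, TPA_UNADJ)
trees = [
    ("P1", 131, 1, 6.018),    # live loblolly pine tallied on a subplot
    ("P1", 131, 2, 6.018),    # dead tree (STATUSCD 2): outside the domain
    ("P1", 611, 1, 6.018),    # live sweetgum
    ("P2", 131, 1, 6.018),
    ("P2", 131, 1, 74.965),   # sapling tallied on a microplot
]


def domain_indicator(statuscd: int) -> int:
    """Return 1 if the tree is in the domain (live trees, STATUSCD == 1), else 0."""
    return 1 if statuscd == 1 else 0


def plot_tpa_by_species(records):
    """Sum TPA_UNADJ * domain indicator for each (plot, species) pair."""
    totals = defaultdict(float)
    for plot, spcd, statuscd, tpa_unadj in records:
        totals[(plot, spcd)] += tpa_unadj * domain_indicator(statuscd)
    return dict(totals)


for (plot, spcd), value in sorted(plot_tpa_by_species(trees).items()):
    print(plot, spcd, round(value, 3))
```

The dead tree still appears in the data but contributes zero because its indicator is 0; grouping by species simply keys the sums by an extra column. Turning these plot-level sums into population estimates is the job of the stratified estimators listed under Statistical Methods.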

Statistical Methods

  • Design-based estimation following Bechtold & Patterson (2005)
  • Post-stratified estimation with proper variance calculation
  • Temporally indifferent (TI) estimation matching EVALIDator default
  • EVALID-based filtering for statistically valid estimates
  • Ratio-of-means estimators for per-acre values
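The post-stratified and ratio-of-means estimators above can be sketched in plain Python. This is a minimal illustration of the textbook formulas, assuming the post-stratified variance form with random stratum sample sizes (as in Bechtold & Patterson, 2005, Chapter 4) and a Taylor-series variance for the ratio; it is not pyFIA's internal code, and the `Stratum` record, its `y`/`x` fields, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    weight: float    # W_h: proportion of the population area in stratum h
    y: list[float]   # plot-level attribute (numerator), e.g. volume per acre of plot
    x: list[float]   # plot-level area proportion (denominator), e.g. forest fraction


def _mean(v: list[float]) -> float:
    return sum(v) / len(v)


def _sample_cov(a: list[float], b: list[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = _mean(a), _mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ps_total(strata: list[Stratum], area: float, field: str) -> float:
    """Post-stratified total: A_T * sum_h W_h * mean_h."""
    return area * sum(s.weight * _mean(getattr(s, field)) for s in strata)


def ps_cov(strata: list[Stratum], area: float, f1: str, f2: str) -> float:
    """Post-stratified (co)variance of two estimated totals."""
    n = sum(len(s.y) for s in strata)  # total number of plots
    acc = 0.0
    for s in strata:
        c = _sample_cov(getattr(s, f1), getattr(s, f2))
        acc += s.weight * c / n + (1 - s.weight) * c / n**2
    return area**2 * acc


def ratio_of_means(strata: list[Stratum], area: float) -> tuple[float, float]:
    """Per-acre ratio R = Y_hat / X_hat and its Taylor-series variance."""
    y_hat = ps_total(strata, area, "y")
    x_hat = ps_total(strata, area, "x")
    r = y_hat / x_hat
    var_r = (
        ps_cov(strata, area, "y", "y")
        + r**2 * ps_cov(strata, area, "x", "x")
        - 2 * r * ps_cov(strata, area, "y", "x")
    ) / x_hat**2
    return r, var_r


strata = [
    Stratum(weight=0.6, y=[1200.0, 800.0, 0.0], x=[1.0, 1.0, 0.0]),
    Stratum(weight=0.4, y=[300.0, 500.0], x=[0.5, 1.0]),
]
r, var_r = ratio_of_means(strata, area=1_000_000.0)
print(f"per-acre estimate {r:.1f}, standard error {var_r ** 0.5:.1f}")
```

Dividing estimated totals (a ratio of means) rather than averaging per-plot ratios keeps plots with very little forest from dominating per-acre values, and avoids undefined ratios on plots with no forest at all.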

Performance Features

  • DuckDB backend for efficient large-scale data processing
  • Polars DataFrames for fast in-memory operations
  • Lazy evaluation for memory-efficient workflows
  • Parallel processing support

Spatial Capabilities

  • Polygon clipping - Filter plots by custom boundaries
  • Polygon attribute joins - Group estimates by polygon attributes (e.g., by county)
  • Multiple formats - Shapefile, GeoJSON, GeoPackage, GeoParquet
  • DuckDB spatial - Powered by DuckDB's spatial extension
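pyFIA delegates polygon clipping to DuckDB's spatial extension. To show what the underlying test does, here is a minimal even-odd (ray casting) point-in-polygon check for a single ring without holes. The ring, plot IDs, and coordinates are made up; real boundaries with holes, multipolygons, or mismatched projections need a proper spatial library.

```python
Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, ring: list[Point]) -> bool:
    """Even-odd rule: cast a ray to the right and count edge crossings."""
    x, y = pt
    inside = False
    j = len(ring) - 1  # previous vertex; the ring is implicitly closed
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the ray's latitude can be crossed
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical rectangular region and plot locations
ring = [(-80.0, 35.0), (-78.0, 35.0), (-78.0, 36.0), (-80.0, 36.0)]
plots = {"A": (-79.0, 35.5), "B": (-77.0, 35.5)}

clipped = [pid for pid, xy in plots.items() if point_in_polygon(xy, ring)]
print(clipped)
```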

Installation

# Basic installation
pip install pyfia

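# With spatial analysis support (quotes stop shells such as zsh from expanding the brackets)
pip install "pyfia[spatial]"

# For development (from the root of a clone of the repository)
pip install -e ".[dev]"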

Quick Start

from pyfia import FIA, biomass, tpa, volume, area, area_change

# Load FIA data and filter to a state
with FIA("path/to/FIA_database.duckdb") as db:
    # Filter to state (required before estimation)
    db.clip_by_state(37)  # North Carolina
    db.clip_most_recent(eval_type="EXPVOL")

    # Get trees per acre (live trees on forestland)
    tpa_results = tpa(db, tree_domain="STATUSCD == 1")

    # Get biomass estimates
    biomass_results = biomass(db, land_type="forest")

    # Get forest area
    area_results = area(db, land_type="forest")

    # Get volume estimates
    volume_results = volume(db, land_type="forest")

    # Get annual forest area change (net gain/loss)
    change_results = area_change(db, land_type="forest")

Domain Filtering and Grouping

pyFIA supports flexible domain filtering and grouping:

# Tree-level filtering (snake_case parameters)
tpa_live = tpa(db, tree_domain="STATUSCD == 1")

# Group by species
biomass_by_species = biomass(db, by_species=True)

# Land-type filtering (timberland only)
area_timberland = area(db, land_type="timber")

# Group by custom column
volume_by_owner = volume(db, grp_by="OWNGRPCD")

# Area change: net, gross gain, or gross loss
net_change = area_change(db, change_type="net")        # Gains - Losses
forest_gain = area_change(db, change_type="gross_gain")  # Non-forest → Forest
forest_loss = area_change(db, change_type="gross_loss")  # Forest → Non-forest
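The three `change_type` options are related by simple arithmetic: net change is gross gain minus gross loss. The sketch below shows that bookkeeping on hypothetical remeasured conditions. The `ConditionChange` record is invented for illustration, and dividing each condition's area by its remeasurement period to get an annual rate is an assumption about how annualization works, not a description of pyFIA's internals.

```python
from dataclasses import dataclass


@dataclass
class ConditionChange:
    """One remeasured condition (hypothetical record layout)."""
    was_forest: bool      # land status at the previous visit
    is_forest: bool       # land status at the current visit
    acres: float          # expanded area represented by this condition
    remper_years: float   # years between the two visits


def annual_area_change(changes: list[ConditionChange]) -> dict[str, float]:
    """Annualized gross gain, gross loss, and net change in forest area."""
    gain = sum(c.acres / c.remper_years for c in changes
               if not c.was_forest and c.is_forest)
    loss = sum(c.acres / c.remper_years for c in changes
               if c.was_forest and not c.is_forest)
    return {"gross_gain": gain, "gross_loss": loss, "net": gain - loss}


changes = [
    ConditionChange(False, True, 5000.0, 5.0),   # non-forest -> forest
    ConditionChange(True, False, 2000.0, 5.0),   # forest -> non-forest
    ConditionChange(True, True, 90000.0, 5.0),   # stayed forest: no change
]
print(annual_area_change(changes))
```

Conditions that stay forest (or stay non-forest) contribute to neither gain nor loss, which is why net change can be small even when gross transitions are large.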

Spatial Filtering

Filter plots by polygon boundaries using DuckDB's spatial extension:

from pyfia import FIA, tpa

with FIA("southeast.duckdb") as db:
    db.clip_by_state(37)  # North Carolina

    # Filter to custom region using any spatial file format
    db.clip_by_polygon("my_region.geojson")  # GeoJSON
    # db.clip_by_polygon("counties.shp")     # Shapefile
    # db.clip_by_polygon("boundary.gpkg")    # GeoPackage

    # Run estimation on filtered plots
    result = tpa(db, tree_domain="STATUSCD == 1")

Grouping by Polygon Attributes

Join polygon attributes to plots and use them for grouping:

with FIA("southeast.duckdb") as db:
    db.clip_by_state(37)

    # Join county attributes to plots
    db.intersect_polygons("counties.shp", attributes=["NAME", "FIPS"])

    # Group estimates by county
    result = tpa(db, grp_by=["NAME"])

Supported formats: Shapefile, GeoJSON, GeoPackage, GeoParquet, and any GDAL-supported format.

Note: FIA public coordinates are fuzzed up to 1 mile for privacy protection, so spatial precision below ~1 mile is not meaningful.
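To put the ~1 mile fuzzing radius in coordinate terms, here is a quick conversion to decimal degrees using a spherical-Earth approximation. The function name is illustrative, and 35.5°N is used only as a latitude roughly in the middle of North Carolina.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius (spherical approximation)
KM_PER_MILE = 1.609344


def miles_to_degrees(miles: float, latitude_deg: float) -> tuple[float, float]:
    """Approximate (latitude, longitude) degree spans covered by a distance in miles."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360   # about 111.2 km per degree of latitude
    dlat = miles * KM_PER_MILE / km_per_degree
    # Meridians converge toward the poles, so a degree of longitude is shorter
    dlon = dlat / math.cos(math.radians(latitude_deg))
    return dlat, dlon


dlat, dlon = miles_to_degrees(1.0, 35.5)
print(f"{dlat:.4f} deg latitude, {dlon:.4f} deg longitude")
```

At that latitude one mile spans roughly 0.014 to 0.018 degrees, so plot coordinates quoted to more than about two decimal places imply precision the public data does not have.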

Data Organization

pyFIA follows FIA's evaluation-based data structure:

  • EVALID: 6-digit codes identifying statistically valid plot groupings
  • Evaluation types: EXPALL (area), EXPVOL (volume), EXPMORT (mortality), EXPGROW (growth)
  • EVALID management: Use db.clip_most_recent(eval_type="EXPVOL") for latest evaluations
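Because an EVALID packs several fields into six digits, it can be decoded directly. This sketch assumes the layout commonly described in the FIA Database documentation: a two-digit state FIPS code, a two-digit evaluation year, and a two-digit evaluation-type code. The `parse_evalid` helper and the sample EVALIDs are illustrative.

```python
from typing import NamedTuple


class EvalidParts(NamedTuple):
    statecd: int     # state FIPS code, e.g. 37 for North Carolina
    year2: int       # last two digits of the evaluation year
    type_code: int   # evaluation-type code


def parse_evalid(evalid: int) -> EvalidParts:
    """Split an EVALID into state, year, and type (assumed SSYYTT layout)."""
    s = f"{evalid:06d}"  # zero-pad so single-digit state codes keep their place
    if len(s) != 6 or not s.isdigit():
        raise ValueError(f"EVALID must be a non-negative 6-digit code: {evalid}")
    return EvalidParts(int(s[:2]), int(s[2:4]), int(s[4:]))


print(parse_evalid(372301))  # EvalidParts(statecd=37, year2=23, type_code=1)
```

Zero-padding matters for states with one-digit FIPS codes: stored as an integer, an EVALID such as 12301 decodes to state 1 only after padding to `012301`.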

Advanced Usage

# Context manager for automatic connection handling
with FIA("path/to/FIA_database.duckdb") as db:
    # Filter to state and most recent evaluation
    db.clip_by_state(37)  # North Carolina
    db.clip_most_recent(eval_type="EXPVOL")

    # Biomass by species
    results = biomass(db, by_species=True)

    # Multiple estimations with same connection
    tpa_results = tpa(db, tree_domain="STATUSCD == 1")
    volume_results = volume(db, tree_domain="DIA >= 10.0")
    area_results = area(db, land_type="timber")

Documentation

Full documentation is available at https://mihiarc.github.io/pyfia/

Performance

pyFIA's speed comes from its DuckDB and Polars data engines:

  • 10-100x faster for large-scale queries using DuckDB columnar storage
  • 2-5x faster for in-memory operations using Polars DataFrames
  • Statistically valid estimates following FIA methodology

Citation

If you use pyFIA in your research, please cite:

@software{pyfia2024,
  title = {pyFIA: A Python Library for Forest Inventory Analysis},
  author = {Mihiar, Chris},
  year = {2024},
  url = {https://github.com/mihiarc/pyfia}
}

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Uses USDA Forest Service FIA data
  • Statistical methods from Bechtold & Patterson (2005):
    • Bechtold, W.A.; Patterson, P.L., eds. 2005. The Enhanced Forest Inventory and Analysis Program - National Sampling Design and Estimation Procedures. Gen. Tech. Rep. SRS-80. Asheville, NC: U.S. Department of Agriculture, Forest Service, Southern Research Station. 85 p. https://doi.org/10.2737/SRS-GTR-80
    • Key equations: Chapter 4 (pp. 53-77) - see Eq. 4.1 (domain indicator), Eq. 4.2 (adjustment factor), Eq. 4.8 (tree attributes), Section 4.2 (variance estimation)
  • Inspired by various FIA analysis tools and methodologies in the forestry community


Download files

Source Distribution

pyfia-1.1.0b1.tar.gz (152.6 kB)

Uploaded Source

Built Distribution

pyfia-1.1.0b1-py3-none-any.whl (188.7 kB)

Uploaded Python 3

File details

Details for the file pyfia-1.1.0b1.tar.gz.

File metadata

  • Download URL: pyfia-1.1.0b1.tar.gz
  • Upload date:
  • Size: 152.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for pyfia-1.1.0b1.tar.gz
Algorithm Hash digest
SHA256 e7dbbc3d25304f967d12e34d00e13be7365bbd9daf4065825b3b6e0cae532e15
MD5 a29ab897b7604dcb595a553d6f9e5a8f
BLAKE2b-256 9379952eda42b8fb55f4708f90465520c255daf7e3448803c2dcc9948a458374

File details

Details for the file pyfia-1.1.0b1-py3-none-any.whl.

File metadata

  • Download URL: pyfia-1.1.0b1-py3-none-any.whl
  • Upload date:
  • Size: 188.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for pyfia-1.1.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 ad5e28e0d74bcfd2bf64a026bbbc2978eae661d9c9a8eae4d67a87ccbf94155e
MD5 e56b04076c49528147b06641a2ff8b09
BLAKE2b-256 2be5fece1fb68fe7d4f9c873e53daeec0e16fb510810cb06d673dd54e644777b
