A comprehensive Python package for vegetation data analysis and environmental modeling

VegZ: Comprehensive Vegetation Data Analysis Package

VegZ is a Python package for vegetation data analysis and environmental modeling. It provides tools for ecologists, environmental scientists, and researchers working with biodiversity and vegetation data, covering data preparation, diversity and multivariate analysis, clustering, statistical testing, and visualization.

Complete Feature List

Data Management & Preprocessing

  • Parse vegetation survey data from multiple formats (CSV, Excel, Turboveg)
  • Integration with remote sensing APIs (Landsat, MODIS, Sentinel)
  • Darwin Core biodiversity standards compliance
  • Species name standardization with fuzzy matching
  • Coordinate system transformations
  • Multiple data transformation methods (Hellinger, chord, Wisconsin, log, sqrt, standardize)
  • Automatic species matrix detection
  • Support for heterogeneous data integration
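
As an illustration of one of the transformations listed above, here is a minimal pure-Python sketch of the Hellinger transformation (square root of each site's relative abundances). It is independent of VegZ: the function name `hellinger_transform` is hypothetical and is not taken from the package API.

```python
import math


def hellinger_transform(matrix):
    """Return the Hellinger-transformed copy of a site-by-species matrix.

    Each row (site) is divided by its total, then square-rooted.
    Rows that sum to zero are returned as all zeros.
    """
    transformed = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed


# Sample abundances taken from the data format example in this README
sites = [[25, 18, 12], [32, 22, 16]]
hellinger = hellinger_transform(sites)
```

After the transformation, the squared values in each non-empty row sum to 1, which is why Euclidean distance on transformed data behaves well for abundance tables.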

Data Quality & Validation

  • Comprehensive spatial coordinate validation
  • Temporal data validation and date parsing
  • Geographic outlier detection with country boundary checks
  • Coordinate precision assessment
  • Duplicate record identification
  • Invalid coordinate range detection
  • Transposed coordinate detection
  • Country boundary consistency checks
  • Automated quality reporting
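
The range and transposition checks above can be sketched as follows. This is an illustrative stand-alone function, not the VegZ implementation; the name `check_coordinates` and the issue labels are assumptions. Note that this simple rule only catches swapped coordinates when the reported latitude falls outside [-90, 90].

```python
def check_coordinates(lat, lon):
    """Return a list of issue labels for one latitude/longitude pair."""
    issues = []
    lat_ok = -90 <= lat <= 90
    lon_ok = -180 <= lon <= 180
    if not lat_ok:
        issues.append("latitude_out_of_range")
    if not lon_ok:
        issues.append("longitude_out_of_range")
    # An invalid latitude that would be a valid longitude, paired with a
    # longitude that would be a valid latitude, suggests the two were swapped.
    if not lat_ok and -180 <= lat <= 180 and -90 <= lon <= 90:
        issues.append("possibly_transposed")
    return issues


print(check_coordinates(44.2619, -72.5806))  # valid pair from this README
print(check_coordinates(120.5, 30.1))        # latitude out of range, likely swapped
```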

Diversity Analysis (15+ Indices)

  • Basic indices: Shannon, Simpson, Simpson inverse, richness, evenness
  • Advanced indices: Fisher's alpha, Berger-Parker, McIntosh, Brillouin
  • Additional indices: Menhinick, Margalef
  • Richness estimators: Chao1, ACE, Jackknife1, Jackknife2
  • Hill numbers for multiple diversity orders (q = 0, 0.5, 1, 1.5, 2, etc.)
  • Beta diversity analysis (Whittaker, Sørensen, Jaccard methods)
  • Rarefaction curves and extrapolation
  • Species accumulation curves
  • Diversity profiles
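
To make the relationship between the basic indices and Hill numbers concrete, here is a small pure-Python sketch. It does not use VegZ, and the function names are hypothetical. Simpson is computed here as the Gini-Simpson form (1 minus the sum of squared proportions); some packages report other variants.

```python
import math


def proportions(counts):
    """Relative abundances of the species that are present."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon(counts):
    """Shannon entropy H' using natural logarithms."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def hill_number(counts, q):
    """Hill number of order q; q = 1 is defined as the limit exp(H')."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(shannon(counts))
    return sum(p ** q for p in proportions(counts)) ** (1.0 / (1.0 - q))


site = [25, 18, 12]
profile = {q: hill_number(site, q) for q in (0, 0.5, 1, 1.5, 2)}
```

For a perfectly even community, every Hill number equals the species count; for uneven communities the profile decreases as q grows, because higher orders give more weight to dominant species.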

Complete Multivariate Analysis Suite

  • PCA - Principal Component Analysis with multiple transformations
  • CA - Correspondence Analysis with scaling options
  • DCA - Detrended Correspondence Analysis with segment control
  • CCA - Canonical Correspondence Analysis with constraints
  • RDA - Redundancy Analysis for linear relationships
  • NMDS - Non-metric Multidimensional Scaling with stress assessment
  • PCoA - Principal Coordinates Analysis (metric MDS)
  • Environmental vector fitting to ordination axes
  • Procrustes analysis for ordination comparison
  • Goodness-of-fit diagnostics
  • Multiple ecological distance matrices (Bray-Curtis, Jaccard, Sørensen, Euclidean, Manhattan, Canberra, Chord, Hellinger)
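
As a concrete example of one of the distance measures listed above, the following sketch computes Bray-Curtis dissimilarity between sites in plain Python. The helper names are hypothetical and unrelated to the VegZ API.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two empty sites are treated as identical (an assumption of this sketch)
    return numerator / denominator if denominator else 0.0


def distance_matrix(sites):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(a, b) for b in sites] for a in sites]


# Sample abundances taken from the data format example in this README
sites = [[25, 18, 12], [32, 22, 16]]
distances = distance_matrix(sites)
```

The value ranges from 0 (identical composition) to 1 (no shared species), which makes it a common input for NMDS and PCoA.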

Advanced Clustering Methods

  • TWINSPAN - Two-Way Indicator Species Analysis (vegetation classification gold standard)
    • Pseudospecies creation with customizable cut levels
    • Hierarchical divisive classification
    • Indicator species identification
    • Classification tree structure
  • Hierarchical clustering with ecological distance matrices
  • Comprehensive Elbow Analysis with 5 detection algorithms:
    • Kneedle algorithm (Satopaa et al., 2011) - automatic knee detection
    • Second derivative maximum - curvature-based detection
    • Variance explained threshold - <10% additional variance criterion
    • Distortion jump method (Sugar & James, 2003) - jump detection
    • L-method (Salvador & Chan, 2004) - piecewise linear fitting
  • Consensus recommendations with confidence scores
  • K-means clustering with multiple initializations
  • Fuzzy C-means clustering for gradient boundaries
  • DBSCAN for density-based core community detection
  • Gaussian Mixture Models for probabilistic clustering
  • Clustering validation metrics (silhouette, gap statistic, Calinski-Harabasz, Davies-Bouldin)
  • Optimal k determination with multiple methods
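
The curvature-based elbow criterion listed above can be illustrated with a discrete second difference. This is a simplified sketch, not the VegZ implementation; the function name and the within-cluster sum-of-squares values are made up for illustration.

```python
def elbow_by_second_difference(k_values, inertias):
    """Return the k whose inertia curve has the largest second difference.

    The first and last k cannot be selected because the second
    difference needs a neighbour on each side.
    """
    if len(k_values) != len(inertias) or len(inertias) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    best_k, best_curvature = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        curvature = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if curvature > best_curvature:
            best_k, best_curvature = k_values[i], curvature
    return best_k


# Hypothetical within-cluster sums of squares for k = 1..6
k_values = [1, 2, 3, 4, 5, 6]
inertias = [1000, 700, 300, 250, 220, 200]
elbow_k = elbow_by_second_difference(k_values, inertias)
```

Because single criteria like this one can disagree on noisy curves, combining several detectors into a consensus, as the feature list describes, is a reasonable design.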

Statistical Analysis

  • PERMANOVA - Permutational multivariate analysis of variance
  • ANOSIM - Analysis of similarities
  • MRPP - Multi-response permutation procedures
  • Mantel tests and partial Mantel tests for matrix correlation
  • Indicator Species Analysis (IndVal) for cluster characterization
  • SIMPER - Similarity percentages for group comparisons
  • Cophenetic correlation for hierarchical clustering validation
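
To show how a permutation test such as the Mantel test works, here is a minimal sketch using only the standard library. It assumes square, symmetric distance matrices with non-constant off-diagonal values, uses Pearson correlation, and reports a one-sided p-value. The function names and the example matrices are hypothetical and do not reflect the VegZ API.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel_test(d1, d2, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of d1 against d2."""
    rng = random.Random(seed)
    n = len(d1)
    target = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), target)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of d1 together
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), target) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: four sites at positions 0, 1, 3, 6 along a transect
positions = [0, 1, 3, 6]
d_space = [[abs(a - b) for b in positions] for a in positions]
d_other = [[2 * value for value in row] for row in d_space]
r, p = mantel_test(d_space, d_other, permutations=199)
```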

Environmental Modeling

  • Generalized Additive Models (GAMs) with multiple smoothers:
    • Spline smoothers
    • LOWESS smoothers
    • Polynomial smoothers
    • Gaussian process smoothers
  • Species response curves modeling:
    • Gaussian response curves
    • Skewed Gaussian curves
    • Beta response curves
    • Linear responses
    • Threshold responses
    • Unimodal responses
  • Environmental gradient analysis
  • Environmental niche modeling
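
A Gaussian response curve, the first curve type listed above, describes abundance along a gradient by an optimum, a tolerance, and a maximum. The sketch below writes out that formula; the function name, parameter names, and numeric values are illustrative assumptions, not the VegZ API or fitted results.

```python
import math


def gaussian_response(x, optimum, tolerance, maximum):
    """Expected abundance at gradient value x.

    y = maximum * exp(-(x - optimum)^2 / (2 * tolerance^2))
    """
    return maximum * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


# Hypothetical species with an optimum at soil pH 6.2 and tolerance 0.8
ph_values = [5.0, 5.5, 6.0, 6.2, 6.5, 7.0]
curve = [gaussian_response(ph, optimum=6.2, tolerance=0.8, maximum=30.0) for ph in ph_values]
```

The curve peaks at the optimum and drops to about 61% of the maximum (a factor of exp(-0.5)) one tolerance unit away on either side.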

Temporal Analysis

  • Phenology modeling with multiple curve types
  • Trend detection using Mann-Kendall tests
  • Time series decomposition (seasonal, trend, residual)
  • Seasonal pattern analysis
  • Temporal autocorrelation analysis
  • Change point detection
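
For readers unfamiliar with the Mann-Kendall trend test mentioned above, here is a minimal sketch of the statistic with a normal approximation. It assumes no tied values (the variance has no tie correction) and is not the VegZ implementation; the function name and the cover values are hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for a sequence of observations."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    variance = n * (n - 1) * (2 * n + 5) / 18.0  # valid when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


# Hypothetical yearly cover values with a steady increase
cover = [12.0, 13.5, 14.1, 15.8, 16.2, 17.9, 18.4, 20.1]
s, z, p = mann_kendall(cover)
```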

Spatial Analysis

  • Spatial interpolation methods:
    • Inverse Distance Weighting (IDW)
    • Kriging (ordinary, universal)
    • Spline interpolation
  • Landscape metrics calculation:
    • Patch density
    • Edge density
    • Contagion index
    • Shannon diversity index for landscapes
  • Spatial autocorrelation analysis (Moran's I, Geary's C)
  • Point pattern analysis
  • Spatial clustering detection
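
Inverse Distance Weighting, the first interpolation method listed above, can be sketched in a few lines. This version uses planar Euclidean distance and a configurable power; the function name and sample points are hypothetical and not part of VegZ.

```python
import math


def idw(points, target, power=2.0):
    """Interpolate a value at target=(x, y) from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0:
            return value  # target coincides with a sample location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical samples: (x, y, species richness)
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
estimate = idw(samples, (1.0, 0.0))
```

A larger `power` makes the estimate follow the nearest samples more closely, while a smaller one produces a smoother surface.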

Specialized Methods

  • Phylogenetic diversity analysis:
    • Faith's phylogenetic diversity
    • Phylogenetic endemism
    • Net Relatedness Index (NRI)
    • Nearest Taxon Index (NTI)
  • Metacommunity analysis:
    • Elements of metacommunity structure
    • Coherence, turnover, and boundary clumping
  • Network analysis:
    • Co-occurrence networks
    • Modularity analysis
    • Network centrality measures
  • Nestedness analysis with null models:
    • NODF (Nestedness based on Overlap and Decreasing Fill)
    • Temperature calculator
    • Null model generation and testing

Functional Trait Analysis

  • Trait syndrome identification
  • Community-weighted means (CWM)
  • Functional diversity indices:
    • Functional richness (FRic)
    • Functional evenness (FEve)
    • Functional divergence (FDiv)
    • Rao's quadratic entropy
  • Trait-environment relationships
  • Fourth-corner analysis
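
The community-weighted mean (CWM) listed above is the abundance-weighted average of a trait within one site. A minimal sketch, with a hypothetical function name and made-up trait values:

```python
def community_weighted_mean(abundances, trait_values):
    """CWM = sum(p_i * t_i), where p_i is the relative abundance of species i."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("site has no individuals")
    return sum(a / total * t for a, t in zip(abundances, trait_values))


# Hypothetical site: three species and their specific leaf area values
abundances = [25, 18, 12]
sla = [14.0, 22.0, 9.5]
cwm_sla = community_weighted_mean(abundances, sla)
```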

Machine Learning & Predictive Modeling

  • Species Distribution Modeling (SDM):
    • MaxEnt-style modeling
    • Random Forest models
    • Gradient Boosting models
  • Classification algorithms for vegetation types
  • Regression models for abundance prediction
  • Model validation and performance metrics
  • Variable importance assessment
  • Ensemble modeling

Visualization & Reporting

  • Specialized ecological plots:
    • Diversity bar charts and histograms
    • Species accumulation curves
    • Rarefaction plots
  • Ordination diagrams with:
    • Site scores plotting
    • Species loading arrows
    • Environmental vector overlays
    • Convex hulls for groups
    • Stress plots for NMDS
  • Clustering visualizations:
    • Dendrograms with customizable formatting
    • Silhouette plots
    • Comprehensive elbow analysis plots (4-panel layout)
    • Cluster validation plots
  • Interactive dashboards using Plotly/Bokeh
  • Automated quality reports with statistical summaries
  • Export functions (HTML, PDF, PNG, SVG, CSV)

Quick Analysis Functions

  • quick_diversity_analysis() - Instant diversity calculations
  • quick_ordination() - Rapid PCA or NMDS analysis
  • quick_clustering() - Fast k-means or hierarchical clustering
  • quick_elbow_analysis() - Optimal cluster number determination

Quick Start

Installation

pip install vegz

For extended functionality:

# Extras are quoted so that shells such as zsh do not interpret the brackets

# With spatial analysis support
pip install "vegz[spatial]"

# With remote sensing capabilities
pip install "vegz[remote-sensing]"

# Complete installation with all features
pip install "vegz[spatial,remote-sensing,fuzzy,interactive]"

Basic Usage

import pandas as pd
from vegz import VegZ

# Initialize VegZ
veg = VegZ()

# Load your vegetation data
data = veg.load_data('vegetation_data.csv')

# Quick diversity analysis
diversity = veg.calculate_diversity(['shannon', 'simpson', 'richness'])

# Multivariate analysis
pca_results = veg.pca_analysis(transform='hellinger')
nmds_results = veg.nmds_analysis(distance_metric='bray_curtis')

# Advanced elbow analysis for optimal clustering
elbow_results = veg.elbow_analysis(
    k_range=range(1, 15),
    methods=['knee_locator', 'derivative', 'variance_explained'],
    plot_results=True
)
optimal_k = elbow_results['recommendations']['consensus']

# Clustering with optimal k
clusters = veg.kmeans_clustering(n_clusters=optimal_k)
indicators = veg.indicator_species_analysis(clusters['cluster_labels'])

# Create visualizations
veg.plot_diversity(diversity, 'shannon')
veg.plot_ordination(pca_results, color_by=clusters['cluster_labels'])

Quick Functions for Immediate Results

import pandas as pd
from vegz import quick_diversity_analysis, quick_ordination, quick_elbow_analysis

# Load a site-by-species table (assumed here to be a pandas DataFrame)
data = pd.read_csv('vegetation_data.csv')

# Instant analyses
diversity = quick_diversity_analysis(data, species_cols=['sp1', 'sp2', 'sp3'])
ordination = quick_ordination(data, method='pca')
elbow_results = quick_elbow_analysis(data, max_k=10, plot_results=True)

Advanced TWINSPAN Analysis

import pandas as pd
from vegz import VegetationClustering

# Site-by-species abundance table (assumed here to be a pandas DataFrame indexed by site_id)
species_data = pd.read_csv('vegetation_data.csv', index_col='site_id')

clustering = VegetationClustering()

# Two-Way Indicator Species Analysis, a widely used divisive method for vegetation classification
twinspan_results = clustering.twinspan(
    species_data,
    cut_levels=[0, 2, 5, 10, 20],
    max_divisions=6,
    min_group_size=5
)

print("Site classification:", twinspan_results['site_classification'])
print("Indicator species:", twinspan_results['classification_tree']['indicator_species'])
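
To clarify what the `cut_levels` argument controls, here is a rough sketch of pseudospecies creation: each abundance is converted into the set of cut levels it reaches, so one species becomes several presence/absence "pseudospecies". The threshold convention used here (present and at or above the cut) is an assumption of this sketch and may differ from how VegZ treats boundary values; the function name is hypothetical.

```python
def pseudospecies(abundance, cut_levels=(0, 2, 5, 10, 20)):
    """Return the 1-based pseudospecies levels reached by one abundance value."""
    if abundance <= 0:
        return []  # absent species generate no pseudospecies
    return [
        level
        for level, cut in enumerate(cut_levels, start=1)
        if abundance >= cut
    ]


# Abundances from the first sample site in this README
for value in (25, 18, 12):
    print(value, pseudospecies(value))
```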

Data Format Requirements

VegZ expects data in site-by-species matrix format:

site_id,Species1,Species2,Species3,...
SITE_001,25,18,12,...
SITE_002,32,22,16,...

Environmental data should have matching site IDs:

site_id,latitude,longitude,elevation,soil_ph,temperature,...
SITE_001,44.2619,-72.5806,850,6.2,18.5,...
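
Before analysis it can help to confirm that every site in the species matrix has a matching environmental record. The sketch below does this with the standard library only, using the sample rows shown above with the elided trailing columns left out. The helper `read_site_table` is hypothetical and not part of VegZ.

```python
import csv
import io

species_csv = """site_id,Species1,Species2,Species3
SITE_001,25,18,12
SITE_002,32,22,16
"""

environment_csv = """site_id,latitude,longitude,elevation,soil_ph,temperature
SITE_001,44.2619,-72.5806,850,6.2,18.5
"""


def read_site_table(text):
    """Parse CSV text into {site_id: {column: float}}."""
    table = {}
    for record in csv.DictReader(io.StringIO(text)):
        site_id = record.pop("site_id")
        table[site_id] = {column: float(value) for column, value in record.items()}
    return table


species = read_site_table(species_csv)
environment = read_site_table(environment_csv)

# Sites that have species data but no environmental record
missing_environment = sorted(set(species) - set(environment))
print("Sites without environmental data:", missing_environment)
```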

Target Applications

  • Vegetation community classification and mapping
  • Biodiversity assessments and monitoring
  • Environmental impact studies
  • Species distribution modeling
  • Ecological restoration planning
  • Academic research in plant ecology and environmental science

Requirements

Required:

  • Python >= 3.8
  • NumPy >= 1.21.0
  • Pandas >= 1.3.0
  • SciPy >= 1.7.0
  • Matplotlib >= 3.4.0
  • scikit-learn >= 1.0.0

Optional (for extended functionality):

  • GeoPandas (spatial analysis)
  • PyProj (coordinate transformations)
  • Earth Engine API (remote sensing)
  • FuzzyWuzzy (fuzzy string matching)
  • Plotly/Bokeh (interactive visualizations)

Scientific Background

VegZ implements methods from key ecological and statistical literature:

  • TWINSPAN: Hill, M.O. (1979) TWINSPAN - A FORTRAN Program for Arranging Multivariate Data
  • Elbow Analysis: Multiple algorithms including Satopaa et al. (2011) "Finding a kneedle in a haystack"
  • Ordination: Methods from Legendre & Legendre "Numerical Ecology"
  • Diversity: Comprehensive indices from Magurran "Measuring Biological Diversity"
  • Statistical tests: From Anderson (2001) PERMANOVA and related methods

Contributing

We welcome contributions! Please see the Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

  • GitHub Issues: Report bugs or request features
  • Documentation: Full user guide and API reference
  • Email: For academic collaborations and consulting

Citation

If you use VegZ in your research, please cite:

@software{vegz2025,
    author = {Hatim, Mohamed Z.},
    title = {VegZ: A comprehensive Python package for vegetation data analysis and environmental modeling},
    year = {2025},
    version = {1.0.0},
    url = {https://github.com/mhatim99/vegz}
}

VegZ - Empowering ecological research with comprehensive vegetation analysis tools.
