A comprehensive Python package for vegetation data analysis and environmental modeling

VegZ: Comprehensive Vegetation Data Analysis Package

VegZ is a Python package for vegetation data analysis and environmental modeling. It gives ecologists, environmental scientists, and researchers working with biodiversity and vegetation data a single toolkit covering preprocessing, quality checks, diversity, ordination, clustering, statistical testing, and modeling.

Complete Feature List

Data Management & Preprocessing

  • Parse vegetation survey data from multiple formats (CSV, Excel, Turboveg)
  • Integration with remote sensing APIs (Landsat, MODIS, Sentinel)
  • Darwin Core biodiversity standards compliance
  • Species name standardization with fuzzy matching
  • Coordinate system transformations
  • Multiple data transformation methods (Hellinger, chord, Wisconsin, log, sqrt, standardize)
  • Automatic species matrix detection
  • Support for heterogeneous data integration
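
As an illustration of what the Hellinger transformation in the list above does, here is a minimal standard-library sketch. The function name `hellinger_transform` is hypothetical and is not part of the VegZ API; it only shows the arithmetic (divide each abundance by its site total, then take the square root).

```python
import math

def hellinger_transform(matrix):
    """Return a Hellinger-transformed copy of a site-by-species matrix.

    Each abundance is divided by its site (row) total and square-rooted.
    Sites with a zero total are returned as all zeros.
    """
    transformed = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed

# Rows are sites, columns are species (illustrative values)
sites = [[25, 18, 12], [32, 22, 16], [0, 0, 0]]
hellinger = hellinger_transform(sites)
```

After the transformation, the squared values of every non-empty site sum to 1, which is why Euclidean-based methods such as PCA behave better on the result than on raw abundances.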

Data Quality & Validation

  • Comprehensive spatial coordinate validation
  • Temporal data validation and date parsing
  • Geographic outlier detection with country boundary checks
  • Coordinate precision assessment
  • Duplicate record identification
  • Invalid coordinate range detection
  • Transposed coordinate detection
  • Country boundary consistency checks
  • Automated quality reporting
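
The range and transposition checks listed above can be sketched as follows. This is an assumption about how such checks typically work, not VegZ's actual implementation; `check_coordinates` and its return labels are hypothetical names.

```python
def check_coordinates(latitude, longitude):
    """Classify a latitude/longitude pair given in decimal degrees.

    Returns 'valid', 'possibly_transposed', or 'out_of_range'.
    """
    lat_ok = -90.0 <= latitude <= 90.0
    lon_ok = -180.0 <= longitude <= 180.0
    if lat_ok and lon_ok:
        return "valid"
    # The pair is out of range as given, but in range once swapped:
    # the two columns were probably entered in the wrong order.
    if -90.0 <= longitude <= 90.0 and -180.0 <= latitude <= 180.0:
        return "possibly_transposed"
    return "out_of_range"

print(check_coordinates(44.2619, -72.5806))   # in range
print(check_coordinates(-122.4, 37.8))        # latitude beyond +/-90, swap fits
print(check_coordinates(95.0, 200.0))         # no ordering fits
```

A range test alone cannot flag a swapped pair when both values lie within ±90; catching those cases needs extra context such as the country boundary checks mentioned in the list.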

Diversity Analysis (15+ Indices)

  • Basic indices: Shannon, Simpson, Simpson inverse, richness, evenness
  • Advanced indices: Fisher's alpha, Berger-Parker, McIntosh, Brillouin
  • Additional indices: Menhinick, Margalef
  • Richness estimators: Chao1, ACE, Jackknife1, Jackknife2
  • Hill numbers for multiple diversity orders (q = 0, 0.5, 1, 1.5, 2, etc.)
  • Beta diversity analysis (Whittaker, Sørensen, Jaccard methods)
  • Rarefaction curves and extrapolation
  • Species accumulation curves
  • Diversity profiles
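
For readers who want to see the definitions behind a few of these indices, the sketch below computes Shannon, Gini-Simpson, Hill numbers, and Chao1 for a single site using only the standard library. The function names are hypothetical, and the conventions chosen here (natural logarithm, Simpson reported as 1 - D) are assumptions that may differ from VegZ's defaults.

```python
import math

def proportions(counts):
    """Relative abundances of the species that are present."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts):
    """Shannon entropy H' = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in proportions(counts))

def simpson(counts):
    """Gini-Simpson index 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in proportions(counts))

def hill_number(counts, q):
    """Hill number of order q; q = 1 is defined as the limit exp(H')."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(shannon(counts))
    return sum(p ** q for p in proportions(counts)) ** (1.0 / (1.0 - q))

def chao1(counts):
    """Chao1 richness estimate from singleton and doubleton counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        return observed + singletons ** 2 / (2.0 * doubletons)
    # Bias-corrected form used when there are no doubletons
    return observed + singletons * (singletons - 1) / 2.0

even_site = [10, 10, 10, 10]
print(shannon(even_site), simpson(even_site), hill_number(even_site, 2))
```

For a perfectly even community every Hill number equals the species count, which makes an even site a convenient sanity check.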

Complete Multivariate Analysis Suite

  • PCA - Principal Component Analysis with multiple transformations
  • CA - Correspondence Analysis with scaling options
  • DCA - Detrended Correspondence Analysis with segment control
  • CCA - Canonical Correspondence Analysis with constraints
  • RDA - Redundancy Analysis for linear relationships
  • NMDS - Non-metric Multidimensional Scaling with stress assessment
  • PCoA - Principal Coordinates Analysis (metric MDS)
  • Environmental vector fitting to ordination axes
  • Procrustes analysis for ordination comparison
  • Goodness-of-fit diagnostics
  • Multiple ecological distance matrices (Bray-Curtis, Jaccard, Sørensen, Euclidean, Manhattan, Canberra, Chord, Hellinger)
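
Most of the ordination methods above start from a distance matrix. As a minimal sketch of the Bray-Curtis case, the helper names below (`bray_curtis`, `distance_matrix`) are hypothetical and not taken from VegZ.

```python
def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    # Two empty sites are treated as identical here (an arbitrary choice)
    return numerator / denominator if denominator else 0.0

def distance_matrix(sites, metric):
    """Symmetric pairwise distance matrix as a list of lists."""
    n = len(sites)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(sites[i], sites[j])
    return matrix

sites = [[25, 18, 12], [32, 22, 16], [0, 40, 3]]
distances = distance_matrix(sites, bray_curtis)
```

The value ranges from 0 (identical composition) to 1 (no shared species), and a matrix like `distances` is the kind of input that NMDS or PCoA operates on.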

Advanced Clustering Methods

  • TWINSPAN - Two-Way Indicator Species Analysis (vegetation classification gold standard)
    • Pseudospecies creation with customizable cut levels
    • Hierarchical divisive classification
    • Indicator species identification
    • Classification tree structure
  • Hierarchical clustering with ecological distance matrices
  • Comprehensive Elbow Analysis with 5 detection algorithms:
    • Kneedle algorithm (Satopaa et al., 2011) - automatic knee detection
    • Second derivative maximum - curvature-based detection
    • Variance explained threshold - <10% additional variance criterion
    • Distortion jump method (Sugar & James, 2003) - jump detection
    • L-method (Salvador & Chan, 2004) - piecewise linear fitting
  • Consensus recommendations with confidence scores
  • K-means clustering with multiple initializations
  • Fuzzy C-means clustering for gradient boundaries
  • DBSCAN for density-based core community detection
  • Gaussian Mixture Models for probabilistic clustering
  • Clustering validation metrics (silhouette, gap statistic, Calinski-Harabasz, Davies-Bouldin)
  • Optimal k determination with multiple methods
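
To make the "second derivative maximum" idea from the elbow analysis list concrete, here is a small sketch that picks the k where the within-cluster inertia curve bends most sharply. The function name and the inertia values are illustrative assumptions, not output from VegZ or from any real dataset.

```python
def elbow_by_second_difference(k_values, inertias):
    """Return the k with the largest discrete second difference of inertia.

    The end points are skipped because the second difference needs a
    neighbour on both sides. Returns None for fewer than three points.
    """
    best_k, best_curvature = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        curvature = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if curvature > best_curvature:
            best_k, best_curvature = k_values[i], curvature
    return best_k

# Made-up inertia curve with a visible bend at k = 3
k_values = [1, 2, 3, 4, 5, 6]
inertias = [1000, 600, 200, 170, 150, 140]
print(elbow_by_second_difference(k_values, inertias))
```

Because any single heuristic can be fooled by a noisy curve, combining several detectors into a consensus, as the list above describes, is a reasonable safeguard.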

Statistical Analysis

  • PERMANOVA - Permutational multivariate analysis of variance
  • ANOSIM - Analysis of similarities
  • MRPP - Multi-response permutation procedures
  • Mantel tests and partial Mantel tests for matrix correlation
  • Indicator Species Analysis (IndVal) for cluster characterization
  • SIMPER - Similarity percentages for group comparisons
  • Cophenetic correlation for hierarchical clustering validation
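
The permutation logic shared by several of these tests can be illustrated with a simple one-sided Mantel test. This is a sketch under stated assumptions (Pearson correlation, joint row/column permutation of the first matrix); the function names are hypothetical and the code is not VegZ's implementation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    flat_b = upper_triangle(dist_b, identity)
    observed = pearson(upper_triangle(dist_a, identity), flat_b)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(dist_a, order), flat_b) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)

# Illustrative distances for four sites along a transect
geographic = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
r, p = mantel_test(geographic, geographic)
```

Rows and columns are permuted together so that each site keeps its own set of distances; shuffling matrix cells independently would break the dependence structure the test is designed to respect.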

Environmental Modeling

  • Generalized Additive Models (GAMs) with multiple smoothers:
    • Spline smoothers
    • LOWESS smoothers
    • Polynomial smoothers
    • Gaussian process smoothers
  • Species response curves modeling:
    • Gaussian response curves
    • Skewed Gaussian curves
    • Beta response curves
    • Linear responses
    • Threshold responses
    • Unimodal responses
  • Environmental gradient analysis
  • Environmental niche modeling

Temporal Analysis

  • Phenology modeling with multiple curve types
  • Trend detection using Mann-Kendall tests
  • Time series decomposition (seasonal, trend, residual)
  • Seasonal pattern analysis
  • Temporal autocorrelation analysis
  • Change point detection
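
The Mann-Kendall trend test mentioned above is compact enough to sketch in full. The version below uses the normal approximation with a tie correction and only the standard library; `mann_kendall` is a hypothetical name, not the VegZ function.

```python
import math
from collections import Counter

def mann_kendall(series):
    """Return (S statistic, Z score, two-sided p-value) for a time series."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    # Variance of S, reduced for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s == 0 or variance == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(variance)  # continuity correction
    else:
        z = (s + 1) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value

# Strictly increasing series of ten observations
s, z, p = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

The normal approximation is usually considered adequate from about ten observations upward; shorter series call for exact tables.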

Spatial Analysis

  • Spatial interpolation methods:
    • Inverse Distance Weighting (IDW)
    • Kriging (ordinary, universal)
    • Spline interpolation
  • Landscape metrics calculation:
    • Patch density
    • Edge density
    • Contagion index
    • Shannon diversity index for landscapes
  • Spatial autocorrelation analysis (Moran's I, Geary's C)
  • Point pattern analysis
  • Spatial clustering detection
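
Inverse Distance Weighting is the simplest of the interpolation methods listed above. The sketch below assumes planar (projected) coordinates and a hypothetical function name `idw`; it is meant to show the weighting rule rather than VegZ's interface.

```python
import math

def idw(known_points, target, power=2.0):
    """Interpolate a value at target = (x, y) from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power. A target that coincides
    with a sample returns that sample's value directly.
    """
    target_x, target_y = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target_x, y - target_y)
        if distance == 0.0:
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Illustrative samples: (x, y, measured value)
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
midpoint_estimate = idw(samples[:2], (1.0, 0.0))
```

A higher `power` makes the estimate more local, because distant samples lose influence faster.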

Specialized Methods

  • Phylogenetic diversity analysis:
    • Faith's phylogenetic diversity
    • Phylogenetic endemism
    • Net Relatedness Index (NRI)
    • Nearest Taxon Index (NTI)
  • Metacommunity analysis:
    • Elements of metacommunity structure
    • Coherence, turnover, and boundary clumping
  • Network analysis:
    • Co-occurrence networks
    • Modularity analysis
    • Network centrality measures
  • Nestedness analysis with null models:
    • NODF (Nestedness based on Overlap and Decreasing Fill)
    • Temperature calculator
    • Null model generation and testing

Functional Trait Analysis

  • Trait syndrome identification
  • Community-weighted means (CWM)
  • Functional diversity indices:
    • Functional richness (FRic)
    • Functional evenness (FEve)
    • Functional divergence (FDiv)
    • Rao's quadratic entropy
  • Trait-environment relationships
  • Fourth-corner analysis
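
A community-weighted mean (CWM) is the abundance-weighted average of a trait across the species at one site. The following sketch uses a hypothetical helper name and made-up trait values purely to show the calculation.

```python
def community_weighted_mean(abundances, trait_values):
    """Abundance-weighted mean of one trait for a single site."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("site has no recorded abundance")
    return sum(a * t for a, t in zip(abundances, trait_values)) / total

# One site with three species; trait values are illustrative
abundances = [1, 1, 2]
trait_values = [10.0, 20.0, 30.0]
cwm = community_weighted_mean(abundances, trait_values)
```

Here the third species contributes half of the total abundance, so it pulls the CWM to 22.5, above the unweighted trait mean of 20.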

Machine Learning & Predictive Modeling

  • Species Distribution Modeling (SDM):
    • MaxEnt-style modeling
    • Random Forest models
    • Gradient Boosting models
  • Classification algorithms for vegetation types
  • Regression models for abundance prediction
  • Model validation and performance metrics
  • Variable importance assessment
  • Ensemble modeling

Visualization & Reporting

  • Specialized ecological plots:
    • Diversity bar charts and histograms
    • Species accumulation curves
    • Rarefaction plots
  • Ordination diagrams with:
    • Site scores plotting
    • Species loading arrows
    • Environmental vector overlays
    • Convex hulls for groups
    • Stress plots for NMDS
  • Clustering visualizations:
    • Dendrograms with customizable formatting
    • Silhouette plots
    • Comprehensive elbow analysis plots (4-panel layout)
    • Cluster validation plots
  • Interactive dashboards using Plotly/Bokeh
  • Automated quality reports with statistical summaries
  • Export functions (HTML, PDF, PNG, SVG, CSV)

Quick Analysis Functions

  • quick_diversity_analysis() - Instant diversity calculations
  • quick_ordination() - Rapid PCA or NMDS analysis
  • quick_clustering() - Fast k-means or hierarchical clustering
  • quick_elbow_analysis() - Optimal cluster number determination

Quick Start

Installation

pip install VegZ

For extended functionality:

# Quote the extras so shells such as zsh do not expand the square brackets

# With spatial analysis support
pip install "VegZ[spatial]"

# With remote sensing capabilities
pip install "VegZ[remote-sensing]"

# Complete installation with all features
pip install "VegZ[spatial,remote-sensing,fuzzy,interactive]"

Basic Usage

import pandas as pd
from VegZ import VegZ

# Initialize VegZ
veg = VegZ()

# Load your vegetation data
data = veg.load_data('vegetation_data.csv')

# Quick diversity analysis
diversity = veg.calculate_diversity(['shannon', 'simpson', 'richness'])

# Multivariate analysis
pca_results = veg.pca_analysis(transform='hellinger')
nmds_results = veg.nmds_analysis(distance_metric='bray_curtis')

# Advanced elbow analysis for optimal clustering
elbow_results = veg.elbow_analysis(
    k_range=range(1, 15),
    methods=['knee_locator', 'derivative', 'variance_explained'],
    plot_results=True
)
optimal_k = elbow_results['recommendations']['consensus']

# Clustering with optimal k
clusters = veg.kmeans_clustering(n_clusters=optimal_k)
indicators = veg.indicator_species_analysis(clusters['cluster_labels'])

# Create visualizations
veg.plot_diversity(diversity, 'shannon')
veg.plot_ordination(pca_results, color_by=clusters['cluster_labels'])

Quick Functions for Immediate Results

import pandas as pd
from VegZ import quick_diversity_analysis, quick_ordination, quick_elbow_analysis

# Load a site-by-species table (assumes the quick functions accept a pandas DataFrame)
data = pd.read_csv('vegetation_data.csv')

# Instant analyses
diversity = quick_diversity_analysis(data, species_cols=['sp1', 'sp2', 'sp3'])
ordination = quick_ordination(data, method='pca')
elbow_results = quick_elbow_analysis(data, max_k=10, plot_results=True)

Advanced TWINSPAN Analysis

from VegZ.clustering import VegetationClustering

clustering = VegetationClustering()

# species_data is your site-by-species abundance table, loaded beforehand
# Two-Way Indicator Species Analysis - the gold standard for vegetation classification
twinspan_results = clustering.twinspan(
    species_data,
    cut_levels=[0, 2, 5, 10, 20],
    max_divisions=6,
    min_group_size=5
)

print("Site classification:", twinspan_results['site_classification'])
print("Indicator species:", twinspan_results['classification_tree']['indicator_species'])

Data Format Requirements

VegZ expects data in site-by-species matrix format:

site_id,Species1,Species2,Species3,...
SITE_001,25,18,12,...
SITE_002,32,22,16,...

Environmental data should have matching site IDs:

site_id,latitude,longitude,elevation,soil_ph,temperature,...
SITE_001,44.2619,-72.5806,850,6.2,18.5,...
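
Before running an analysis it is worth confirming that the two tables really share site IDs. The sketch below does this with the standard library only; the helper `read_by_site`, the shortened column sets, and the `SITE_003` row are hypothetical and exist only to show a mismatch.

```python
import csv
import io

species_csv = """site_id,Species1,Species2,Species3
SITE_001,25,18,12
SITE_002,32,22,16
"""

# SITE_003 is an invented row used to demonstrate a missing match
environment_csv = """site_id,latitude,longitude,elevation
SITE_001,44.2619,-72.5806,850
SITE_003,44.3000,-72.6000,900
"""

def read_by_site(text):
    """Map each site_id to a dict of its remaining (numeric) columns."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        site = row.pop("site_id")
        table[site] = {name: float(value) for name, value in row.items()}
    return table

species = read_by_site(species_csv)
environment = read_by_site(environment_csv)

shared = sorted(species.keys() & environment.keys())
missing_environment = sorted(species.keys() - environment.keys())
missing_species = sorted(environment.keys() - species.keys())

print("Sites usable for constrained analyses:", shared)
print("Species rows without environmental data:", missing_environment)
print("Environmental rows without species data:", missing_species)
```

Only the sites in `shared` can enter analyses that relate species to environment, such as CCA or RDA, so reporting the unmatched IDs early avoids silent row loss later.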

Target Applications

  • Vegetation community classification and mapping
  • Biodiversity assessments and monitoring
  • Environmental impact studies
  • Species distribution modeling
  • Ecological restoration planning
  • Academic research in plant ecology and environmental science

Requirements

Required:

  • Python >= 3.8
  • NumPy >= 1.21.0
  • Pandas >= 1.3.0
  • SciPy >= 1.7.0
  • Matplotlib >= 3.4.0
  • scikit-learn >= 1.0.0

Optional (for extended functionality):

  • GeoPandas (spatial analysis)
  • PyProj (coordinate transformations)
  • Earth Engine API (remote sensing)
  • FuzzyWuzzy (fuzzy string matching)
  • Plotly/Bokeh (interactive visualizations)

Scientific Background

VegZ implements methods from key ecological and statistical literature:

  • TWINSPAN: Hill, M.O. (1979) TWINSPAN - A FORTRAN Program for Arranging Multivariate Data
  • Elbow Analysis: Multiple algorithms including Satopaa et al. (2011) "Finding a kneedle in a haystack"
  • Ordination: Methods from Legendre & Legendre "Numerical Ecology"
  • Diversity: Comprehensive indices from Magurran "Measuring Biological Diversity"
  • Statistical tests: From Anderson (2001) PERMANOVA and related methods

Contributing

We welcome contributions! Please see the Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

  • GitHub Issues: Report bugs or request features
  • Documentation: Full user guide and API reference
  • Email: For academic collaborations and consulting

Citation

If you use VegZ in your research, please cite:

@software{VegZ,
    author = {Hatim, Mohamed Z.},
    title = {VegZ: A comprehensive Python package for vegetation data analysis and environmental modeling},
    year = {2025},
    version = {1.0.2},
    url = {https://github.com/mhatim99/VegZ}
}

VegZ - Empowering ecological research with comprehensive vegetation analysis tools.
