Extensible Environmental Data Preprocessing Framework

Project description

EnvDataPrep: High-performance Environmental Data Pre-processing

Why EnvDataPrep?

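EnvDataPrep saves money, time, and disk storage for anyone working with environmental datasets. It does this by extracting only the variables you need from large files and writing them out with compression.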

Current Capabilities

The first feature is unified, configuration-driven extraction of NetCDF files. The example below cuts the size of a TROPOMI satellite product file by about 90% by keeping four variables and writing them with zlib compression:

"""Example of extracting a subset of variables from a netCDF file."""

from pathlib import Path
import envdataprep as edp

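# Set up input and output directories (adjust ROOT to your own data location)
ROOT = Path("E:/Samples/Satellites")
input_dir = ROOT / "Input"
output_dir = ROOT / "Output"
output_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if it does not exist yet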

# Select an example file
file_name = "S5P_RPRO_L2__NO2____20190101T233659_20190102T011828_06322_03_020400_20221106T093236.nc"
file_path = input_dir / file_name

# List all variables in the file
variables = edp.list_netcdf_variables(file_path)
print(*variables, sep='\n')

# List the variables to extract
variable_paths = [
    "PRODUCT/latitude",
    "PRODUCT/longitude",
    "PRODUCT/nitrogendioxide_tropospheric_column",
    "PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb",
]

# Extract and save out
# By default, the output file preserves the original group structure
edp.subset_netcdf(
    file_path,
    output_dir,
    variable_paths,
    output_name="example_extracted_data.nc",
    compression_method='zlib',
    compression_level=9,
)
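Because the extraction is driven by a list of variable paths plus a few keyword options, the same settings can live in a small configuration file and be reused across products. The sketch below assumes a hypothetical JSON layout (the keys "variables" and "options", the helper load_subset_config and the 0-9 level check are illustrative assumptions, not part of EnvDataPrep); only the option names output_name, compression_method and compression_level come from the example above.

```python
"""Hypothetical sketch: drive subset_netcdf settings from a JSON configuration."""
import json
from typing import Any

# Hypothetical config layout; only the option names mirror the example above.
EXAMPLE_CONFIG = """
{
  "variables": [
    "PRODUCT/latitude",
    "PRODUCT/longitude",
    "PRODUCT/nitrogendioxide_tropospheric_column",
    "PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb"
  ],
  "options": {
    "output_name": "example_extracted_data.nc",
    "compression_method": "zlib",
    "compression_level": 9
  }
}
"""

# Keyword options used with edp.subset_netcdf in the example above
ALLOWED_OPTIONS = {"output_name", "compression_method", "compression_level"}


def load_subset_config(text: str) -> tuple[list[str], dict[str, Any]]:
    """Parse a JSON config into (variable_paths, keyword options)."""
    config = json.loads(text)
    variable_paths = config["variables"]
    options = config.get("options", {})

    # Every entry should be a non-empty path such as "GROUP/variable"
    for path in variable_paths:
        if not isinstance(path, str) or not path.strip("/"):
            raise ValueError(f"Invalid variable path: {path!r}")

    # Reject option names the example does not use (catches typos early)
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")

    # Assumption: zlib-style compression levels, which range from 0 to 9
    level = options.get("compression_level")
    if level is not None and not 0 <= level <= 9:
        raise ValueError(f"compression_level must be 0-9, got {level}")

    return variable_paths, options


if __name__ == "__main__":
    paths, opts = load_subset_config(EXAMPLE_CONFIG)
    print(paths)
    print(opts)
```

With the parsed values, the call from the example becomes edp.subset_netcdf(file_path, output_dir, paths, **opts), passing the variable list positionally just as the original example does.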

Installation

Prerequisites

  • Mamba (recommended) or Conda
    • Preferred for installing scientific Python dependencies (netCDF4, xarray, numpy)
    • Handles complex dependency resolution more reliably than pip alone
  • Alternative: a pip-only installation should also work, but resolving the scientific dependencies is usually more involved
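Before creating the environment, or when trying the pip-only route, it can help to confirm which of the dependencies named above are already importable. A minimal sketch using only the standard library's importlib.util.find_spec; the module names netCDF4, xarray and numpy come from the list above, while the helper name check_dependencies is hypothetical.

```python
"""Sketch: report which prerequisite modules are importable."""
import importlib.util

# Import names of the dependencies listed under Prerequisites
REQUIRED_MODULES = ("netCDF4", "xarray", "numpy")


def check_dependencies(modules=REQUIRED_MODULES) -> dict[str, bool]:
    """Map each top-level module name to True if it can be found."""
    # find_spec locates a top-level module without importing it
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```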

Quick Setup

# 1. Get the code
git clone https://github.com/envmini/envdataprep.git
cd envdataprep

# 2. Create environment (choose one)
# Option A: Using Mamba (faster, recommended)
mamba env create -f environment.yml
mamba activate envdataprep

# Option B: Using Conda
# conda env create -f environment.yml
# conda activate envdataprep

# 3. Install package in development mode
pip install -e . --no-deps

# 4. Verify installation
python -c "import envdataprep; print('Installation successful!')"
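The one-line check above only confirms that the package imports. To also see which version is installed (useful when comparing a git checkout with a released version such as 0.1.1), the standard library's importlib.metadata can be queried. A minimal sketch; the helper name installed_version is hypothetical, and it assumes the distribution is registered under the name envdataprep, as the release file names suggest.

```python
"""Sketch: report the installed version of a distribution, if any."""
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "envdataprep") -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"envdataprep version: {found}" if found else "envdataprep is not installed")
```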

Why development mode for now?

  • Package not yet published to conda-forge, and the PyPI release (0.1.1) may lag behind the repository
  • Gives you the latest features and lets you contribute feedback
  • Easy to update with git pull

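Note: We use pip install -e . --no-deps even in conda/mamba environments. The --no-deps flag stops pip from installing or replacing dependencies, so the packages from environment.yml stay under conda/mamba control. The pip command itself comes from your active conda environment, not the system pip. You can verify this with: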

which pip  # Should show a path like .../miniforge3/envs/envdataprep/bin/pip (on Windows, use: where pip)
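The same check can be made from Python, which also works on Windows shells without which. This sketch compares the pip executable found on PATH (via shutil.which) with the prefix of the running interpreter (sys.prefix); in an activated conda/mamba environment both should point into the same environment directory. The helper names is_within and pip_matches_environment are hypothetical.

```python
"""Sketch: check that `pip` on PATH belongs to the running Python environment."""
import shutil
import sys
from pathlib import Path
from typing import Optional


def is_within(path: Path, prefix: Path) -> bool:
    """Return True if `path` lies inside the directory `prefix`."""
    try:
        path.resolve().relative_to(prefix.resolve())
        return True
    except ValueError:
        return False


def pip_matches_environment() -> Optional[bool]:
    """True/False if a pip executable is on PATH, None if none is found."""
    pip_path = shutil.which("pip")
    if pip_path is None:
        return None
    return is_within(Path(pip_path), Path(sys.prefix))


if __name__ == "__main__":
    print("Interpreter prefix:", sys.prefix)
    print("pip on PATH:", shutil.which("pip"))
    print("pip matches environment:", pip_matches_environment())
```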

License

This project is licensed under the MIT License.

Download files

Source Distribution

envdataprep-0.1.1.tar.gz (17.9 kB)

Uploaded Source

Built Distribution

envdataprep-0.1.1-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file envdataprep-0.1.1.tar.gz.

File metadata

  • Download URL: envdataprep-0.1.1.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for envdataprep-0.1.1.tar.gz
Algorithm Hash digest
SHA256 7747bc3ee513bafda692559ec4e78f555463b2a30a9129e0a2e14f0939bf8666
MD5 010df599d44704adc27df684bef60099
BLAKE2b-256 4998de90349261cdc0e02a7a17b940a19010e4b2a7d94dd884defb6691c97a77

File details

Details for the file envdataprep-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: envdataprep-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for envdataprep-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f1098700c2c286fea0b293b3e18f0e23df2c16f61841f82ffd0b91120f569228
MD5 b327537b201495553e742cc3dbc049ac
BLAKE2b-256 cf882c7b5d41b17cb31d2cc3cb3dd07bb2211fb7280e546c4b2deb5ac8b49d31
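The SHA256 digests listed for each file can be used to confirm that a download is intact. A minimal sketch with the standard library's hashlib, reading the file in chunks so memory use stays flat; the function names sha256_of and verify_sha256 are hypothetical, and the expected digest in the main block is copied from the hash table for envdataprep-0.1.1.tar.gz.

```python
"""Sketch: verify a downloaded file against a published SHA256 digest."""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    sdist = Path("envdataprep-0.1.1.tar.gz")
    expected = "7747bc3ee513bafda692559ec4e78f555463b2a30a9129e0a2e14f0939bf8666"
    if sdist.exists():
        print("SHA256 OK" if verify_sha256(sdist, expected) else "SHA256 MISMATCH")
    else:
        print(f"{sdist} not found in the current directory")
```

pip also offers a hash-checking mode (--require-hashes) that performs the same comparison during installation.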
