
Extensible Environmental Data Preprocessing Framework

Project description

EnvDataPrep: High-performance Environmental Data Pre-processing


Why EnvDataPrep?

EnvDataPrep saves money, time, and disk storage for anyone working with environmental datasets.

Current Capabilities

The first feature is unified, configuration-driven extraction of variables from netCDF files. The example below extracts four variables from a TROPOMI NO2 satellite product, reducing the file size by about 90%:

"""Example of extracting a subset of variables from a netCDF file."""

from pathlib import Path
import envdataprep as edp

# Set up input and output directories
ROOT = Path("E:/Samples/Satellites")
input_dir = ROOT / "Input"
output_dir = ROOT / "Output"

# Select an example file
file_name = "S5P_RPRO_L2__NO2____20190101T233659_20190102T011828_06322_03_020400_20221106T093236.nc"
file_path = input_dir / file_name

# List all variables in the file
variables = edp.list_netcdf_variables(file_path)
print(*variables, sep='\n')

# List the variables to extract
variable_paths = [
    "PRODUCT/latitude",
    "PRODUCT/longitude",
    "PRODUCT/nitrogendioxide_tropospheric_column",
    "PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb",
]

# Extract and save out
# By default, the output file preserves the original group structure
edp.subset_netcdf(
    file_path,
    output_dir,
    variable_paths,
    output_name="example_extracted_data.nc",
    compression_method='zlib',
    compression_level=9,
)
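Each variable path above combines the group hierarchy and the variable name in a single string. As a rough illustration of what preserving the original group structure involves, the hypothetical helpers below (split_variable_path, group_variables and size_reduction_percent are not part of envdataprep's API) split each path into its parent group and variable name, then bucket the requested variables by group. The last helper computes the percentage size reduction between an input file and its subset, which is one way to check a figure like the ~90% above on your own files; the actual reduction depends on the file and on the variables selected.

```python
"""Hypothetical helpers for illustration; not part of the envdataprep API."""
from collections import defaultdict
from pathlib import Path, PurePosixPath
from typing import Dict, List, Tuple


def split_variable_path(path: str) -> Tuple[str, str]:
    """Split 'GROUP/SUBGROUP/var' into ('GROUP/SUBGROUP', 'var').

    Variables at the file root get an empty group string.
    """
    p = PurePosixPath(path.strip("/"))
    group = str(p.parent)
    return ("" if group == "." else group, p.name)


def group_variables(paths: List[str]) -> Dict[str, List[str]]:
    """Map each group path to the variable names requested from it, in order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        group, name = split_variable_path(path)
        groups[group].append(name)
    return dict(groups)


def size_reduction_percent(original: Path, subset: Path) -> float:
    """Percentage by which `subset` is smaller than `original` on disk."""
    before = Path(original).stat().st_size
    after = Path(subset).stat().st_size
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    variable_paths = [
        "PRODUCT/latitude",
        "PRODUCT/longitude",
        "PRODUCT/nitrogendioxide_tropospheric_column",
        "PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb",
    ]
    for group, names in group_variables(variable_paths).items():
        print(group, "->", names)
```

With the example above, size_reduction_percent(file_path, output_dir / "example_extracted_data.nc") would compare the original TROPOMI file with its extracted subset.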

Installation

Prerequisites

  • Mamba (recommended) or Conda
    • Preferred for installing scientific Python dependencies (netCDF4, xarray, numpy)
    • Handles complex dependency resolution more reliably than pip alone
  • Alternative: pip alone should also work, but installing the scientific dependencies that way is more involved
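Whichever installer you use, a quick way to see which of the scientific dependencies named above are already importable in the active environment is importlib.util.find_spec from the standard library. This is a minimal sketch: the module list is an assumption taken from the bullet above, and the project's environment.yml may pin additional packages.

```python
"""Report which prerequisite modules are missing from the active interpreter."""
import sys
from importlib.util import find_spec
from typing import Iterable, List

# Assumed list, taken from the prerequisites above; environment.yml may list more.
REQUIRED_MODULES = ("netCDF4", "xarray", "numpy")


def missing_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found (without importing them)."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Interpreter:", sys.executable)
    else:
        print("All prerequisite modules found.")
```

Printing sys.executable when something is missing helps spot the common mistake of running a different Python than the one in the activated environment.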

Quick Setup

# 1. Get the code
git clone https://github.com/envmini/envdataprep.git
cd envdataprep

# 2. Create environment (choose one)
# Option A: Using Mamba (faster, recommended)
mamba env create -f environment.yml
mamba activate envdataprep

# Option B: Using Conda
# conda env create -f environment.yml
# conda activate envdataprep

# 3. Install package in development mode
pip install -e . --no-deps

# 4. Verify installation
python -c "import envdataprep; print('Installation successful!')"
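Step 4 only confirms that the package imports. To also check that it was installed in development (editable) mode, you can read the install metadata that pip records. This sketch assumes a recent pip, which writes a direct_url.json file (PEP 610) containing "dir_info": {"editable": true} for editable installs; older setup.py-style editable installs may not record it. The describe_install helper is hypothetical and not part of envdataprep.

```python
"""Report the installed version of a distribution and whether it is editable."""
import json
from importlib import metadata
from typing import Optional


def describe_install(dist_name: str = "envdataprep") -> Optional[dict]:
    """Return {'version': ..., 'editable': ...}, or None if not installed."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    info = {"version": dist.version, "editable": False}
    # PEP 610: pip records how the package was installed in direct_url.json.
    raw = dist.read_text("direct_url.json")
    if raw:
        dir_info = json.loads(raw).get("dir_info", {})
        info["editable"] = bool(dir_info.get("editable", False))
    return info


if __name__ == "__main__":
    print(describe_install())
```

After the Quick Setup above, the editable flag should be True; the reported version depends on the checkout.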

Why development mode for now?

  • Package not yet published to PyPI/conda-forge
  • Allows you to get latest features and contribute feedback
  • Easy to update with git pull

Note: We use pip install -e . even in conda/mamba environments. This runs the pip installed in your active conda environment, not the system pip; you can verify this with:

which pip  # Should show: .../miniforge3/envs/envdataprep/bin/pip
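The which command is Unix-only; in the Windows Command Prompt, where pip does the same. The check can also be done portably from Python, as in this minimal sketch. It assumes that a pip belonging to the active environment lives somewhere under sys.prefix (bin/ on Linux and macOS, Scripts\ on Windows); pip_is_from_active_env is a hypothetical helper. Running python -m pip instead of pip also guarantees that the pip of the current interpreter is used.

```python
"""Check whether the `pip` on PATH belongs to the running Python environment."""
import shutil
import sys
from pathlib import Path
from typing import Optional


def pip_is_from_active_env(pip_path: Optional[str] = None) -> bool:
    """True if `pip_path` (default: the first pip on PATH) is inside sys.prefix."""
    if pip_path is None:
        pip_path = shutil.which("pip")
    if pip_path is None:
        return False  # no pip on PATH at all
    prefix = Path(sys.prefix).resolve()
    try:
        Path(pip_path).resolve().relative_to(prefix)
    except ValueError:
        return False  # pip lives outside this environment
    return True


if __name__ == "__main__":
    print("pip on PATH:", shutil.which("pip"))
    print("belongs to active env:", pip_is_from_active_env())
```

Run it with the environment's own Python (after mamba activate envdataprep) so that sys.prefix points at the environment being checked.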

License

This project is licensed under the MIT License.

Download files

Download the file for your platform.

Source Distribution

envdataprep-0.1.0.tar.gz (17.9 kB)

Uploaded Source

Built Distribution

envdataprep-0.1.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file envdataprep-0.1.0.tar.gz.

File metadata

  • Download URL: envdataprep-0.1.0.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for envdataprep-0.1.0.tar.gz
Algorithm Hash digest
SHA256 be8693417a2c76e8c2856693d29136b63e5d8341aa4006d92dd8bc004e07c02a
MD5 5168d41c187ddbb3348930ce6d13a764
BLAKE2b-256 02b460ca6cdaf1b1847c2c681d8150ccf0a3742230024bb2d3a425d59d42e066
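To check a downloaded file against the SHA256 digests listed here, hash it locally and compare. A minimal sketch using only hashlib follows; the file is read in chunks so large downloads need not fit in memory (on Python 3.11+, hashlib.file_digest does the same in one call). The helper names and the verify.py script name are illustrative.

```python
"""Verify a downloaded distribution file against a published SHA256 digest."""
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """Compare the file's digest with `expected` (case and whitespace ignored)."""
    return sha256_of(path) == expected.strip().lower()


# Published digest for the source distribution, copied from the table above.
SDIST_SHA256 = "be8693417a2c76e8c2856693d29136b63e5d8341aa4006d92dd8bc004e07c02a"

if __name__ == "__main__":
    # Usage: python verify.py envdataprep-0.1.0.tar.gz
    if len(sys.argv) > 1:
        ok = verify_sha256(sys.argv[1], SDIST_SHA256)
        print("OK" if ok else "MISMATCH")
```

For the wheel, compare against its own SHA256 digest instead. pip can also enforce this automatically in hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).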

File details

Details for the file envdataprep-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: envdataprep-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for envdataprep-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ac275507f085d92801da94d9d129f0d6dddd239735533be83cb56432a1fea964
MD5 e4539f5a026ef007ce182fbb2df01507
BLAKE2b-256 ca24cb7be3cd7912574d96eb391bd6d3bf2b685667cfa92890db97a10501109e
