Extensible Environmental Data Preprocessing Framework
Project description
EnvDataPrep: High-performance Environmental Data Preprocessing
Why EnvDataPrep?
EnvDataPrep saves money, time, and disk storage for anyone who works with environmental datasets.
Current Capabilities
The first feature is unified, configuration-driven extraction of variables from netCDF files. The example below lists the variables in a TROPOMI NO2 satellite product and writes a compressed copy that contains only four of them, reducing the file size by about 90%:
```python
"""Example of extracting a subset of variables from a netCDF file."""
from pathlib import Path

import envdataprep as edp

# Set up input and output directories
ROOT = Path("E:/Samples/Satellites")
input_dir = ROOT / "Input"
output_dir = ROOT / "Output"

# Select an example file
file_name = "S5P_RPRO_L2__NO2____20190101T233659_20190102T011828_06322_03_020400_20221106T093236.nc"
file_path = input_dir / file_name

# List all variables in the file
variables = edp.list_netcdf_variables(file_path)
print(*variables, sep='\n')

# List the variables to extract
variable_paths = [
    "PRODUCT/latitude",
    "PRODUCT/longitude",
    "PRODUCT/nitrogendioxide_tropospheric_column",
    "PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb",
]

# Extract and save out
# By default, the output file preserves the original group structure
edp.subset_netcdf(
    file_path,
    output_dir,
    variable_paths,
    output_name="example_extracted_data.nc",
    compression_method='zlib',
    compression_level=9,
)
```
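To check the size reduction on your own files, you can compare the sizes of the input and output files. The sketch below uses only the standard library; `size_reduction_percent` is a hypothetical helper written for illustration and is not part of EnvDataPrep.

```python
from pathlib import Path


def size_reduction_percent(original, subset) -> float:
    """Return how much smaller `subset` is than `original`, in percent.

    Hypothetical helper for illustration; not part of the envdataprep API.
    """
    original_size = Path(original).stat().st_size
    subset_size = Path(subset).stat().st_size
    if original_size == 0:
        raise ValueError(f"{original} is empty")
    return 100.0 * (1 - subset_size / original_size)
```

After running the extraction example, `size_reduction_percent(file_path, output_dir / "example_extracted_data.nc")` would report the saving, assuming `subset_netcdf` writes its result to `output_dir / output_name`.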
Installation
Prerequisites
- Mamba (recommended) or Conda
  - Preferred for installing scientific Python dependencies (netCDF4, xarray, numpy)
  - Handles complex dependency resolution more reliably than pip alone
- Alternative: installing with pip alone should also work, but is more involved
Quick Setup
```bash
# 1. Get the code
git clone https://github.com/envmini/envdataprep.git
cd envdataprep

# 2. Create environment (choose one)
# Option A: Using Mamba (faster, recommended)
mamba env create -f environment.yml
mamba activate envdataprep

# Option B: Using Conda
# conda env create -f environment.yml
# conda activate envdataprep

# 3. Install package in development mode
pip install -e . --no-deps

# 4. Verify installation
python -c "import envdataprep; print('Installation successful!')"
```
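The check in step 4 only confirms that `envdataprep` itself imports. To also confirm that the scientific dependencies named under Prerequisites (netCDF4, xarray, numpy) are visible to the active interpreter, a small standard-library check like the following could be used. `missing_modules` is a hypothetical helper for this sketch, not part of the package.

```python
import importlib.util

# Modules named in the Prerequisites section, plus the package itself.
REQUIRED = ("envdataprep", "netCDF4", "xarray", "numpy")


def missing_modules(names=REQUIRED):
    """Return the names the current interpreter cannot find.

    find_spec only locates a top-level module without importing it,
    so this check is cheap even for heavy packages.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules found.")
```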
Why development mode for now?
- Package not yet published to PyPI/conda-forge
- Allows you to get latest features and contribute feedback
- Easy to update with `git pull`
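With an editable install, Python imports `envdataprep` from the cloned directory, so source changes fetched with `git pull` take effect without reinstalling (restart any running Python session to pick them up). One way to confirm where a package is loaded from is to inspect its module spec. `module_location` is a hypothetical helper written for this sketch.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the directory a module would be imported from, or None.

    Built-in modules and namespace packages have no single file location,
    so they also return None.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    # For a package, origin is its __init__.py, so the parent is the package dir.
    return Path(spec.origin).resolve().parent
```

For an editable install, `module_location("envdataprep")` should point somewhere inside the directory you cloned, rather than into the environment's `site-packages`.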
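Note: We use `pip install -e .` even in conda/mamba environments. This command runs the pip that belongs to your active conda environment, not the system pip, and the `--no-deps` flag stops pip from installing dependencies that the environment created from environment.yml already provides. You can verify which pip is used with:

```bash
which pip  # Should show: .../miniforge3/envs/envdataprep/bin/pip
```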
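The `which pip` command is available on Linux and macOS; on Windows, `where pip` serves the same purpose. A cross-platform alternative is to ask Python directly: activating a conda or mamba environment sets the `CONDA_PREFIX` environment variable to that environment's path, and `sys.prefix` is the prefix of the running interpreter. Running pip as `python -m pip install -e . --no-deps` also guarantees that pip belongs to that interpreter. The helper name below is hypothetical.

```python
import os
import sys
from pathlib import Path


def _norm(path: str) -> str:
    # Resolve symlinks and normalize case (matters on Windows).
    return os.path.normcase(str(Path(path).resolve()))


def interpreter_matches_active_env() -> bool:
    """True if the running Python lives in the activated conda/mamba env.

    Relies on CONDA_PREFIX, which conda/mamba set on activation.
    """
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda/mamba environment is active
    return _norm(sys.prefix) == _norm(conda_prefix)


if __name__ == "__main__":
    print("sys.prefix:  ", sys.prefix)
    print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX"))
    print("Match:", interpreter_matches_active_env())
```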
License
This project is licensed under the MIT License.
Project details
Download files
File details
Details for the file envdataprep-0.1.0.tar.gz.
File metadata
- Download URL: envdataprep-0.1.0.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be8693417a2c76e8c2856693d29136b63e5d8341aa4006d92dd8bc004e07c02a |
| MD5 | 5168d41c187ddbb3348930ce6d13a764 |
| BLAKE2b-256 | 02b460ca6cdaf1b1847c2c681d8150ccf0a3742230024bb2d3a425d59d42e066 |
File details
Details for the file envdataprep-0.1.0-py3-none-any.whl.
File metadata
- Download URL: envdataprep-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac275507f085d92801da94d9d129f0d6dddd239735533be83cb56432a1fea964 |
| MD5 | e4539f5a026ef007ce182fbb2df01507 |
| BLAKE2b-256 | ca24cb7be3cd7912574d96eb391bd6d3bf2b685667cfa92890db97a10501109e |
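To check a downloaded file against the SHA256 digests listed above, the standard library's `hashlib` is enough. `sha256_of` and `verify_sha256` are hypothetical helper names for this sketch; the file is read in chunks so large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_sha256("envdataprep-0.1.0.tar.gz", "be8693417a2c76e8c2856693d29136b63e5d8341aa4006d92dd8bc004e07c02a")` should return `True` for an intact copy of the source distribution.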