Petrinex Python API
Load Alberta Petrinex data (Volumetrics, NGL) into Spark/pandas DataFrames.
Features
- ✅ Databricks Serverless - Full Unity Catalog support
- ✅ Memory Efficient - Handles 100+ files without OOM
- ✅ Zero Config - Automatic ZIP extraction, encoding, error handling
- ✅ Multiple Data Types - Volumetrics (Vol) and NGL support
Note: Only the Alberta (AB) jurisdiction is currently supported.
Quick Start
pip install petrinex
from petrinex import PetrinexClient
# Volumetrics data
client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")
# NGL and Marketable Gas
ngl_client = PetrinexClient(spark=spark, data_type="NGL")
ngl_df = ngl_client.read_spark_df(updated_after="2025-12-01")
API
Load Data
# Spark DataFrame (recommended)
df = client.read_spark_df(updated_after="2025-12-01")
# pandas DataFrame
pdf = client.read_pandas_df(updated_after="2025-12-01")
# Date range
df = client.read_spark_df(from_date="2021-01-01", end_date="2023-12-31")
Date Parameters:
- updated_after="2025-12-01" - Files modified after this date
- from_date="2021-01-01" - All data from this production month onwards
- end_date="2023-12-31" - Optional end date (use with from_date)
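The following is a minimal sketch of how these date parameters could select files, not the library's actual implementation. `FileRecord` and `select_files` are hypothetical names, and the boundary rules (strictly after for `updated_after`, inclusive for `from_date` and `end_date`) are assumptions read from the descriptions above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FileRecord:
    """Hypothetical metadata for one published file."""
    production_month: date  # first day of the production month
    updated: date           # date the file was last modified


def select_files(
    files: List[FileRecord],
    updated_after: Optional[str] = None,
    from_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> List[FileRecord]:
    """Keep files that pass every filter that was supplied (assumed semantics)."""
    selected = []
    for f in files:
        # Assumption: "modified after" means strictly later than the given date.
        if updated_after and not f.updated > date.fromisoformat(updated_after):
            continue
        # Assumption: the production-month range is inclusive at both ends.
        if from_date and f.production_month < date.fromisoformat(from_date):
            continue
        if end_date and f.production_month > date.fromisoformat(end_date):
            continue
        selected.append(f)
    return selected


files = [
    FileRecord(date(2020, 12, 1), date(2025, 11, 15)),
    FileRecord(date(2021, 6, 1), date(2025, 12, 5)),
    FileRecord(date(2024, 1, 1), date(2025, 12, 10)),
]
recent = select_files(files, updated_after="2025-12-01")
historical = select_files(files, from_date="2021-01-01", end_date="2023-12-31")
```

Here `recent` keeps the two files modified in December 2025, while `historical` keeps only the June 2021 production month.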
Download Files
Download Petrinex files to your local machine. Files are extracted from their ZIP archives and organized into subdirectories by production month:
# Download recent updates
paths = client.download_files(
    output_dir="./petrinex_data",
    updated_after="2025-12-01"
)
# Creates: ./petrinex_data/2025-12/Vol_2025-12.csv
# Historical range
paths = client.download_files(
    output_dir="./data",
    from_date="2021-01-01",
    end_date="2023-12-31"
)
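As a rough illustration of the directory layout shown in the comment above, the sketch below rebuilds the expected path and groups files by production month. `expected_csv_path` and `group_by_month` are hypothetical helpers, and it is an assumption that `download_files` returns path strings or `Path` objects following this layout.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Union


def expected_csv_path(output_dir: str, data_type: str, production_month: str) -> Path:
    """Build <output_dir>/<YYYY-MM>/<data_type>_<YYYY-MM>.csv, as in the example comment."""
    return Path(output_dir) / production_month / f"{data_type}_{production_month}.csv"


def group_by_month(paths: Iterable[Union[str, Path]]) -> Dict[str, List[Path]]:
    """Group file paths by parent directory name, which is the production month."""
    grouped: Dict[str, List[Path]] = {}
    for p in map(Path, paths):
        grouped.setdefault(p.parent.name, []).append(p)
    return grouped


example = expected_csv_path("./petrinex_data", "Vol", "2025-12")
by_month = group_by_month(
    [example, expected_csv_path("./petrinex_data", "Vol", "2025-11")]
)
print(example.as_posix())  # petrinex_data/2025-12/Vol_2025-12.csv
print(sorted(by_month))    # ['2025-11', '2025-12']
```

Note that `pathlib` drops the leading `./` when it normalizes the path.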
Large Data Loads (Unity Catalog)
For large data loads (20+ files), write directly to Unity Catalog to avoid memory issues and timeouts:
# Write directly to UC table (avoids memory accumulation)
df = client.read_spark_df(
    from_date="2020-01-01",
    uc_table="main.petrinex.volumetrics"
)
# Incremental updates
df = client.read_spark_df(
    updated_after="2025-12-01",
    uc_table="main.petrinex.volumetrics"
)
# Full refresh (truncate first)
spark.sql("TRUNCATE TABLE main.petrinex.volumetrics")
df = client.read_spark_df(from_date="2020-01-01", uc_table="main.petrinex.volumetrics")
Benefits:
- ✅ No memory accumulation
- ✅ No Spark Connect timeouts
- ✅ Automatic schema evolution
- ✅ Handles 100+ files
- ✅ Provenance & schema validation
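One way to picture why writing to a table avoids memory accumulation is the append-per-file pattern sketched below. This is a pure-Python stand-in, not the library's code: `read_files`, `append_each`, and the list used as the "table" are all hypothetical, and in Spark the `append` step would presumably be an append-mode write.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_files(file_names: Iterable[str]) -> Iterator[List[Row]]:
    """Yield one file's rows at a time (stand-in for parsing one CSV)."""
    for name in file_names:
        yield [{"source_file": name, "value": i} for i in range(3)]


def append_each(
    batches: Iterator[List[Row]], append: Callable[[List[Row]], None]
) -> int:
    """Hand each batch to the sink as soon as it is read; return the batch count."""
    written = 0
    for batch in batches:
        append(batch)  # the sink now owns the rows
        written += 1   # the loop keeps no reference to earlier batches
    return written


table: List[Row] = []  # stand-in for the Unity Catalog table
count = append_each(read_files(["a.csv", "b.csv"]), table.extend)
```

Because `read_files` is a generator, only one file's rows are held by the loop at any time, whatever the number of files.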
Safety Features:
- Provenance validation (checks for required columns)
- Schema validation (ensures compatibility)
- Schema evolution (adds new columns automatically)
- Append-only mode (no accidental overwrites)
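The checks listed above might look roughly like the sketch below, which works on plain column-name-to-type dictionaries. The required column names, the `plan_append` helper, and the rule that a type mismatch is an error are assumptions for illustration; the library's real checks are not documented here.

```python
from typing import Dict, Iterable, List

# Hypothetical provenance columns; the real required columns are not listed above.
REQUIRED_COLUMNS = ("production_month", "source_file")


def check_required(columns: Iterable[str]) -> None:
    """Provenance validation: raise if any required column is missing."""
    missing = sorted(set(REQUIRED_COLUMNS) - set(columns))
    if missing:
        raise ValueError(f"missing required columns: {missing}")


def plan_append(table_schema: Dict[str, str], batch_schema: Dict[str, str]) -> List[str]:
    """Validate an incoming batch and return the columns to add to the table."""
    check_required(batch_schema)
    # Schema validation: a shared column must keep the same type.
    conflicts = sorted(
        name for name, dtype in batch_schema.items()
        if name in table_schema and table_schema[name] != dtype
    )
    if conflicts:
        raise TypeError(f"incompatible column types: {conflicts}")
    # Schema evolution: columns the table does not have yet are added.
    return [name for name in batch_schema if name not in table_schema]


table_schema = {"production_month": "string", "source_file": "string", "volume": "double"}
batch_schema = {**table_schema, "energy": "double"}
new_columns = plan_append(table_schema, batch_schema)
print(new_columns)  # ['energy']
```

Existing columns are never dropped or rewritten in this sketch, which mirrors the append-only behaviour described above.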
Databricks
%pip install git+https://github.com/guanjieshen/petrinex-python-api.git
from petrinex import PetrinexClient
client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")
display(df)
See databricks_example.ipynb for a complete example.
Data Types
| Type | Description |
|---|---|
| Vol | Conventional Volumetrics (oil & gas production) |
| NGL | NGL and Marketable Gas Volumes |
Installation
# From PyPI
pip install petrinex
# From GitHub
pip install git+https://github.com/guanjieshen/petrinex-python-api.git
# Development
git clone https://github.com/guanjieshen/petrinex-python-api.git
cd petrinex-python-api
pip install -e ".[dev]"
Testing
# Run all tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=petrinex --cov-report=html
# Integration tests (requires network)
pytest tests/ -v -m integration
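The `-m integration` flag selects tests carrying a pytest marker named `integration`. How this project registers that marker is not shown here; one common approach, sketched below as an assumption, is a `pytest_configure` hook in `conftest.py`.

```python
# conftest.py (sketch; the project's real configuration may differ)
def pytest_configure(config):
    """Register the `integration` marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers", "integration: tests that need network access"
    )
```

A test would then be decorated with `@pytest.mark.integration`, and `pytest tests/ -v -m "not integration"` would skip those tests when no network is available.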
Links
- 📦 PyPI
- 📓 Databricks Example
- 📋 Changelog
- 🧪 Tests
License
MIT License - Copyright (c) 2026 Guanjie Shen
File details
Details for the file petrinex-1.1.0.tar.gz.
File metadata
- Download URL: petrinex-1.1.0.tar.gz
- Upload date:
- Size: 20.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8f195fa5837e70bd6a29bd1e4e29834fab7919f539b9805a57247896ae7d47a |
| MD5 | 487ad8cb47fd5e3458b58e6b6cd1b97d |
| BLAKE2b-256 | 2590332f1a3ca9c9641fc182cfbe8bc4cdde16e4461ac578df5a34fc69237c54 |
File details
Details for the file petrinex-1.1.0-py3-none-any.whl.
File metadata
- Download URL: petrinex-1.1.0-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da7187eabba51dd7d0085cbfe474727c92557e45e51e59c12c8de0d3aeff93c8 |
| MD5 | 6c850f19238e5acd5440945e79a63b04 |
| BLAKE2b-256 | 6d2dc6766c104ca264e8891efc6a60dd9833cf649735102941b7257a9d1eb367 |