Load Alberta Petrinex data (Volumetrics, NGL) into Spark/pandas DataFrames

Petrinex Python API

Requires Python 3.8+. License: MIT.

Features

  • Databricks Serverless - Full Unity Catalog support
  • Memory Efficient - Handles 100+ files without OOM
  • Zero Config - Automatic ZIP extraction, encoding, error handling
  • Multiple Data Types - Volumetrics (Vol) and NGL support

Note: Currently supports Alberta (AB) jurisdiction only.

Quick Start

pip install petrinex

from petrinex import PetrinexClient

# Volumetrics data
client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")

# NGL and Marketable Gas
ngl_client = PetrinexClient(spark=spark, data_type="NGL")
ngl_df = ngl_client.read_spark_df(updated_after="2025-12-01")

API

Load Data

# Spark DataFrame (recommended)
df = client.read_spark_df(updated_after="2025-12-01")

# pandas DataFrame
pdf = client.read_pandas_df(updated_after="2025-12-01")

# Date range
df = client.read_spark_df(from_date="2021-01-01", end_date="2023-12-31")

Date Parameters:

  • updated_after="2025-12-01" - Only files modified after this date
  • from_date="2021-01-01" - All data from this production month onward
  • end_date="2023-12-31" - Optional end of the range (use with from_date)
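
To make the range parameters concrete, here is a minimal sketch of the production months that a from_date/end_date pair would span. The helper production_months is hypothetical and not part of the petrinex package. It assumes ISO YYYY-MM-DD strings, an inclusive range of whole months, and that an omitted end_date means "up to the current month"; the library's actual handling may differ.

```python
from datetime import date
from typing import List, Optional


def production_months(from_date: str,
                      end_date: Optional[str] = None,
                      today: Optional[date] = None) -> List[str]:
    """List the YYYY-MM months from from_date through end_date, inclusive.

    Hypothetical helper for illustration; not part of the petrinex package.
    """
    start = date.fromisoformat(from_date)
    # Assumption: without an end date, the range runs to the current month.
    end = date.fromisoformat(end_date) if end_date else (today or date.today())
    if end < start:
        raise ValueError("end_date must not be earlier than from_date")

    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        # Advance one calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

Under these assumptions, from_date="2021-01-01" with end_date="2023-12-31" covers 36 production months, which is a quick way to estimate how many files a historical load will touch.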

Download Files

Download Petrinex files to your local machine. Files are extracted from their ZIP archives and organized into subdirectories by production month:

# Download recent updates
paths = client.download_files(
    output_dir="./petrinex_data",
    updated_after="2025-12-01"
)
# Creates: ./petrinex_data/2025-12/Vol_2025-12.csv

# Historical range
paths = client.download_files(
    output_dir="./data",
    from_date="2021-01-01",
    end_date="2023-12-31"
)
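
The directory layout can be sketched with pathlib. Both helpers below are hypothetical and not part of petrinex. The naming pattern is inferred only from the "Creates:" comment above, and the sketch assumes download_files returns an iterable of path strings, which the source does not confirm.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def expected_csv_path(output_dir: str, data_type: str, month: str) -> Path:
    """Build <output_dir>/<YYYY-MM>/<data_type>_<YYYY-MM>.csv.

    Assumed layout, inferred from the example comment; not a documented API.
    """
    return Path(output_dir) / month / f"{data_type}_{month}.csv"


def group_by_month(paths: Iterable[str]) -> Dict[str, List[Path]]:
    """Group file paths by the name of their parent directory (the month)."""
    grouped = defaultdict(list)
    for p in map(Path, paths):
        grouped[p.parent.name].append(p)
    return dict(grouped)
```

Grouping the returned paths by month this way makes it easy to spot-check that every expected production month produced a file.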

Large Data Loads (Unity Catalog)

For large data loads (20+ files), write directly to Unity Catalog to avoid memory issues and timeouts:

# Write directly to UC table (avoids memory accumulation)
df = client.read_spark_df(
    from_date="2020-01-01",
    uc_table="main.petrinex.volumetrics"
)

# Incremental updates
df = client.read_spark_df(
    updated_after="2025-12-01",
    uc_table="main.petrinex.volumetrics"
)

# Full refresh (truncate first)
spark.sql("TRUNCATE TABLE main.petrinex.volumetrics")
df = client.read_spark_df(from_date="2020-01-01", uc_table="main.petrinex.volumetrics")

Benefits:

  • ✅ No memory accumulation
  • ✅ No Spark Connect timeouts
  • ✅ Automatic schema evolution
  • ✅ Handles 100+ files
  • ✅ Provenance & schema validation
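
The "no memory accumulation" benefit follows from a general pattern: write each batch of files before loading the next, instead of collecting everything in memory first. The sketch below shows that pattern in plain Python. The names batches, load_incrementally, and write_batch are hypothetical; the source does not describe how petrinex implements this internally.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def load_incrementally(files: Sequence[str],
                       size: int,
                       write_batch: Callable[[List[str]], None]) -> int:
    """Hand each batch to write_batch before touching the next one.

    write_batch is a hypothetical callback, e.g. one that appends the
    batch to a table. Only one batch is held in memory at a time.
    """
    written = 0
    for batch in batches(files, size):
        write_batch(batch)
        written += len(batch)
    return written
```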

Safety Features:

  • Provenance validation (checks for required columns)
  • Schema validation (ensures compatibility)
  • Schema evolution (adds new columns automatically)
  • Append-only mode (no accidental overwrites)
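
As a rough illustration of how these checks might fit together, the sketch below plans an append from two column-to-type mappings. It is not the library's implementation. The function plan_append is hypothetical, and "compatibility" is assumed here to mean that columns shared by the table and the incoming batch have the same type.

```python
from typing import Dict, Set


def plan_append(table_schema: Dict[str, str],
                incoming_schema: Dict[str, str],
                required_columns: Set[str]) -> Dict[str, str]:
    """Return the new columns to add to the table before appending.

    Hypothetical helper; raises ValueError when a check fails.
    """
    # Provenance validation: the batch must carry every required column.
    missing = sorted(required_columns - set(incoming_schema))
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    # Schema validation: shared columns must agree on type (assumed rule).
    conflicts = sorted(
        name for name, dtype in incoming_schema.items()
        if name in table_schema and table_schema[name] != dtype
    )
    if conflicts:
        raise ValueError(f"type mismatch for columns: {conflicts}")

    # Schema evolution: columns the table lacks are added, none are removed.
    return {name: dtype for name, dtype in incoming_schema.items()
            if name not in table_schema}
```

Because the function only ever returns columns to add, it mirrors the append-only behavior: existing columns and rows are never rewritten.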

Databricks

%pip install git+https://github.com/guanjieshen/petrinex-python-api.git

from petrinex import PetrinexClient

client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")
display(df)

See databricks_example.ipynb for a complete example.

Data Types

Type   Description
Vol    Conventional Volumetrics (oil & gas production)
NGL    NGL and Marketable Gas Volumes

Installation

# From PyPI
pip install petrinex

# From GitHub
pip install git+https://github.com/guanjieshen/petrinex-python-api.git

# Development
git clone https://github.com/guanjieshen/petrinex-python-api.git
cd petrinex-python-api
pip install -e ".[dev]"

Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=petrinex --cov-report=html

# Integration tests (requires network)
pytest tests/ -v -m integration

License

MIT License - Copyright (c) 2026 Guanjie Shen
