Petrinex Python API

Load Alberta Petrinex data (Volumetrics, NGL) into Spark/pandas DataFrames.

Python 3.8+ · License: MIT

Features

  • Databricks Serverless - Full Unity Catalog support
  • Memory Efficient - Handles 100+ files without out-of-memory (OOM) errors
  • Zero Config - Automatic ZIP extraction, encoding, error handling
  • Multiple Data Types - Volumetrics (Vol) and NGL support

Note: Currently supports Alberta (AB) jurisdiction only.

Quick Start

pip install petrinex

from petrinex import PetrinexClient

# Volumetrics data
# `spark` is an existing SparkSession (predefined in Databricks notebooks)
client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")

# NGL and Marketable Gas
ngl_client = PetrinexClient(spark=spark, data_type="NGL")
ngl_df = ngl_client.read_spark_df(updated_after="2025-12-01")

API

Load Data

# Spark DataFrame (recommended)
df = client.read_spark_df(updated_after="2025-12-01")

# pandas DataFrame
pdf = client.read_pandas_df(updated_after="2025-12-01")

# Date range
df = client.read_spark_df(from_date="2021-01-01", end_date="2023-12-31")

Date Parameters:

  • updated_after="2025-12-01" - Load only files modified after this date
  • from_date="2021-01-01" - Load all data from this production month onward
  • end_date="2023-12-31" - Optional end of the range (use with from_date)
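
Every example above passes dates as `YYYY-MM-DD` strings. As a minimal sketch, the helper below validates such strings before they reach the client and enforces the documented rule that `end_date` is used together with `from_date`. Both `build_date_kwargs` and `days_ago` are hypothetical helpers, not part of petrinex, and the check that a range does not end before it starts is an assumption rather than documented library behavior.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def build_date_kwargs(
    updated_after: Optional[str] = None,
    from_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: validate date strings and return client kwargs."""
    given = {
        "updated_after": updated_after,
        "from_date": from_date,
        "end_date": end_date,
    }
    # Keep only the parameters that were actually supplied.
    kwargs = {name: value for name, value in given.items() if value is not None}

    # date.fromisoformat raises ValueError if a string is not a valid ISO date.
    parsed = {name: date.fromisoformat(value) for name, value in kwargs.items()}

    # Documented rule: end_date is used together with from_date.
    if "end_date" in parsed and "from_date" not in parsed:
        raise ValueError("end_date requires from_date")
    # Assumption: a range must not end before it starts.
    if "end_date" in parsed and parsed["end_date"] < parsed["from_date"]:
        raise ValueError("end_date is earlier than from_date")
    return kwargs


def days_ago(n: int, today: Optional[date] = None) -> str:
    """Return the date n days before `today` as a YYYY-MM-DD string."""
    today = today or date.today()
    return (today - timedelta(days=n)).isoformat()
```

The result can be unpacked straight into a read call, for example `client.read_spark_df(**build_date_kwargs(updated_after=days_ago(7)))`.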

Download Files

Download Petrinex files to your local machine. Files are extracted from their ZIP archives and organized into one subdirectory per production month:

# Download recent updates
paths = client.download_files(
    output_dir="./petrinex_data",
    updated_after="2025-12-01"
)
# Creates: ./petrinex_data/2025-12/Vol_2025-12.csv

# Historical range
paths = client.download_files(
    output_dir="./data",
    from_date="2021-01-01",
    end_date="2023-12-31"
)
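
Because the files land in one subdirectory per production month, the result can be inspected with the standard library alone. The sketch below assumes the `<output_dir>/<YYYY-MM>/<name>.csv` layout shown in the comment above. `files_by_month` is a hypothetical helper, not part of petrinex, and the exact file names the library writes may differ.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumption: month subdirectories are named YYYY-MM, as in ./petrinex_data/2025-12/.
MONTH_DIR = re.compile(r"^\d{4}-\d{2}$")


def files_by_month(output_dir: str) -> Dict[str, List[Path]]:
    """Hypothetical helper: map each production month to its CSV files."""
    result: Dict[str, List[Path]] = {}
    root = Path(output_dir)
    for month_dir in sorted(root.iterdir()):
        # Skip stray files and directories that are not month folders.
        if not month_dir.is_dir() or not MONTH_DIR.match(month_dir.name):
            continue
        result[month_dir.name] = sorted(month_dir.glob("*.csv"))
    return result
```

Calling `files_by_month("./petrinex_data")` after a download gives a quick check of which months arrived and how many files each one holds.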

Large Data Loads (Unity Catalog)

For large data loads (20+ files), write directly to Unity Catalog to avoid memory issues and timeouts:

# Write directly to UC table (avoids memory accumulation)
df = client.read_spark_df(
    from_date="2020-01-01",
    uc_table="main.petrinex.volumetrics"
)

# Incremental updates
df = client.read_spark_df(
    updated_after="2025-12-01",
    uc_table="main.petrinex.volumetrics"
)

# Full refresh (truncate first)
spark.sql("TRUNCATE TABLE main.petrinex.volumetrics")
df = client.read_spark_df(from_date="2020-01-01", uc_table="main.petrinex.volumetrics")

Benefits:

  • ✅ No memory accumulation
  • ✅ No Spark Connect timeouts
  • ✅ Automatic schema evolution
  • ✅ Handles 100+ files
  • ✅ Provenance & schema validation

Safety Features:

  • Provenance validation (checks for required columns)
  • Schema validation (ensures compatibility)
  • Schema evolution (adds new columns automatically)
  • Append-only mode (no accidental overwrites)
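
To make the safety checks concrete, here is a minimal sketch of the logic the bullets describe, written over plain mappings of column name to type name. It is not petrinex's implementation: `check_and_plan` is a hypothetical function, and the library's actual required columns and compatibility rules are not documented here.

```python
from typing import Dict, Iterable, List


def check_and_plan(
    table_cols: Dict[str, str],
    batch_cols: Dict[str, str],
    required: Iterable[str],
) -> List[str]:
    """Hypothetical sketch: validate a batch against a table, return columns to add.

    Both mappings go from column name to type name (e.g. "string", "double").
    """
    # Provenance validation: every required column must be in the batch.
    missing = [c for c in required if c not in batch_cols]
    if missing:
        raise ValueError(f"batch is missing required columns: {missing}")

    # Schema validation: columns present on both sides must agree on type.
    clashes = [
        c for c in batch_cols
        if c in table_cols and table_cols[c] != batch_cols[c]
    ]
    if clashes:
        raise ValueError(f"type mismatch on columns: {clashes}")

    # Schema evolution: columns new to the table are added, in batch order.
    return [c for c in batch_cols if c not in table_cols]
```

An append-only writer would run a check like this first, add the returned columns to the table, and only then append the batch, so a bad file fails before any rows are written.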

Databricks

%pip install git+https://github.com/guanjieshen/petrinex-python-api.git

from petrinex import PetrinexClient

client = PetrinexClient(spark=spark, data_type="Vol")
df = client.read_spark_df(updated_after="2025-12-01")
display(df)

See databricks_example.ipynb for a complete example.

Data Types

Type   Description
Vol    Conventional Volumetrics (oil & gas production)
NGL    NGL and Marketable Gas Volumes
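
Both data types use the same client interface in the examples above, so loading them side by side is a short loop. The sketch below takes the client class as an argument so it can be exercised without a Spark session. `load_all` is a hypothetical helper, and it relies only on the `PetrinexClient(spark=..., data_type=...)` constructor and the `read_spark_df(...)` method shown in this README.

```python
from typing import Any, Callable, Dict, Iterable


def load_all(
    client_cls: Callable[..., Any],
    spark: Any,
    data_types: Iterable[str] = ("Vol", "NGL"),
    **read_kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: read one DataFrame per data type."""
    frames = {}
    for data_type in data_types:
        # One client per data type, as in the Quick Start.
        client = client_cls(spark=spark, data_type=data_type)
        frames[data_type] = client.read_spark_df(**read_kwargs)
    return frames
```

With petrinex installed, `frames = load_all(PetrinexClient, spark, updated_after="2025-12-01")` would make the two results available as `frames["Vol"]` and `frames["NGL"]`.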

Installation

# From PyPI
pip install petrinex

# From GitHub
pip install git+https://github.com/guanjieshen/petrinex-python-api.git

# Development
git clone https://github.com/guanjieshen/petrinex-python-api.git
cd petrinex-python-api
pip install -e ".[dev]"

Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=petrinex --cov-report=html

# Integration tests (requires network)
pytest tests/ -v -m integration

License

MIT License - Copyright (c) 2026 Guanjie Shen

