Physics-based synthetic well-log benchmark generator

These details have not been verified by PyPI

Project links

Project description

wellbench

Physics-based synthetic well-log benchmark generator for pore-pressure prediction research. Five regions calibrated against real-world wells via Optuna optimisation (Jensen–Shannon divergence + Wasserstein distance against real distributions), a deterministic physics generator, an optional CTGAN baseline, and a CLI that reproduces a 15-dataset benchmark.

Use it to:

Generate reproducible, physically plausible well-log datasets with one line.
Stress-test pore-pressure / petrophysics models against ground truth you control (you set the seed, the depth axis, and the region parameters).
Compare physics-based and GAN-based synthesis under a shared schema.
Run a 15-CSV "benchmark suite" out of the box for paper-ready comparisons.

Install

pip install wellbench
pip install wellbench[ctgan]   # adds the optional CTGAN baseline (torch + ctgan)
pip install wellbench[docs]    # to rebuild the Sphinx docs locally

Python ≥ 3.12 is required. Core deps are numpy, pandas, and scipy. The CTGAN extra pulls in torch and ctgan lazily — they are only imported when you actually use the GAN generator, so the base install stays light.

Quickstart

from wellbench import SyntheticWellLogGenerator, REGION_1

gen = SyntheticWellLogGenerator(REGION_1)
df  = gen.generate(seed=42)
print(df.head())
#    DEPTH         GR         DT      RHOB        RT        HP        OB    DT_NCT       PPP
# 0   500.0  140.2155  138.1432   1.6418   12.7382   645.94   926.85   137.43   615.33
# 1   500.5  142.9120  136.0091   1.6712   13.4711   646.57   928.39   137.39   612.10
# ...

A 60-second tour of every public entry point lives in examples.py:

python examples.py                  # run every example
python examples.py basic ctgan      # pick specific examples

Command-line interface

The packaged wellbench console script reproduces the canonical 15-dataset benchmark — 5 regions × 3 seeds — and writes one CSV per (region, seed) pair:

wellbench                           # all 15 datasets -> ./benchmark/
wellbench -r 2 -s 99 200            # region 2, seeds 99 and 200 (2 CSVs)
wellbench -r 1 2 3                  # only the pore-pressure regions
wellbench -o my_data                # custom output directory
wellbench --help                    # full reference

Output filenames are deterministic: region_<N>_seed_<S>.csv. With the default seeds ([42, 123, 7777]) you get:

benchmark/
├── region_1_seed_42.csv
├── region_1_seed_123.csv
├── region_1_seed_7777.csv
├── region_2_seed_42.csv
...
└── region_5_seed_7777.csv      (15 files, 9 columns each for PP regions)

Writing CSVs from Python

Single well

from wellbench import SyntheticWellLogGenerator, REGION_1

gen = SyntheticWellLogGenerator(REGION_1)
df  = gen.generate(seed=42)
df.to_csv("missa_keswal_seed42.csv", index=False)

One CSV per seed, one folder per region

from pathlib import Path
from wellbench import ALL_REGIONS, SyntheticWellLogGenerator

out = Path("synthetic_wells"); out.mkdir(exist_ok=True)
seeds = [1, 2, 3, 4, 5]

for i, region in enumerate(ALL_REGIONS, start=1):
    region_dir = out / f"region_{i}"
    region_dir.mkdir(exist_ok=True)
    gen = SyntheticWellLogGenerator(region)
    for seed in seeds:
        gen.generate(seed=seed).to_csv(
            region_dir / f"seed_{seed}.csv", index=False
        )

Reproduce the full 15-dataset benchmark programmatically

from wellbench import generate_benchmark

paths = generate_benchmark(output_dir="benchmark")
# returns the list of 15 written CSV paths

Match a real well's depth axis

When you want one synthetic row per real measurement (e.g. for side-by-side log plots), pass an explicit depth array:

import pandas as pd
from wellbench import SyntheticWellLogGenerator, REGION_1

real  = pd.read_csv("real_well.csv", usecols=["DEPTH"])
synth = SyntheticWellLogGenerator(REGION_1).generate(
    seed=42, depth=real["DEPTH"].to_numpy(),
)
synth.to_csv("synthetic_aligned.csv", index=False)

Custom depth ranges or sampling rates

import numpy as np
from wellbench import SyntheticWellLogGenerator, REGION_4

depth = np.linspace(120, 700, 5_000)         # 5 000 samples in 120-700 m
df = SyntheticWellLogGenerator(REGION_4).generate(seed=7, depth=depth)

Cleaning real or synthetic data

clean_well_data applies the same physical-bounds + outlier rules to any DataFrame that has a DEPTH column plus log columns:

import pandas as pd
from wellbench import clean_well_data

raw     = pd.read_csv("real_well.csv")
cleaned = clean_well_data(
    raw,
    outlier_std=5,        # drop values further than 5σ from the mean
    label="real_A",       # tag for the printed summary
    verbose=True,
)
cleaned.to_csv("real_well_clean.csv", index=False)

It will:

Drop the SPHI column if present.
Replace sentinel values (-999, -999.25) with NaN.
Replace out-of-physical-range values with NaN.
Replace > outlier_std-σ outliers with NaN.
Drop rows where every log column is NaN.

CTGAN baseline (optional)

Five pre-trained CTGAN models — one per region — ship inside the wheel and are loaded lazily. Install the extra:

pip install wellbench[ctgan]

…then sample with the same .generate(seed, depth=…) interface as the physics generator:

from wellbench import load_ctgan_generator

gen = load_ctgan_generator(region_index=1)        # ctgan_r1.pkl
df  = gen.generate(seed=42)
df.to_csv("ctgan_region1_seed42.csv", index=False)

You can also point it at your own checkpoint:

from wellbench import CTGANSyntheticWellLogGenerator, REGION_1

gen = CTGANSyntheticWellLogGenerator(
    params=REGION_1,
    model_path="my_models/ctgan_custom.pkl",
)
df = gen.generate(seed=0, depth=my_depth_array)

CTGAN samples are i.i.d. tabular rows; wellbench orders them along the depth axis you supply, applies _CTGAN_COLUMN_RENAMES so the output schema matches the physics generator, and clips to PHYSICAL_BOUNDS.

Defining your own region

A region is just a dict of physical parameters. The simplest recipe is to copy a built-in and tweak:

from wellbench import REGION_1, SyntheticWellLogGenerator

my_region = {
    **REGION_1,
    "name":         "My field",
    "depth_range": (1_000, 3_000),
    "depth_step":   0.25,
    "depth_unit":   "ft",
    "phi0":         0.42,           # surface porosity
    "compaction_c": 0.0006,         # Athy's exponential decay
    # …override anything else
}
df = SyntheticWellLogGenerator(my_region).generate(seed=42)

Required keys cover porosity (phi0, compaction_c, phi_layer_amp, …), Wyllie/Archie/RHOB/GR transforms, and — if has_pore_pressure=True — the hydrostatic / overburden / Eaton parameters. See src/regions.py for five fully worked examples.

Plotting recipe

import matplotlib.pyplot as plt
from wellbench import SyntheticWellLogGenerator, REGION_1

df = SyntheticWellLogGenerator(REGION_1).generate(seed=42)

fig, axes = plt.subplots(1, 4, figsize=(12, 8), sharey=True)
for ax, col in zip(axes, ["GR", "DT", "RHOB", "RT"]):
    ax.plot(df[col], df["DEPTH"])
    ax.set_xlabel(col); ax.invert_yaxis()
axes[3].set_xscale("log")            # RT is plotted on a log axis by convention
axes[0].set_ylabel("Depth")
fig.tight_layout()
fig.savefig("logs.png", dpi=150)

Regions

#	Region	Location	Pore pressure
1	Missa Keswal (Zone 1)	Eastern Potwar Basin, Punjab, Pakistan	yes
2	PINDORI-1 (Zone 2)	Eastern Potwar Basin, Punjab, Pakistan	yes
3	JOYAMAIR-4 / MINWAL-2 (Zone 3)	Eastern Potwar Basin, Punjab, Pakistan	yes
4	IODP Expedition 323, Hole U1343E	Bering Sea	no
5	Volve oil field	North Sea (Norway/UK)	no

Convenience constants: ALL_REGIONS (list of all five in order) and BENCHMARK_SEEDS ([42, 123, 7777]).

Output schema

Column	Always present	PP regions only	Units / range
`DEPTH`	✅		depends on region (`ft` or `m`)
`GR`	✅		API, `[0, 200]`
`DT`	✅		µs/ft, `[30, 180]`
`RHOB`	✅		g/cc, `[1.2, 2.9]`
`RT`	✅		Ω·m, `[0.01, 10 000]`
`HP`		✅	psi, hydrostatic pressure
`OB`		✅	psi, overburden
`DT_NCT`		✅	µs/ft, normal compaction trend
`PPP`		✅	psi, pore pressure (Eaton)

All outputs are clipped to wellbench.PHYSICAL_BOUNDS so consumers can rely on a fixed physical range. Inspect the bounds at runtime:

from wellbench import PHYSICAL_BOUNDS
for col, (lo, hi) in PHYSICAL_BOUNDS.items():
    print(f"{col:<7} [{lo}, {hi}]")

Physics models

Porosity — exponential compaction (Athy's law) + layered sinusoids + Gaussian noise.
Sonic (DT) — Wyllie time-average equation.
Density (RHOB) — bulk density mixing law with a small lithology trend.
Resistivity (RT) — Archie's equation.
Gamma ray (GR) — shale-volume linear mixing.
Pore pressure (PPP) — Eaton's method on a normal compaction trend (regions 1-3 only).

Reproducibility

Every generator method takes a seed. The same (region, seed, depth) triple is guaranteed to produce identical output across runs and platforms.
The CTGAN sampler also seeds NumPy and PyTorch (CPU and CUDA) before each .generate() call.
Region parameters are frozen dictionaries; if you want to record exactly which calibration produced a CSV, just dump the region dict alongside it.

Documentation

Full Sphinx docs (autodoc + Napoleon) live under docs/. Build them locally with:

pip install wellbench[docs]
cd docs
make html             # POSIX
.\make.bat html       # Windows
# open _build/html/index.html

Public API at a glance

from wellbench import (
    # Physics generator
    SyntheticWellLogGenerator,
    PHYSICAL_BOUNDS,
    SENTINEL_VALUES,
    clean_well_data,

    # CTGAN generator (needs the [ctgan] extra at runtime)
    CTGANSyntheticWellLogGenerator,
    load_ctgan_generator,

    # Benchmark
    generate_benchmark,

    # Region presets
    ALL_REGIONS, BENCHMARK_SEEDS,
    REGION_1, REGION_2, REGION_3, REGION_4, REGION_5,
)

Citing

If wellbench helps your research, please cite the underlying physics-based synthetic-data study (see docs/ for the current reference) and link back to this package.

License

MIT — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.2

Apr 27, 2026

0.1.1

Apr 27, 2026

0.1.0

Apr 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wellbench-0.1.2.tar.gz (2.5 MB view details)

Uploaded Apr 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

wellbench-0.1.2-py3-none-any.whl (2.5 MB view details)

Uploaded Apr 27, 2026 Python 3

File details

Details for the file wellbench-0.1.2.tar.gz.

File metadata

Download URL: wellbench-0.1.2.tar.gz
Upload date: Apr 27, 2026
Size: 2.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for wellbench-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`723ba1de1a8c5277489596f096b0303edb9fcf303d01fae1c9e2071d92594c18`
MD5	`0a3ae0d5a32b1dcb55f68bf05cbcc70b`
BLAKE2b-256	`7999743aa6838779a5a12c3c8ef7000ccbbebf92ab3cb032a6f147e6a73a00ec`

See more details on using hashes here.

File details

Details for the file wellbench-0.1.2-py3-none-any.whl.

File metadata

Download URL: wellbench-0.1.2-py3-none-any.whl
Upload date: Apr 27, 2026
Size: 2.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for wellbench-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b84d210e93f24eee025b0597949072163e0498c61e9caef0dffa0936dae88c0e`
MD5	`367e84d58a6cd5e7437b90f97efd00cb`
BLAKE2b-256	`21885b804c66b492d134a3e0f6516d2dd77e4ee0d643d27401a928f011d11b62`

See more details on using hashes here.

wellbench 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

wellbench

Install

Quickstart

Command-line interface

Writing CSVs from Python

Single well

One CSV per seed, one folder per region

Reproduce the full 15-dataset benchmark programmatically

Match a real well's depth axis

Custom depth ranges or sampling rates

Cleaning real or synthetic data

CTGAN baseline (optional)

Defining your own region

Plotting recipe

Regions

Output schema

Physics models

Reproducibility

Documentation

Public API at a glance

Citing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes