Physics-based synthetic well-log benchmark generator
wellbench
Physics-based synthetic well-log benchmark generator for pore-pressure prediction research. It ships five regions calibrated against real-world wells via Optuna optimisation (objective: Jensen–Shannon divergence + Wasserstein distance against the real distributions), a deterministic physics generator, an optional CTGAN baseline, and a CLI that reproduces a 15-dataset benchmark.
Use it to:
- Generate reproducible, physically plausible well-log datasets with one line.
- Stress-test pore-pressure / petrophysics models against ground truth you control (you set the seed, the depth axis, and the region parameters).
- Compare physics-based and GAN-based synthesis under a shared schema.
- Run a 15-CSV "benchmark suite" out of the box for paper-ready comparisons.
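The calibration objective mentioned above combines a Jensen–Shannon divergence with a Wasserstein distance. A minimal standard-library sketch of such a loss for a single log curve follows. The function names, the histogram binning, and the equal weights are assumptions made for illustration; they are not taken from wellbench's actual Optuna objective.

```python
import math

def histogram(values, lo, hi, bins):
    """Normalised equal-width histogram of values on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so values at or beyond the edges land in the first/last bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, so the result lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def calibration_loss(real, synth, lo, hi, bins=20, w_js=1.0, w_wd=1.0):
    """Weighted sum of both terms (the weights are hypothetical)."""
    jsd = js_divergence(histogram(real, lo, hi, bins),
                        histogram(synth, lo, hi, bins))
    return w_js * jsd + w_wd * wasserstein_1d(real, synth)
```

A loss of 0 means the two samples have the same binned distribution and the same sorted values; an optimiser such as Optuna would search region parameters that drive it down.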
Install
pip install wellbench
pip install wellbench[ctgan] # adds the optional CTGAN baseline (torch + ctgan)
pip install wellbench[docs] # to rebuild the Sphinx docs locally
Python ≥ 3.12 is required. Core deps are numpy, pandas, and scipy.
The CTGAN extra installs torch and ctgan. Both are imported lazily, only
when you actually use the GAN generator, so the base install stays light.
Quickstart
from wellbench import SyntheticWellLogGenerator, REGION_1
gen = SyntheticWellLogGenerator(REGION_1)
df = gen.generate(seed=42)
print(df.head())
# DEPTH GR DT RHOB RT HP OB DT_NCT PPP
# 0 500.0 140.2155 138.1432 1.6418 12.7382 645.94 926.85 137.43 615.33
# 1 500.5 142.9120 136.0091 1.6712 13.4711 646.57 928.39 137.39 612.10
# ...
A 60-second tour of every public entry point lives in examples.py:
python examples.py # run every example
python examples.py basic ctgan # pick specific examples
Command-line interface
The packaged wellbench console script reproduces the canonical 15-dataset
benchmark — 5 regions × 3 seeds — and writes one CSV per (region, seed) pair:
wellbench # all 15 datasets -> ./benchmark/
wellbench -r 2 -s 99 200 # region 2, seeds 99 and 200 (2 CSVs)
wellbench -r 1 2 3 # only the pore-pressure regions
wellbench -o my_data # custom output directory
wellbench --help # full reference
Output filenames are deterministic: region_<N>_seed_<S>.csv. With the
default seeds ([42, 123, 7777]) you get:
benchmark/
├── region_1_seed_42.csv
├── region_1_seed_123.csv
├── region_1_seed_7777.csv
├── region_2_seed_42.csv
...
└── region_5_seed_7777.csv (15 files, 9 columns each for PP regions)
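Because the naming scheme is deterministic, the expected file list can be computed up front, for example to check that a run completed. The sketch below only mirrors the region_<N>_seed_<S>.csv pattern and the default seeds quoted above; expected_benchmark_files is a hypothetical helper, not part of wellbench.

```python
from itertools import product

DEFAULT_SEEDS = [42, 123, 7777]   # default seeds quoted above
REGION_NUMBERS = range(1, 6)      # regions 1 to 5

def expected_benchmark_files(regions=REGION_NUMBERS, seeds=DEFAULT_SEEDS):
    """Return the CSV names for every (region, seed) pair, region-major."""
    return [f"region_{r}_seed_{s}.csv" for r, s in product(regions, seeds)]

names = expected_benchmark_files()
print(len(names), names[0], names[-1])
```

Passing regions=[2] and seeds=[99, 200] yields the two names that correspond to the `wellbench -r 2 -s 99 200` invocation.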
Writing CSVs from Python
Single well
from wellbench import SyntheticWellLogGenerator, REGION_1
gen = SyntheticWellLogGenerator(REGION_1)
df = gen.generate(seed=42)
df.to_csv("missa_keswal_seed42.csv", index=False)
One CSV per seed, one folder per region
from pathlib import Path
from wellbench import ALL_REGIONS, SyntheticWellLogGenerator
out = Path("synthetic_wells"); out.mkdir(exist_ok=True)
seeds = [1, 2, 3, 4, 5]
for i, region in enumerate(ALL_REGIONS, start=1):
    region_dir = out / f"region_{i}"
    region_dir.mkdir(exist_ok=True)
    gen = SyntheticWellLogGenerator(region)
    for seed in seeds:
        gen.generate(seed=seed).to_csv(
            region_dir / f"seed_{seed}.csv", index=False
        )
Reproduce the full 15-dataset benchmark programmatically
from wellbench import generate_benchmark
paths = generate_benchmark(output_dir="benchmark")
# returns the list of 15 written CSV paths
Match a real well's depth axis
When you want one synthetic row per real measurement (e.g. for side-by-side
log plots), pass an explicit depth array:
import pandas as pd
from wellbench import SyntheticWellLogGenerator, REGION_1
real = pd.read_csv("real_well.csv", usecols=["DEPTH"])
synth = SyntheticWellLogGenerator(REGION_1).generate(
    seed=42, depth=real["DEPTH"].to_numpy(),
)
synth.to_csv("synthetic_aligned.csv", index=False)
Custom depth ranges or sampling rates
import numpy as np
from wellbench import SyntheticWellLogGenerator, REGION_4
depth = np.linspace(120, 700, 5_000) # 5 000 samples in 120-700 m
df = SyntheticWellLogGenerator(REGION_4).generate(seed=7, depth=depth)
Cleaning real or synthetic data
clean_well_data applies the same physical-bounds + outlier rules to any
DataFrame that has a DEPTH column plus log columns:
import pandas as pd
from wellbench import clean_well_data
raw = pd.read_csv("real_well.csv")
cleaned = clean_well_data(
    raw,
    outlier_std=5,    # values further than 5σ from the mean become NaN
    label="real_A",   # tag for the printed summary
    verbose=True,
)
cleaned.to_csv("real_well_clean.csv", index=False)
It will:
- Drop the SPHI column if present.
- Replace sentinel values (-999, -999.25) with NaN.
- Replace out-of-physical-range values with NaN.
- Replace > outlier_std-σ outliers with NaN.
- Drop rows where every log column is NaN.
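The masking rules listed above can be sketched on plain Python lists to make the order of operations concrete. This is an illustration, not wellbench's implementation: clean_column and drop_empty_rows are hypothetical names, and using the sample standard deviation of the surviving values is an assumption.

```python
import math
import statistics

SENTINELS = (-999.0, -999.25)  # sentinel values named in the list above

def clean_column(values, lo, hi, outlier_std=5.0):
    """Mask sentinels, out-of-range values, then sigma outliers with NaN."""
    nan = float("nan")
    # Sentinels and values outside [lo, hi] (including existing NaN) become NaN.
    out = [nan if (v in SENTINELS or not lo <= v <= hi) else v for v in values]
    # Outlier test uses mean/std of the values that survived the first pass.
    valid = [v for v in out if not math.isnan(v)]
    if len(valid) >= 2:
        mean, std = statistics.fmean(valid), statistics.stdev(valid)
        if std > 0:
            out = [nan if (not math.isnan(v) and abs(v - mean) > outlier_std * std)
                   else v for v in out]
    return out

def drop_empty_rows(depth, columns):
    """Keep only rows where at least one log column is not NaN."""
    keep = [i for i in range(len(depth))
            if not all(math.isnan(col[i]) for col in columns.values())]
    return ([depth[i] for i in keep],
            {name: [col[i] for i in keep] for name, col in columns.items()})

gr = clean_column([50.0, -999.25, 60.0, 250.0, 55.0], lo=0.0, hi=200.0)
depth, cols = drop_empty_rows([500.0, 500.5, 501.0, 501.5, 502.0], {"GR": gr})
```

Running the bounds pass before the outlier pass keeps sentinels such as -999.25 from inflating the standard deviation used for the σ test.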
CTGAN baseline (optional)
Five pre-trained CTGAN models — one per region — ship inside the wheel and are loaded lazily. Install the extra:
pip install wellbench[ctgan]
…then sample with the same .generate(seed, depth=…) interface as the
physics generator:
from wellbench import load_ctgan_generator
gen = load_ctgan_generator(region_index=1) # ctgan_r1.pkl
df = gen.generate(seed=42)
df.to_csv("ctgan_region1_seed42.csv", index=False)
You can also point it at your own checkpoint:
import numpy as np
from wellbench import CTGANSyntheticWellLogGenerator, REGION_1
my_depth_array = np.arange(500.0, 1500.0, 0.5)  # illustrative; any 1-D depth array
gen = CTGANSyntheticWellLogGenerator(
    params=REGION_1,
    model_path="my_models/ctgan_custom.pkl",
)
df = gen.generate(seed=0, depth=my_depth_array)
CTGAN samples are i.i.d. tabular rows; wellbench orders them along the
depth axis you supply, applies _CTGAN_COLUMN_RENAMES so the output schema
matches the physics generator, and clips to PHYSICAL_BOUNDS.
Defining your own region
A region is just a dict of physical parameters. The simplest recipe is to
copy a built-in and tweak:
from wellbench import REGION_1, SyntheticWellLogGenerator
my_region = {
    **REGION_1,
    "name": "My field",
    "depth_range": (1_000, 3_000),
    "depth_step": 0.25,
    "depth_unit": "ft",
    "phi0": 0.42,            # surface porosity
    "compaction_c": 0.0006,  # Athy's exponential decay
    # …override anything else
}
df = SyntheticWellLogGenerator(my_region).generate(seed=42)
Required keys cover porosity (phi0, compaction_c, phi_layer_amp, …),
Wyllie/Archie/RHOB/GR transforms, and — if has_pore_pressure=True — the
hydrostatic / overburden / Eaton parameters. See
src/regions.py for five fully worked examples.
Plotting recipe
import matplotlib.pyplot as plt
from wellbench import SyntheticWellLogGenerator, REGION_1
df = SyntheticWellLogGenerator(REGION_1).generate(seed=42)
fig, axes = plt.subplots(1, 4, figsize=(12, 8), sharey=True)
for ax, col in zip(axes, ["GR", "DT", "RHOB", "RT"]):
    ax.plot(df[col], df["DEPTH"])
    ax.set_xlabel(col); ax.invert_yaxis()
axes[3].set_xscale("log") # RT is plotted on a log axis by convention
axes[0].set_ylabel("Depth")
fig.tight_layout()
fig.savefig("logs.png", dpi=150)
Regions
| # | Region | Location | Pore pressure |
|---|---|---|---|
| 1 | Missa Keswal (Zone 1) | Eastern Potwar Basin, Punjab, Pakistan | yes |
| 2 | PINDORI-1 (Zone 2) | Eastern Potwar Basin, Punjab, Pakistan | yes |
| 3 | JOYAMAIR-4 / MINWAL-2 (Zone 3) | Eastern Potwar Basin, Punjab, Pakistan | yes |
| 4 | IODP Expedition 323, Hole U1343E | Bering Sea | no |
| 5 | Volve oil field | North Sea (Norwegian sector) | no |
Convenience constants: ALL_REGIONS (list of all five in order) and
BENCHMARK_SEEDS ([42, 123, 7777]).
Output schema
| Column | Always present | PP regions only | Units / range |
|---|---|---|---|
| DEPTH | ✅ | | depends on region (ft or m) |
| GR | ✅ | | API, [0, 200] |
| DT | ✅ | | µs/ft, [30, 180] |
| RHOB | ✅ | | g/cc, [1.2, 2.9] |
| RT | ✅ | | Ω·m, [0.01, 10 000] |
| HP | | ✅ | psi, hydrostatic pressure |
| OB | | ✅ | psi, overburden |
| DT_NCT | | ✅ | µs/ft, normal compaction trend |
| PPP | | ✅ | psi, pore pressure (Eaton) |
All outputs are clipped to wellbench.PHYSICAL_BOUNDS so consumers can rely
on a fixed physical range. Inspect the bounds at runtime:
from wellbench import PHYSICAL_BOUNDS
for col, (lo, hi) in PHYSICAL_BOUNDS.items():
    print(f"{col:<7} [{lo}, {hi}]")
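For readers who want to apply or check the same kind of bounds without the package, here is a small sketch. The BOUNDS dict copies the four ranges from the schema table above rather than importing PHYSICAL_BOUNDS, and clip_row and out_of_bounds are hypothetical helpers, not wellbench functions.

```python
# Ranges copied from the output schema table (GR, DT, RHOB, RT only).
BOUNDS = {
    "GR": (0.0, 200.0),
    "DT": (30.0, 180.0),
    "RHOB": (1.2, 2.9),
    "RT": (0.01, 10_000.0),
}

def clip_row(row, bounds=BOUNDS):
    """Clamp each bounded column into [lo, hi]; other columns pass through."""
    return {
        key: min(max(value, bounds[key][0]), bounds[key][1]) if key in bounds else value
        for key, value in row.items()
    }

def out_of_bounds(row, bounds=BOUNDS):
    """List the bounded columns whose value lies outside its range."""
    return [key for key, (lo, hi) in bounds.items()
            if key in row and not lo <= row[key] <= hi]

raw_row = {"DEPTH": 500.0, "GR": 250.0, "DT": 138.1, "RHOB": 1.64, "RT": 0.001}
clipped = clip_row(raw_row)
```

A check like out_of_bounds is handy as a test assertion on generated CSVs, since every row is expected to return an empty list.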
Physics models
- Porosity — exponential compaction (Athy's law) + layered sinusoids + Gaussian noise.
- Sonic (DT) — Wyllie time-average equation.
- Density (RHOB) — bulk density mixing law with a small lithology trend.
- Resistivity (RT) — Archie's equation.
- Gamma ray (GR) — shale-volume linear mixing.
- Pore pressure (PPP) — Eaton's method on a normal compaction trend (regions 1-3 only).
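The models named above have standard textbook forms, sketched below without the layering and noise terms. Only phi0 and c echo the custom-region example earlier; every other default (matrix and fluid properties, Archie constants, the Eaton exponent of 3, the GR endpoints) is an illustrative assumption and not a wellbench calibration value, and the function names are hypothetical.

```python
import math

def athy_porosity(depth, phi0=0.42, c=0.0006):
    """Athy's law: porosity decays exponentially with depth."""
    return phi0 * math.exp(-c * depth)

def wyllie_dt(phi, dt_matrix=55.5, dt_fluid=189.0):
    """Wyllie time-average: porosity-weighted transit time (µs/ft)."""
    return (1.0 - phi) * dt_matrix + phi * dt_fluid

def bulk_density(phi, rho_matrix=2.65, rho_fluid=1.0):
    """Bulk density mixing law (g/cc), without the lithology trend."""
    return (1.0 - phi) * rho_matrix + phi * rho_fluid

def archie_rt(phi, rw=0.1, a=1.0, m=2.0, n=2.0, sw=1.0):
    """Archie's equation: formation resistivity (Ω·m)."""
    return a * rw / (phi ** m * sw ** n)

def gamma_ray(vsh, gr_clean=20.0, gr_shale=150.0):
    """Linear shale-volume mixing between clean and shale GR (API)."""
    return gr_clean + vsh * (gr_shale - gr_clean)

def eaton_pore_pressure(ob, hp, dt, dt_nct, exponent=3.0):
    """Eaton's sonic method: PP = OB - (OB - HP) * (DT_NCT / DT)^exponent."""
    return ob - (ob - hp) * (dt_nct / dt) ** exponent

phi = athy_porosity(1_000.0)
dt = wyllie_dt(phi)
print(round(phi, 4), round(dt, 2), round(bulk_density(phi), 3))
```

When DT equals DT_NCT the Eaton ratio is 1 and the pore pressure reduces to hydrostatic; a DT slower than the trend pushes the estimate above hydrostatic.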
Reproducibility
- Every generator method takes a seed. The same (region, seed, depth) triple is guaranteed to produce identical output across runs and platforms.
- The CTGAN sampler also seeds NumPy and PyTorch (CPU and CUDA) before each .generate() call.
- Region parameters are frozen dictionaries; if you want to record exactly which calibration produced a CSV, just dump the region dict alongside it.
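One way to dump the region dict next to a CSV, as suggested above, is a JSON sidecar file. The helper name write_region_sidecar, the .region.json suffix, and the example region are assumptions made for this sketch; none of them is a wellbench convention.

```python
import json
import tempfile
from pathlib import Path

def write_region_sidecar(csv_path, region, seed):
    """Write <csv stem>.region.json next to the CSV and return its path."""
    csv_path = Path(csv_path)
    sidecar = csv_path.with_name(csv_path.stem + ".region.json")
    # dict(...) accepts any mapping; default=str is a fallback for values
    # that json cannot serialise natively.
    payload = {"seed": seed, "region": dict(region)}
    sidecar.write_text(json.dumps(payload, indent=2, sort_keys=True, default=str))
    return sidecar

# Illustrative region dict; a real one would be a preset or a tweaked copy.
example_region = {"name": "My field", "depth_range": (1_000, 3_000), "phi0": 0.42}

with tempfile.TemporaryDirectory() as tmp:
    path = write_region_sidecar(Path(tmp) / "region_1_seed_42.csv",
                                example_region, seed=42)
    restored = json.loads(path.read_text())
    sidecar_name = path.name
```

Note that JSON has no tuple type, so a value such as depth_range comes back as a list when the sidecar is reloaded.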
Documentation
Full Sphinx docs (autodoc + Napoleon) live under docs/. Build them
locally with:
pip install wellbench[docs]
cd docs
make html # POSIX
.\make.bat html # Windows
# open _build/html/index.html
Public API at a glance
from wellbench import (
    # Physics generator
    SyntheticWellLogGenerator,
    PHYSICAL_BOUNDS,
    SENTINEL_VALUES,
    clean_well_data,
    # CTGAN generator (needs the [ctgan] extra at runtime)
    CTGANSyntheticWellLogGenerator,
    load_ctgan_generator,
    # Benchmark
    generate_benchmark,
    # Region presets
    ALL_REGIONS, BENCHMARK_SEEDS,
    REGION_1, REGION_2, REGION_3, REGION_4, REGION_5,
)
Citing
If wellbench helps your research, please cite the underlying
physics-based synthetic-data study (see docs/ for the current
reference) and link back to this package.
License
MIT — see LICENSE.