
Local emulator for Google BigQuery — REST + gRPC (Storage Read/Write) APIs, DuckDB-backed, SQLGlot-powered.


bqemulator

A local, drop-in emulator for Google BigQuery.

DuckDB-backed, SQLGlot-powered, and tested against the real service. Point the official Google Cloud client libraries at it and run your BigQuery code on your laptop or in CI — no real project, no billing, no network.


Documentation · Quickstart · Examples · Compatibility matrix · Changelog


Why bqemulator?

Testing code against real BigQuery is slow (network + service latency), expensive (every query is billable), and dangerous (no rollback in shared environments). The alternatives — mocks, fakes, and shared sandboxes — drift from the real service the moment you stop chasing them.

bqemulator is a process you can run locally that speaks BigQuery's actual wire protocol (REST + gRPC), backs onto a real analytical SQL engine (DuckDB), and translates GoogleSQL → DuckDB SQL with a rule-based, ADR-grounded translator (SQLGlot). The official google-cloud-bigquery, @google-cloud/bigquery, cloud.google.com/go/bigquery, com.google.cloud:google-cloud-bigquery, and bq CLI clients all work against it unchanged — only the endpoint differs.

Three use cases, one binary:

  • Ephemeral CI fixture: pytest plugin starts an in-process emulator on a random port; pip install bqemulator[testing] is all the wiring you need.
  • Long-running local dev server: bqemulator start --data-dir ~/bqemu persists state across runs; works with the official bq CLI, dbt, Airflow, PySpark, Beam, Scio.
  • Offline replica of a real project: bqemulator import --from-project <id> clones schema (and optionally data) from real BigQuery into a local data directory.

Highlights

  • 🟢 Full REST + gRPC API parity — Datasets, Tables, Jobs, TableData, Routines, Row Access Policies, Authorized Views, plus Models CRUD metadata. Storage Read API (Arrow and Avro). Storage Write API (all four stream types — DEFAULT, COMMITTED, PENDING, BUFFERED — with both proto and Arrow row formats).
  • Real SQL — GoogleSQL translated to DuckDB SQL via 92 SQLGlot rules + 22 rewriters; covers date/time, string, array, struct, range, geography, JSON, approximate-aggregate, statistical, regex, civil-time, and bit operations.
  • 🧠 Features goccy/bigquery-emulator doesn't have — JavaScript UDFs (embedded V8 via mini-racer), procedural scripting (DECLARE / BEGIN…END / IF / LOOP / EXCEPTION / BEGIN TRANSACTION), time travel (FOR SYSTEM_TIME AS OF), table snapshots, table clones, materialized views with refresh dispatch, GEOGRAPHY (planar via DuckDB-spatial + S2 helpers), RANGE, INTERVAL, authorized views, row-access policies, INFORMATION_SCHEMA.
  • 🔌 Five-client e2e matrix — every release is exercised against the official Python, Node.js, Go, and Java BigQuery client libraries plus Google's bq CLI in a live Docker container.
  • 🧪 7-tier test pyramid — unit + property + integration + conformance + e2e + perf + chaos, plus mutation / fuzz / differential siblings. Combined coverage is gated at ≥90% line + branch.
  • 📐 Conformance corpus — 1,141 fixtures recorded against real BigQuery. Drift between the emulator and the real service surfaces as a failing test; documented divergences are pinned with ADR references.
  • 🐍 Native pytest plugin: pip install bqemulator registers a pytest plugin; the bqemu_server fixture starts an ephemeral in-process emulator on random free ports and sets BIGQUERY_EMULATOR_HOST. No conftest.py wiring required.
  • 🐳 Multi-arch container: ghcr.io/jjviscomi/bqemulator builds for linux/amd64 + linux/arm64, with cosign keyless signatures via GitHub OIDC.
  • 🔭 Production-grade observability: structlog JSON logs, OpenTelemetry tracing (configurable OTLP exporter), Prometheus metrics endpoint.

Install

pip install bqemulator

Optional extras:

pip install "bqemulator[testing]"      # pytest, hypothesis, testcontainers, bigquery client
pip install "bqemulator[udf-js]"       # JavaScript UDF support (embedded V8)
pip install "bqemulator[orc]"          # ORC format for load jobs
pip install "bqemulator[compression]"  # zstd + snappy for load/extract jobs
pip install "bqemulator[import]"       # bqemulator import --from-project
pip install "bqemulator[all]"          # all runtime extras (no testing extras)

Docker:

docker run --rm -p 9050:9050 -p 9060:9060 ghcr.io/jjviscomi/bqemulator:latest

Both pip and the published image bundle the same emulator. The image exposes REST on 9050 and gRPC on 9060 by default — see configuration reference to change them.

Windows users: install Docker Desktop for Windows with the WSL2 backend (default since Docker Desktop 4.x); the published Linux image runs natively under WSL2 with no Windows-specific configuration. Native Windows-container variants of the image are explicitly out of scope for v1.0 — see docs/reference/out-of-scope.md#native-windows-containers for the rationale.

Quickstart

Python

import os
from google.cloud import bigquery

# Either set BIGQUERY_EMULATOR_HOST (picked up by every Google Cloud library)
# or pass api_endpoint explicitly to the Client. Both work.
os.environ["BIGQUERY_EMULATOR_HOST"] = "localhost:9050"

client = bigquery.Client(project="my-test-project")

client.create_dataset("sales")
client.create_table(
    bigquery.Table(
        f"{client.project}.sales.orders",  # Table() needs a fully qualified ID
        schema=[
            bigquery.SchemaField("id", "INT64"),
            bigquery.SchemaField("amount", "NUMERIC"),
            bigquery.SchemaField("placed_at", "TIMESTAMP"),
        ],
    )
)
client.insert_rows_json(
    "sales.orders",
    [{"id": 1, "amount": "12.50", "placed_at": "2026-05-21T00:00:00Z"}],
)

for row in client.query("SELECT COUNT(*) AS n FROM sales.orders").result():
    print(row.n)  # 1
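
The comment in the snippet above mentions passing api_endpoint explicitly instead of setting the environment variable. A minimal sketch of that variant follows, assuming the emulator listens on localhost:9050 as in the Docker command earlier. The helper names normalize_endpoint and make_emulator_client are hypothetical (they are not part of bqemulator), and the use of anonymous credentials is an assumption for machines without Application Default Credentials.

```python
import os


def normalize_endpoint(host: str) -> str:
    """Return a URL with a scheme; bare host:port values get http:// prepended."""
    if host.startswith(("http://", "https://")):
        return host
    return f"http://{host}"


def make_emulator_client(project: str = "my-test-project"):
    """Build a bigquery.Client pinned to the emulator endpoint (hypothetical helper)."""
    # Imported lazily so normalize_endpoint stays usable without the client library.
    from google.api_core.client_options import ClientOptions
    from google.auth.credentials import AnonymousCredentials
    from google.cloud import bigquery

    host = os.environ.get("BIGQUERY_EMULATOR_HOST", "localhost:9050")
    return bigquery.Client(
        project=project,
        credentials=AnonymousCredentials(),  # assumption: no real credentials needed
        client_options=ClientOptions(api_endpoint=normalize_endpoint(host)),
    )
```

Calling make_emulator_client() should then give a client that can be used like the one in the quickstart, provided google-cloud-bigquery is installed.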

pytest

bqemulator ships a pytest plugin via the pytest11 entry point. Installing the package is all the wiring you need — your conftest.py stays empty.

from google.cloud import bigquery

def test_orders_table(bqemu_client: bigquery.Client) -> None:
    bqemu_client.create_dataset("sales")
    # ... your test ...

The bqemu_server fixture is session-scoped (one emulator per test session); the bqemu_client fixture is function-scoped and returns a pre-configured bigquery.Client. See the pytest fixture guide and the python/pytest-integration example for a complete Flask app with integration tests.

Node.js

const { BigQuery } = require('@google-cloud/bigquery');

const bq = new BigQuery({
  projectId: 'my-test-project',
  apiEndpoint: 'http://localhost:9050',
  token: 'dummy',  // emulator accepts any token
});

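// Top-level await is unavailable in CommonJS modules, so wrap the call.
(async () => {
  await bq.createDataset('sales');
})().catch(console.error);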

See the Node.js quickstart and the nodejs/nestjs-app example.

Go

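ctx := context.Background()
client, err := bigquery.NewClient(
    ctx, "my-test-project",
    option.WithEndpoint("http://localhost:9050"),
    option.WithoutAuthentication(),
)
if err != nil {
    log.Fatalf("bigquery.NewClient: %v", err) // surface connection errors early
}
defer client.Close()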

See the Go quickstart and the go/beam-pipeline example.

Java

BigQuery bq = BigQueryOptions.newBuilder()
    .setProjectId("my-test-project")
    .setHost("http://localhost:9050")
    .setCredentials(NoCredentials.getInstance())
    .build()
    .getService();

See the Java quickstart and the java/spring-boot example.

bq CLI

bq --api=http://localhost:9050 \
   --project_id=my-test-project \
   query --use_legacy_sql=false 'SELECT 1 AS n'

See the bq CLI guide and the bq-cli-quickstart example.

docker-compose

services:
  bqemulator:
    image: ghcr.io/jjviscomi/bqemulator:latest
    ports: ["9050:9050", "9060:9060"]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9050/healthz"]
      interval: 2s
      retries: 30

  app:
    build: .
    environment:
      BIGQUERY_EMULATOR_HOST: bqemulator:9050
    depends_on:
      bqemulator: { condition: service_healthy }

See the docker-compose/full-stack example for app + emulator + Prometheus + Grafana.
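
Outside of Compose, the same readiness check can be approximated from a test harness. The sketch below polls the /healthz endpoint used by the healthcheck above with the Python standard library only; the function name wait_for_healthz and its defaults (mirroring interval: 2s and retries: 30) are illustrative choices, not part of bqemulator.

```python
import time
import urllib.error
import urllib.request


def wait_for_healthz(
    url: str = "http://localhost:9050/healthz",
    retries: int = 30,
    interval: float = 2.0,
    timeout: float = 2.0,
) -> bool:
    """Poll the health endpoint until it answers with a 2xx status or retries run out."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or the server answered with an error status
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

A harness could call wait_for_healthz() after starting the container and abort the run when it returns False.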

What works today

bqemulator is at v1.0.2 — second patch on the production-stable line. SemVer applies: breaking changes ship only in MAJOR, deprecations live ≥2 MINOR or 6 months. The compatibility matrix is auto-generated from the conformance corpus on every CI run; the conformance coverage matrix breaks down support by surface item.

Surface | Status
BigQuery REST: Datasets / Tables / Jobs / TableData / Routines / Row Access Policies / Authorized Views
Multipart + resumable upload (/upload/bigquery/v2/...)
INFORMATION_SCHEMA (TABLES, COLUMNS, ROUTINES, VIEWS, JOBS, JOBS_BY_*, MATERIALIZED_VIEWS, PARTITIONS, TABLE_OPTIONS, …)
Storage Read API (Arrow + Avro)
Storage Write API (all 4 stream types, proto + Arrow row formats)
GoogleSQL function surface (date / time / string / array / struct / JSON / regex / aggregate / approx / civil-time / bit)
Procedural scripting (DECLARE, BEGIN…END, IF, LOOP, EXCEPTION, BEGIN TRANSACTION)
SQL / JavaScript / Table-valued UDFs
Time travel (FOR SYSTEM_TIME AS OF), snapshots, clones, materialized views
Authorized views + row access policies + caller identity
GEOGRAPHY / RANGE / INTERVAL / NUMERIC / BIGNUMERIC types
Load formats: CSV / JSON / Avro / ORC / Parquet
Extract formats: CSV / JSON / Avro / Parquet
BigQuery ML (CREATE MODEL, ML.PREDICT, …) | ❌ Out of scope; see docs/reference/out-of-scope.md
BI Engine / slot reservations / Data Transfer Service / scheduled queries | ❌ Out of scope

Conformance corpus depth (snapshot at v1.0 prep, 2026-05-21):

Status | Surface items | % of deterministic surface
🟢🟢 Deep (≥6 fixtures) | 96 | 23.8%
🟢 Covered (3–5 fixtures) | 64 | 15.9%
🟡 Sampled (1–2 fixtures) | 236 | 58.6%
🔴 Uncovered (0 fixtures) | 7 | 1.7%
Total | 403 | 100%

Plus 9 non-deterministic items (RAND, CURRENT_*, GENERATE_UUID, TABLESAMPLE, FOR SYSTEM_TIME AS OF <expression>) that are excluded from the conformance corpus by ADR 0022 and exercised in unit / property / integration tiers instead — bringing the full inventory to 412 surface items across 20 categories, backed by 1,141 recorded fixtures under tests/conformance/sql_corpus/.
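
The tier thresholds and percentages above can be checked with a few lines of arithmetic. This is only a worked illustration using the counts quoted in the table; the function name tier and the dictionary layout are hypothetical and do not reflect how the project generates its matrix.

```python
def tier(fixture_count: int) -> str:
    """Map a surface item's fixture count to the depth tier used in the table above."""
    if fixture_count >= 6:
        return "Deep"
    if fixture_count >= 3:
        return "Covered"
    if fixture_count >= 1:
        return "Sampled"
    return "Uncovered"


# Item counts per tier, copied from the table above.
items_per_tier = {"Deep": 96, "Covered": 64, "Sampled": 236, "Uncovered": 7}

deterministic_total = sum(items_per_tier.values())
non_deterministic = 9  # excluded from the corpus, per the paragraph above
full_inventory = deterministic_total + non_deterministic

# Share of the deterministic surface, rounded to one decimal place.
shares = {
    name: round(100 * count / deterministic_total, 1)
    for name, count in items_per_tier.items()
}
print(deterministic_total, full_inventory, shares)
```

The totals come out to 403 deterministic items and 412 overall, matching the figures quoted in the text.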

We follow a no-deferral principle: features either ship complete or are excluded with documented rationale. There is no "TODO for v1.1." Scope boundaries are catalogued in docs/reference/out-of-scope.md.

Documentation

The full documentation lives at jjviscomi.github.io/bqemulator. Key entry points:

  • Getting started — your first ten minutes.
  • Per-language quickstarts — Python · Node.js · Go · Java · pytest · docker-compose · Testcontainers.
  • Guides — loading data, querying, streaming inserts, Storage API, UDFs, scripting, partitioning, time travel, materialized views, row access policies, dbt, Airflow, Spark, the bq CLI, observability, and more.
  • Reference — configuration, CLI, REST coverage, SQL function mapping, compatibility matrix, conformance coverage matrix, out-of-scope catalogue, troubleshooting.
  • Architecture — hexagonal architecture, storage model, SQL translation, jobs lifecycle, Storage Read/Write API design, scripting, UDFs, versioning, row access, specialized types, observability, testing strategy, conformance tier.
  • ADRs — 34 Architecture Decision Records documenting every non-obvious design choice.

Examples

Every example under docs/examples/ is a complete, runnable project with its own make test validated by CI:

Toolchain | Example | What it demonstrates
Python | python/pytest-integration | Flask app + auto-discovered bqemu_client fixture
Python | python/dbt-local | dbt build cycle via endpoint override
Python | python/airflow-dag-test | BigQueryInsertJobOperator DAG via offline dag.test()
Python | python/pyspark-bigquery | Storage Read → Arrow → Spark DataFrame
Node.js | nodejs/nestjs-app | NestJS + Jest + supertest e2e
Node.js | nodejs/cloud-run-local | Cloud Run-shaped Express + docker-compose
Go | go/beam-pipeline | Apache Beam Go SDK + Testcontainers
Go | go/dataflow-local | Stand-alone Go ETL binary
Java | java/spring-boot | Spring Boot + Testcontainers
Scala | java/scio | Spotify Scio (Scala-on-Beam) pipeline
Compose | docker-compose/full-stack | App + emulator + Prometheus + Grafana
CI | ci-recipes/github-actions | Service-container + Testcontainers patterns
CI | ci-recipes/gitlab-ci | services: alias on the CI network
CI | ci-recipes/circleci | Docker-secondary + machine executor

Project status

bqemulator is at v1.0.2 — second patch on the production-stable line. SemVer applies: breaking changes ship only in MAJOR versions, preceded by ≥1 MINOR with deprecation warnings; deprecated APIs remain for ≥2 MINOR versions or 6 months.

Maturity signals:

  • ✅ 34 Architecture Decision Records covering every non-obvious design choice (docs/adr/0001–0034).
  • ✅ ≥90% line + branch coverage gated by CI (make verify).
  • ✅ 7 test tiers passing (unit + property + integration + conformance + e2e + perf + chaos).
  • ✅ 5-client e2e matrix (Python · Node.js · Go · Java · bq CLI).
  • ✅ Mutation-tier (mutmut) pilot landed on pure-domain modules.
  • ✅ Fuzz-tier (Atheris) harnesses on the SQL translator, dynamic-protobuf decoder, and Arrow bridge.
  • ✅ Differential-tier row-order perturbation of the entire conformance corpus passes.
  • ✅ Performance baselines committed for darwin-arm64, with regression gates (pytest-benchmark --benchmark-compare-fail=median:10%).
  • ✅ PyPI publish via Trusted Publishing (sigstore-attested wheels) — pip install bqemulator==1.0.2 resolves from PyPI.
  • ✅ GHCR publish with keyless cosign signatures — docker pull ghcr.io/jjviscomi/bqemulator:1.0.2 resolves and the image is cosign-verifiable.
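
To make the performance regression gate in the list above concrete: a median:10% threshold means a run fails when the new median timing is more than 10% slower than the committed baseline. The sketch below illustrates that comparison only; it is not pytest-benchmark's implementation, and the function name median_regressed is hypothetical.

```python
from statistics import median


def median_regressed(
    baseline: list[float],
    current: list[float],
    max_increase: float = 0.10,
) -> bool:
    """Return True when the current median exceeds the baseline median by more than max_increase."""
    baseline_median = median(baseline)
    current_median = median(current)
    return current_median > baseline_median * (1 + max_increase)
```

For example, a baseline of 1.00 s per round tolerates a current median of 1.05 s but not 1.20 s.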

See CHANGELOG.md for the complete v1.0 inventory.

Contributing

We welcome contributions of all sizes. Start with CONTRIBUTING.md for the mechanics; AGENTS.md captures the project's day-to-day conventions; and docs/architecture/overview.md is the canonical architectural reference.

Pull requests are squash-merged into main with a Conventional Commits subject; commits carry a DCO sign-off (git commit -s). The full review policy lives in GOVERNANCE.md.

Community

  • 💬 GitHub Discussions — design questions, usage questions, and general help.
  • 🐛 Issues — bug reports and feature requests. Please search existing issues first.
  • 🔒 Security advisories — report vulnerabilities privately via the GitHub Security Advisory flow (see SECURITY.md for our disclosure policy).
  • 📜 Code of Conduct — adapted from the Contributor Covenant 2.1.

License

bqemulator is released under the Apache License 2.0.

Acknowledgements
