Local emulator for Google BigQuery — REST + gRPC (Storage Read/Write) APIs, DuckDB-backed, SQLGlot-powered.
bqemulator
A local, drop-in emulator for Google BigQuery.
DuckDB-backed, SQLGlot-powered, and tested against the real service. Point the official Google Cloud client libraries at it and run your BigQuery code on your laptop or in CI — no real project, no billing, no network.
Documentation · Quickstart · Examples · Compatibility matrix · Changelog
Why bqemulator?
Testing code against real BigQuery is slow (network + service latency), expensive (every query is billable), and dangerous (no rollback in shared environments). The alternatives — mocks, fakes, and shared sandboxes — drift from the real service the moment you stop chasing them.
bqemulator is a process you can run locally that speaks BigQuery's actual wire protocol (REST + gRPC), backs onto a real analytical SQL engine (DuckDB), and translates GoogleSQL → DuckDB SQL with a rule-based, ADR-grounded translator (SQLGlot). The official google-cloud-bigquery, @google-cloud/bigquery, cloud.google.com/go/bigquery, com.google.cloud:google-cloud-bigquery, and bq CLI clients all work against it unchanged — only the endpoint differs.
Three use cases, one binary:
- Ephemeral CI fixture: the pytest plugin starts an in-process emulator on a random port; `pip install bqemulator[testing]` is all the wiring you need.
- Long-running local dev server: `bqemulator start --data-dir ~/bqemu` persists state across runs and works with the official `bq` CLI, dbt, Airflow, PySpark, Beam, and Scio.
- Offline replica of a real project: `bqemulator import --from-project <id>` clones schema (and optionally data) from real BigQuery into a local data directory.
Highlights
- 🟢 Full REST + gRPC API parity: Datasets, Tables, Jobs, TableData, Routines, Row Access Policies, Authorized Views, plus Models CRUD metadata. Storage Read API (Arrow and Avro). Storage Write API (all four stream types: `DEFAULT`, `COMMITTED`, `PENDING`, `BUFFERED`; both proto and Arrow row formats).
- ⚡ Real SQL: GoogleSQL translated to DuckDB SQL via 92 SQLGlot rules + 22 rewriters; covers date/time, string, array, struct, range, geography, JSON, approximate-aggregate, statistical, regex, civil-time, and bit operations.
- 🧠 Features `goccy/bigquery-emulator` doesn't have: JavaScript UDFs (embedded V8 via `mini-racer`), procedural scripting (`DECLARE`/`BEGIN…END`/`IF`/`LOOP`/`EXCEPTION`/`BEGIN TRANSACTION`), time travel (`FOR SYSTEM_TIME AS OF`), table snapshots, table clones, materialized views with refresh dispatch, `GEOGRAPHY` (planar via DuckDB-spatial + S2 helpers), `RANGE`, `INTERVAL`, authorized views, row-access policies, and `INFORMATION_SCHEMA`.
- 🔌 Five-client e2e matrix: every release is exercised against the official Python, Node.js, Go, and Java BigQuery client libraries plus Google's `bq` CLI in a live Docker container.
- 🧪 7-tier test pyramid: unit + property + integration + conformance + e2e + perf + chaos, plus mutation / fuzz / differential siblings. Combined coverage is gated at ≥90% line + branch.
- 📐 Conformance corpus: 1,141 fixtures recorded against real BigQuery. Drift between the emulator and the real service surfaces as a failing test; documented divergences are pinned with ADR references.
- 🐍 Native pytest plugin: `pip install bqemulator` registers a pytest plugin; the `bqemu_server` fixture starts an ephemeral in-process emulator on random free ports and sets `BIGQUERY_EMULATOR_HOST`. No `conftest.py` wiring required.
- 🐳 Multi-arch container: `ghcr.io/jjviscomi/bqemulator` builds for `linux/amd64` + `linux/arm64`, with cosign keyless signatures via GitHub OIDC.
- 🔭 Production-grade observability: `structlog` JSON logs, OpenTelemetry tracing (configurable OTLP exporter), and a Prometheus metrics endpoint.
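The plugin's "random free ports" behaviour can be approximated with the standard bind-to-port-0 trick. This is an illustration only: `pick_free_port` and `emulator_env` are hypothetical helpers, not part of bqemulator's API, and the `host:port` value format for `BIGQUERY_EMULATOR_HOST` mirrors the docker-compose example further down this page.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    Hypothetical helper: the port is released when the socket closes, so
    another process could grab it before the emulator binds it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def emulator_env(rest_port: int, host: str = "localhost") -> dict[str, str]:
    """Hypothetical helper: the environment a test process needs to find the emulator."""
    return {"BIGQUERY_EMULATOR_HOST": f"{host}:{rest_port}"}


rest_port = pick_free_port()  # REST listener
grpc_port = pick_free_port()  # gRPC listener (Storage Read/Write APIs)
print(emulator_env(rest_port))
```

The bind-then-release pattern leaves a small race window; a fixture that lets the server bind port 0 itself and report the chosen port back avoids it.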
Install
```shell
pip install bqemulator
```
Optional extras:
```shell
pip install "bqemulator[testing]"      # pytest, hypothesis, testcontainers, bigquery client
pip install "bqemulator[udf-js]"       # JavaScript UDF support (embedded V8)
pip install "bqemulator[orc]"          # ORC format for load jobs
pip install "bqemulator[compression]"  # zstd + snappy for load/extract jobs
pip install "bqemulator[import]"       # bqemulator import --from-project
pip install "bqemulator[all]"          # all runtime extras (no testing extras)
```
Docker:
```shell
docker run --rm -p 9050:9050 -p 9060:9060 ghcr.io/jjviscomi/bqemulator:latest
```
Both the pip package and the published image bundle the same emulator. The image exposes REST on 9050 and gRPC on 9060 by default; see the configuration reference to change them.
Windows users: install Docker Desktop for Windows with the WSL2 backend (default since Docker Desktop 4.x); the published Linux image runs natively under WSL2 with no Windows-specific configuration. Native Windows-container variants of the image are explicitly out of scope for v1.0 — see docs/reference/out-of-scope.md#native-windows-containers for the rationale.
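Code that should run against both the emulator and real BigQuery often resolves its endpoint from the environment. A minimal sketch, assuming the `host:port` form of `BIGQUERY_EMULATOR_HOST` used elsewhere on this page; `resolve_rest_endpoint` is a hypothetical helper, not a bqemulator API.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_rest_endpoint(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an http:// endpoint for the emulator, or None for real BigQuery.

    Hypothetical helper: reads BIGQUERY_EMULATOR_HOST and adds a scheme when
    the value is a bare host:port such as "localhost:9050".
    """
    env = os.environ if env is None else env
    host = env.get("BIGQUERY_EMULATOR_HOST")
    if not host:
        return None  # let the client library use its default production endpoint
    if "://" not in host:
        host = f"http://{host}"  # the emulator speaks plain HTTP locally
    return host.rstrip("/")
```

A non-`None` result can be passed as `api_endpoint` in the client options; `None` means "leave the client on its default endpoint".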
Quickstart
Python
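```python
from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Either set BIGQUERY_EMULATOR_HOST before creating the client, or pass
# api_endpoint explicitly as below. AnonymousCredentials skips the
# Application Default Credentials lookup, which fails on machines that
# have no Google Cloud credentials configured.
client = bigquery.Client(
    project="my-test-project",
    client_options=ClientOptions(api_endpoint="http://localhost:9050"),
    credentials=AnonymousCredentials(),
)

client.create_dataset("sales")
# bigquery.Table needs a fully qualified "project.dataset.table" ID.
client.create_table(
    bigquery.Table(
        f"{client.project}.sales.orders",
        schema=[
            bigquery.SchemaField("id", "INT64"),
            bigquery.SchemaField("amount", "NUMERIC"),
            bigquery.SchemaField("placed_at", "TIMESTAMP"),
        ],
    )
)
client.insert_rows_json(
    "sales.orders",
    [{"id": 1, "amount": "12.50", "placed_at": "2026-05-21T00:00:00Z"}],
)
for row in client.query("SELECT COUNT(*) AS n FROM sales.orders").result():
    print(row.n)  # 1
```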
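`insert_rows_json` sends rows as JSON, so values such as `NUMERIC` and `TIMESTAMP` are passed as strings, as in the quickstart (`"12.50"` and an RFC 3339 timestamp). A small sketch of converting native Python values first; `to_json_row` is a hypothetical helper, and sending `NUMERIC` as a decimal string is a choice made here to avoid binary floating-point rounding.

```python
import json
from datetime import date, datetime, timezone
from decimal import Decimal
from typing import Any


def to_json_value(value: Any) -> Any:
    """Convert one Python value into a JSON-safe form for a streaming insert."""
    if isinstance(value, Decimal):
        return str(value)  # keep NUMERIC precision exact
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    if isinstance(value, date):
        return value.isoformat()  # DATE as YYYY-MM-DD
    return value


def to_json_row(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: convert every value in a row dict."""
    return {key: to_json_value(val) for key, val in row.items()}


row = to_json_row(
    {"id": 1, "amount": Decimal("12.50"), "placed_at": datetime(2026, 5, 21, tzinfo=timezone.utc)}
)
print(json.dumps(row))
```

The resulting dicts can be passed straight to `client.insert_rows_json("sales.orders", [row])`.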
pytest
bqemulator ships a pytest plugin via the pytest11 entry point. Installing the package is all the wiring you need — your conftest.py stays empty.
```python
from google.cloud import bigquery


def test_orders_table(bqemu_client: bigquery.Client) -> None:
    bqemu_client.create_dataset("sales")
    # ... your test ...
```
The bqemu_server fixture is session-scoped (one emulator per test session); the bqemu_client fixture is function-scoped and returns a pre-configured bigquery.Client. See the pytest fixture guide and the python/pytest-integration example for a complete Flask app with integration tests.
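Because `bqemu_server` lives for the whole session, a dataset created in one test may still exist in the next, so two tests that both call `create_dataset("sales")` could collide. One way around this, sketched under that assumption, is a per-test dataset name; `unique_dataset_id` is a hypothetical helper, and the character rule it enforces (letters, digits, underscores) is BigQuery's dataset naming rule.

```python
import re
import uuid

_INVALID = re.compile(r"[^A-Za-z0-9_]")


def unique_dataset_id(prefix: str) -> str:
    """Return a dataset ID unique to this call, e.g. 'sales_3f2a9c0d1b7e'.

    Hypothetical helper: characters BigQuery does not allow in dataset IDs
    (anything other than letters, digits and underscores) become underscores.
    """
    safe_prefix = _INVALID.sub("_", prefix)
    return f"{safe_prefix}_{uuid.uuid4().hex[:12]}"


print(unique_dataset_id("test-orders"))
```

In a test this becomes `bqemu_client.create_dataset(unique_dataset_id("sales"))`, optionally paired with `client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)` in a fixture teardown.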
Node.js
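```javascript
const { BigQuery } = require('@google-cloud/bigquery');

async function main() {
  const bq = new BigQuery({
    projectId: 'my-test-project',
    apiEndpoint: 'http://localhost:9050',
    token: 'dummy', // emulator accepts any token
  });
  await bq.createDataset('sales');
}

// CommonJS modules have no top-level await, so run the async code via main().
main().catch((err) => { console.error(err); process.exitCode = 1; });
```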
See the Node.js quickstart and the nodejs/nestjs-app example.
Go
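```go
// Assumes ctx plus imports of log, bigquery and option are in scope.
client, err := bigquery.NewClient(
	ctx, "my-test-project",
	option.WithEndpoint("http://localhost:9050"),
	option.WithoutAuthentication(),
)
if err != nil {
	log.Fatalf("bigquery.NewClient: %v", err)
}
defer client.Close()
```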
See the Go quickstart and the go/beam-pipeline example.
Java
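```java
import com.google.cloud.NoCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;

BigQuery bq = BigQueryOptions.newBuilder()
    .setProjectId("my-test-project")
    .setHost("http://localhost:9050")
    .setCredentials(NoCredentials.getInstance())
    .build()
    .getService();
```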
See the Java quickstart and the java/spring-boot example.
bq CLI
```shell
bq --api=http://localhost:9050 \
   --project_id=my-test-project \
   query --use_legacy_sql=false 'SELECT 1 AS n'
```
See the bq CLI guide and the bq-cli-quickstart example.
docker-compose
```yaml
services:
  bqemulator:
    image: ghcr.io/jjviscomi/bqemulator:latest
    ports: ["9050:9050", "9060:9060"]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9050/healthz"]
      interval: 2s
      retries: 30
  app:
    build: .
    environment:
      BIGQUERY_EMULATOR_HOST: bqemulator:9050
    depends_on:
      bqemulator: { condition: service_healthy }
```
See the docker-compose/full-stack example for app + emulator + Prometheus + Grafana.
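Outside docker-compose (for example in a CI step that starts the container with `docker run`), the same `/healthz` probe the healthcheck uses can be polled from Python before the tests start. A stdlib-only sketch; `wait_for_emulator` is a hypothetical helper, and treating any non-error response from `/healthz` as "ready" is an assumption based on the `curl -sf` healthcheck above.

```python
import http.client
import time
import urllib.request


def wait_for_emulator(base_url: str = "http://localhost:9050",
                      timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll <base_url>/healthz until it succeeds or `timeout` seconds pass.

    Hypothetical helper. urlopen raises HTTPError (an OSError subclass) for
    status codes >= 400, much like `curl -f`, and OSError for refused or
    reset connections, so any exception here means "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/healthz", timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            pass  # emulator not reachable or not healthy yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A CI step can call `wait_for_emulator()` and fail fast when it returns `False`, instead of letting the first test time out.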
What works today
bqemulator is at v1.1.1, a patch release in the first minor line (1.1) of the production-stable 1.x series. SemVer applies: breaking changes ship only in MAJOR versions, and deprecations live for ≥2 MINOR versions or 6 months. The compatibility matrix is auto-generated from the conformance corpus on every CI run; the conformance coverage matrix breaks down support by surface item.
| Surface | Status |
|---|---|
| BigQuery REST: Datasets / Tables / Jobs / TableData / Routines / Row Access Policies / Authorized Views | ✅ |
| Multipart + resumable upload (`/upload/bigquery/v2/...`) | ✅ |
| `INFORMATION_SCHEMA` (`TABLES`, `COLUMNS`, `ROUTINES`, `VIEWS`, `JOBS`, `JOBS_BY_*`, `MATERIALIZED_VIEWS`, `PARTITIONS`, `TABLE_OPTIONS`, …) | ✅ |
| Storage Read API (Arrow + Avro) | ✅ |
| Storage Write API (all 4 stream types, proto + Arrow row formats) | ✅ |
| GoogleSQL function surface (date / time / string / array / struct / JSON / regex / aggregate / approx / civil-time / bit) | ✅ |
| Procedural scripting (`DECLARE`, `BEGIN…END`, `IF`, `LOOP`, `EXCEPTION`, `BEGIN TRANSACTION`) | ✅ |
| SQL / JavaScript / Table-valued UDFs | ✅ |
| Time travel (`FOR SYSTEM_TIME AS OF`), snapshots, clones, materialized views | ✅ |
| Authorized views + row access policies + caller identity | ✅ |
| GEOGRAPHY / RANGE / INTERVAL / NUMERIC / BIGNUMERIC types | ✅ |
| Load formats: CSV / JSON / Avro / ORC / Parquet | ✅ |
| Extract formats: CSV / JSON / Avro / Parquet | ✅ |
| BigQuery ML (`CREATE MODEL`, `ML.PREDICT`, …) | ❌ Out of scope; see docs/reference/out-of-scope.md |
| BI Engine / slot reservations / Data Transfer Service / scheduled queries | ❌ Out of scope |
Conformance corpus depth (snapshot at v1.0 prep, 2026-05-21):
| Status | Surface items | % of deterministic surface |
|---|---|---|
| 🟢🟢 Deep (≥6 fixtures) | 96 | 23.8% |
| 🟢 Covered (3–5 fixtures) | 64 | 15.9% |
| 🟡 Sampled (1–2 fixtures) | 236 | 58.6% |
| 🔴 Uncovered (0 fixtures) | 7 | 1.7% |
| Total | 403 | 100% |
Plus 9 non-deterministic items (RAND, CURRENT_*, GENERATE_UUID, TABLESAMPLE, FOR SYSTEM_TIME AS OF <expression>) that are excluded from the conformance corpus by ADR 0022 and exercised in unit / property / integration tiers instead — bringing the full inventory to 412 surface items across 20 categories, backed by 1,141 recorded fixtures under tests/conformance/sql_corpus/.
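The depth buckets in the table above are a pure function of each surface item's fixture count. A sketch that reproduces the bucketing and the percentage column; the thresholds come from the table, while `coverage_tier` and `tier_summary` are hypothetical names, not code from the repository.

```python
from collections import Counter
from collections.abc import Iterable


def coverage_tier(fixtures: int) -> str:
    """Map a surface item's fixture count to the table's depth bucket."""
    if fixtures >= 6:
        return "Deep"
    if fixtures >= 3:
        return "Covered"
    if fixtures >= 1:
        return "Sampled"
    return "Uncovered"


def tier_summary(fixture_counts: Iterable[int]) -> dict[str, tuple[int, float]]:
    """Return {tier: (item count, % of items rounded to one decimal)}."""
    tiers = Counter(coverage_tier(n) for n in fixture_counts)
    total = sum(tiers.values())
    if total == 0:
        return {}
    return {tier: (count, round(100 * count / total, 1)) for tier, count in tiers.items()}


# Counts chosen only to reproduce the table's totals: 96 + 64 + 236 + 7 = 403 items.
counts = [6] * 96 + [3] * 64 + [1] * 236 + [0] * 7
print(tier_summary(counts))
```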
We follow a no-deferral principle: features either ship complete or are excluded with documented rationale. There is no "TODO for v1.1." Scope boundaries are catalogued in docs/reference/out-of-scope.md.
Documentation
The full documentation lives at jjviscomi.github.io/bqemulator. Key entry points:
- Getting started: your first ten minutes.
- Per-language quickstarts: Python · Node.js · Go · Java · pytest · docker-compose · Testcontainers.
- Guides: loading data, querying, streaming inserts, Storage API, UDFs, scripting, partitioning, time travel, materialized views, row access policies, dbt, Airflow, Spark, the `bq` CLI, observability, and more.
- Reference: configuration, CLI, REST coverage, SQL function mapping, compatibility matrix, conformance coverage matrix, out-of-scope catalogue, troubleshooting.
- Architecture: hexagonal architecture, storage model, SQL translation, jobs lifecycle, Storage Read/Write API design, scripting, UDFs, versioning, row access, specialized types, observability, testing strategy, conformance tier.
- ADRs: 34 Architecture Decision Records documenting every non-obvious design choice.
Examples
Every example under docs/examples/ is a complete, runnable project with its own `make test` target, validated in CI:

| Toolchain | Example | What it demonstrates |
|---|---|---|
| Python | `python/pytest-integration` | Flask app + auto-discovered `bqemu_client` fixture |
| Python | `python/dbt-local` | dbt build cycle via endpoint override |
| Python | `python/airflow-dag-test` | `BigQueryInsertJobOperator` DAG via offline `dag.test()` |
| Python | `python/pyspark-bigquery` | Storage Read → Arrow → Spark DataFrame |
| Node.js | `nodejs/nestjs-app` | NestJS + Jest + supertest e2e |
| Node.js | `nodejs/cloud-run-local` | Cloud Run-shaped Express + docker-compose |
| Go | `go/beam-pipeline` | Apache Beam Go SDK + Testcontainers |
| Go | `go/dataflow-local` | Stand-alone Go ETL binary |
| Java | `java/spring-boot` | Spring Boot + Testcontainers |
| Scala | `java/scio` | Spotify Scio (Scala-on-Beam) pipeline |
| Compose | `docker-compose/full-stack` | App + emulator + Prometheus + Grafana |
| CI | `ci-recipes/github-actions` | Service-container + Testcontainers patterns |
| CI | `ci-recipes/gitlab-ci` | `services:` alias on the CI network |
| CI | `ci-recipes/circleci` | Docker-secondary + machine executor |
Project status
bqemulator is at v1.1.1, a patch release in the first minor line (1.1) of the production-stable 1.x series. SemVer applies: breaking changes ship only in MAJOR versions, preceded by ≥1 MINOR with deprecation warnings; deprecated APIs remain for ≥2 MINOR versions or 6 months.
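Under this policy, code written against one 1.x release should keep working on any later 1.x release, and only a new MAJOR may break it. A sketch of that rule as a plain version comparison; `is_upgrade_safe` is a hypothetical helper that only handles plain `MAJOR.MINOR.PATCH` strings (no pre-release or build suffixes).

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers; raises ValueError otherwise."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_upgrade_safe(current: str, candidate: str) -> bool:
    """True if moving from current to candidate stays within one MAJOR line.

    Under SemVer only a MAJOR bump may contain breaking changes, so an
    upgrade is treated as safe when MAJOR matches and the version does not
    go backwards (tuples compare element by element).
    """
    cur, cand = parse_version(current), parse_version(candidate)
    return cand[0] == cur[0] and cand >= cur
```

In a requirements file the same rule is usually written as a range such as `bqemulator>=1.1.1,<2`.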
Maturity signals:
- ✅ 34 Architecture Decision Records covering every non-obvious design choice (`docs/adr/0001–0034`).
- ✅ ≥90% line + branch coverage gated by CI (`make verify`).
- ✅ 7 test tiers passing (unit + property + integration + conformance + e2e + perf + chaos).
- ✅ 5-client e2e matrix (Python · Node.js · Go · Java · `bq` CLI).
- ✅ Mutation-tier (`mutmut`) pilot landed on pure-domain modules.
- ✅ Fuzz-tier (Atheris) harnesses on the SQL translator, dynamic-protobuf decoder, and Arrow bridge.
- ✅ Differential-tier row-order perturbation of the entire conformance corpus passes.
- ✅ Performance baselines committed for `darwin-arm64`, with regression gates (`pytest-benchmark --benchmark-compare-fail=median:10%`).
- ✅ PyPI publish via Trusted Publishing (sigstore-attested wheels): `pip install bqemulator==1.1.1` resolves from PyPI.
- ✅ GHCR publish with keyless cosign signatures: `docker pull ghcr.io/jjviscomi/bqemulator:1.1.1` resolves and the image is cosign-verifiable.
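Row-order perturbation matters because BigQuery does not guarantee row order for a query without `ORDER BY`, so a conformance check for such a query has to compare results as a multiset. A sketch of that comparison and of a shuffle-based robustness check; `rows_match` and `survives_shuffle` are hypothetical names, not the project's actual harness, and rows are assumed to be tuples of hashable values.

```python
import random
from collections import Counter
from collections.abc import Sequence


def rows_match(expected: Sequence[tuple], actual: Sequence[tuple], ordered: bool) -> bool:
    """Compare result sets; ignore row order unless the query has ORDER BY."""
    if ordered:
        return list(expected) == list(actual)
    return Counter(expected) == Counter(actual)  # multiset: duplicate rows still count


def survives_shuffle(expected: Sequence[tuple], actual: Sequence[tuple],
                     trials: int = 5, seed: int = 0) -> bool:
    """For an unordered query, the match must hold for any order of actual rows."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        shuffled = list(actual)
        rng.shuffle(shuffled)
        if not rows_match(expected, shuffled, ordered=False):
            return False
    return True
```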
See CHANGELOG.md for the complete v1.0 inventory.
Contributing
We welcome contributions of all sizes. Start with CONTRIBUTING.md for the mechanics; AGENTS.md captures the project's day-to-day conventions; and docs/architecture/overview.md is the canonical architectural reference.
Pull requests are squash-merged into main with a Conventional Commits subject; commits carry a DCO sign-off (git commit -s). The full review policy lives in GOVERNANCE.md.
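A local check for the two mechanical requirements above (a Conventional Commits subject and the DCO `Signed-off-by` trailer that `git commit -s` adds) can be sketched with two regular expressions. The accepted type list below is the common Conventional Commits set and is an assumption; CONTRIBUTING.md and GOVERNANCE.md are authoritative, and `check_commit_message` is a hypothetical helper.

```python
import re

# Assumed type list; check CONTRIBUTING.md for the project's actual set.
SUBJECT = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9_./-]+\))?!?: \S.*$"
)
# DCO trailer as written by `git commit -s`: "Signed-off-by: Name <email>".
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT.match(subject):
        problems.append(f"subject is not Conventional Commits: {subject!r}")
    if not SIGNOFF.search(message):
        problems.append("missing DCO 'Signed-off-by:' trailer (use git commit -s)")
    return problems
```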
Community
- 💬 GitHub Discussions — design questions, usage questions, and general help.
- 🐛 Issues — bug reports and feature requests. Please search existing issues first.
- 🔒 Security advisories — report vulnerabilities privately via the GitHub Security Advisory flow (see SECURITY.md for our disclosure policy).
- 📜 Code of Conduct — adapted from the Contributor Covenant 2.1.
License
bqemulator is released under the Apache License 2.0.
Acknowledgements
- `goccy/bigquery-emulator` for blazing the trail and providing years of issue reports that seeded our regression corpus.
- DuckDB, SQLGlot, FastAPI, Pydantic, Hatchling, and the Google Cloud client library teams, whose work makes this project tractable.
- The Apache Beam, dbt, Airflow, PySpark, Spotify Scio, NestJS, and Spring Boot communities whose work the example projects compose with.