Gibran: governed analytics + metric layer over DuckDB. YAML-defined sources/metrics/policies; queries flow through an identity-aware governance layer that rewrites and audits every attempt.
Project description
Gibran
Governed analytics over DuckDB. YAML-defined sources, metrics, dimensions, roles, and policies; every query — raw SQL or a typed JSON DSL — flows through an identity-aware governance layer that rewrites and audits it.
Same product category as Cube, dbt MetricFlow, Malloy, LookML. NOT a new database engine — storage and execution belong to DuckDB. The wedge is governed analytics: declarative semantics + identity-aware enforcement
- structured audit, in a single pip-installable artifact with no server.
Install
pip install gibran
Requires Python 3.11+.
Quickstart
gibran init --sample
gibran sync
gibran check
gibran query --role analyst_west --attr region=west \
--dsl '{"source":"orders","metrics":["order_count","gross_revenue"]}'
The analyst_west role's policy auto-injects WHERE region = 'west';
the query returns only west-region rows. A different role sees a
different view of the same underlying data, with a different audit row
recorded for each attempt.
What you can do with it
| Capability | How |
|---|---|
| Declare metrics declaratively | metrics: block in gibran.yaml; 10 primitives (count / sum / avg / min / max / ratio / expression / percentile / rolling_window / period_over_period). |
| Compose metrics | ratio references two metrics; expression templates with {metric_id}; cycle detection at sync time via a dependency DAG. |
| Govern access by role | YAML-declared policies with row-filter ASTs (operator whitelist) and column allow/deny. |
| Hide PII | sensitivity: pii / restricted on columns; default-deny per-policy; explicit grants required. |
| Bound access in time | valid_until on policies for contractors / consultants / temporary credentials. Evaluation denies past the timestamp without re-sync. |
| Audit every query | gibran_query_log table; for each allow/deny/error, the rewritten SQL + structured deny reason are recorded. Literals adjacent to sensitive columns are redacted before persistence. |
| Detect data-quality issues | quality_rules (not_null / unique / range / custom_sql) + freshness_rules. A failing severity: block rule denies subsequent queries until the data is healthy. |
| Read from anywhere DuckDB can | Parquet, CSV, DuckDB table, or SQL view — a source-type dispatcher resolves the FROM clause. No manual CREATE VIEW needed for file-backed sources. |
| Introspect what's available | gibran describe <source>, gibran catalog, gibran explain --dsl '...' (parse + validate + compile without executing). |
| Export results | `gibran query --output csv |
What's proven (334 tests)
Every test runs in-process against an in-memory DuckDB; the whole suite completes in under a minute.
| Test file | Count | What it covers |
|---|---|---|
test_dsl.py |
54 | DSL Pydantic validation, semantic validation against AllowedSchema, compiler SQL emission shape for every primitive, end-to-end execution. |
test_sync.py |
34 | YAML loader/applier round-trip, idempotency, cross-entity validation, dependency-DAG cycle rejection, valid_until round-trip + resync stability. |
test_redaction.py |
31 | Pure-function SQL + JSON redactors (eq / in / between / like / nested and-or-not / public columns unaffected / unparseable input fail-open) and end-to-end audit-row inspection. |
test_governance.py |
29 | preview_schema + evaluate across every DenyReason, role-attribute substitution, observability-aware denial ordering, time-bound expiry. |
test_execution_sql.py |
25 | Parse → govern → rewrite → execute pipeline; unsupported-feature rejection (joins, subqueries, CTEs, SELECT *). |
test_ast_validation.py |
20 | Filter-AST operator whitelist; rejects like, regex, function calls, attribute refs in DSL context. |
test_ast_compile.py |
20 | Policy + intent AST → SQL emission; identity-attribute substitution; literal rendering. |
test_observability_runner.py |
20 | Quality + freshness rule evaluation; severity routing; staleness windows. |
test_jwt_resolver.py |
18 | RS256/HS256, expiry, audience, issuer, tampered-signature rejection. |
test_period_over_period.py |
17 | All three comparisons (delta/ratio/pct_change); validates against hand-computed expected output. |
test_cli_introspection.py |
16 | describe, catalog, explain output shapes. |
test_observability.py |
15 | Source-health cache reads; record_run semantics. |
test_migrations.py |
10 | All 7 migrations apply clean + idempotent; pinned schema invariants. |
test_source_dispatch.py |
9 | Parquet / CSV / DuckDB table / SQL view dispatcher; FROM-clause shape per type. |
test_ast_intent.py |
8 | Intent-AST trust boundary — rejects {"$attr":...} substitution in DSL context. |
test_init_sample.py |
5 | gibran init --sample round-trip with a synthetic project. |
test_imports.py |
3 | All modules importable; no circular imports. |
YAML syntax
Sources
sources:
- id: orders
display_name: Orders
type: parquet # parquet | csv | duckdb_table | sql_view
uri: data/orders.parquet
primary_grain: order_id
columns:
- name: order_id
type: VARCHAR
sensitivity: public # public | internal | pii | restricted | unclassified
- name: amount
type: DECIMAL(18,2)
sensitivity: public
- name: customer_email
type: VARCHAR
sensitivity: pii # opt-in: governance must explicitly grant access
dimensions:
- id: orders.region
column: region
display_name: Region
type: categorical # categorical | temporal | numeric_bin
- id: orders.order_date
column: order_date
display_name: Order Date
type: temporal
Metrics
metrics:
- id: order_count
source: orders
display_name: Order Count
type: count
- id: gross_revenue
source: orders
display_name: Gross Revenue
type: sum
expression: amount
filter: "status = 'paid'"
unit: USD
- id: avg_order_value
source: orders
display_name: Average Order Value
type: ratio
numerator: gross_revenue # composes existing metrics
denominator: order_count
- id: p95_amount
source: orders
display_name: P95 Order Amount
type: percentile
column: amount
p: 0.95
- id: revenue_7d_rolling
source: orders
display_name: 7-Day Rolling Revenue
type: rolling_window
aggregate: sum
column: amount
window: "7 days" # DuckDB INTERVAL
order_by_column: order_date
filter: "status = 'paid'"
- id: revenue_mom
source: orders
display_name: Revenue MoM
type: period_over_period # composes a base metric with LAG()
base_metric: gross_revenue
period_dim: orders.order_date
period_unit: month # year | quarter | month | week | day
comparison: delta # delta | ratio | pct_change
Roles + policies
roles:
- id: analyst_west
display_name: West Region Analyst
attributes:
region: west # surfaces as {"$attr":"region"} in row filters
policies:
- id: analyst_west_orders
role: analyst_west
source: orders
default_column_mode: allow # or deny; column_overrides flip per-column
valid_until: "2027-01-01T00:00:00" # optional; NULL = never expires
row_filter: # AST — operator whitelist enforced
op: eq
column: region
value: { $attr: region } # resolved from identity at query time
- id: external_partner_orders
role: external_partner
source: orders
default_column_mode: deny # nothing visible by default
column_overrides:
order_id: allow
amount: allow
order_date: allow # customer_email stays denied
row_filter:
op: eq
column: region
value: west # plain literal (no $attr ref)
Quality + freshness
quality_rules:
- id: orders_amount_not_null
source: orders
type: not_null
config: { column: amount }
severity: block # block | warn
- id: orders_amount_range
source: orders
type: range
config: { column: amount, min: 0, max: 1000000 }
severity: warn
- id: orders_status_in_allowlist
source: orders
type: custom_sql
config:
sql: "SELECT COUNT(*) FROM orders WHERE status NOT IN ('paid','pending','refunded')"
severity: warn
freshness_rules:
- id: orders_freshness_24h
source: orders
watermark_column: order_date
max_age_seconds: 86400
severity: block
DSL syntax — queries
{
"source": "orders",
"metrics": ["order_count", "gross_revenue"],
"dimensions": [{"id": "orders.order_date", "grain": "month"}],
"filters": [
{"op": "gte", "column": "amount", "value": 10},
{"op": "in", "column": "region", "value": ["west", "central"]}
],
"having": [{"op": "gt", "metric": "gross_revenue", "value": 100}],
"order_by": [{"key": "orders.order_date", "direction": "asc"}],
"limit": 100
}
The DSL is the user surface. There is no LLM in the emission path — by design. Classical NLP / pattern templates / embedding retrieval are in-scope for a future NL layer; constrained-LLM emission is permanently out of scope because it can hallucinate references not in the schema.
Filter AST operators
and / or / not / eq / neq / lt / lte / gt / gte / in / not_in / is_null / is_not_null / between
Notably absent: like, regex, function calls. A future "approved
functions" registry can extend, but the whitelist closes a class of
SQL-injection-via-policy-author bugs by construction.
Worked example: same query, two roles, two outcomes
# analyst_west: gets the filtered view
$ gibran query --role analyst_west --attr region=west \
--dsl '{"source":"orders","metrics":["order_count"]}'
# rewritten SQL: SELECT COUNT(*) FROM "orders" WHERE ("region" = 'west')
# row count: 1
# external_partner: column access denied
$ gibran query --role external_partner \
--sql "SELECT customer_email FROM orders"
# status=denied, deny_reason=policy:no_column_access:customer_email
# (exit code 2)
Both attempts write an audit row. The denied query's generated_sql
column has the literal customer_email value in the SQL replaced with
<redacted> — the audit log itself cannot become a side channel for
the data it protects.
Worked example: time-bound contractor access
policies:
- id: contractor_analytics
role: contractor_analyst
source: orders
default_column_mode: allow
valid_until: "2026-12-31T23:59:59"
# Before 2027-01-01: allowed
$ gibran query --role contractor_analyst --sql "SELECT COUNT(*) FROM orders"
# 12340
# After: denied automatically; no re-sync required
$ gibran query --role contractor_analyst --sql "SELECT COUNT(*) FROM orders"
# status=denied, deny_reason=policy:expired:valid_until=2026-12-31T23:59:59
The expiry comparison happens inside DuckDB (CURRENT_TIMESTAMP) at
each query, not in Python — eliminating a class of UTC-drift bugs.
Worked example: composed metrics with audit trail
metrics:
- id: order_count
type: count
- id: gross_revenue
type: sum
expression: amount
filter: "status = 'paid'"
- id: avg_order_value
type: ratio
numerator: gross_revenue
denominator: order_count
$ gibran query --role analyst_west --attr region=west \
--dsl '{
"source": "orders",
"metrics": ["avg_order_value"],
"dimensions": [{"id": "orders.order_date", "grain": "month"}]
}'
The compiler resolves avg_order_value to
(SUM(amount) FILTER (WHERE status='paid')) / NULLIF(COUNT(*), 0),
injects WHERE region='west', groups by month, and runs. The
gibran_query_log row records metric_versions=[("avg_order_value", 1), ("gross_revenue", 1), ("order_count", 1)] for reproducibility — you
know exactly which metric definitions answered this question.
CLI reference
| Command | What it does |
|---|---|
gibran init [--sample] |
Apply migrations; with --sample, drop a starter gibran.yaml + seed data. |
gibran sync |
Validate gibran.yaml and write to the catalog + governance tables. |
gibran check [--source <id>] |
Run quality + freshness rules; refresh source-health cache. |
gibran query --role <r> [--attr k=v]... "<sql>" | --dsl '{...}' |
Execute a governed query. --output tsv|csv|json|parquet [file]. Structured exit codes: 0=ok, 1=failed-rule, 2=denied, 3=error. |
gibran explain --role <r> --dsl '{...}' |
Compile without executing; print SQL + applied governance. |
gibran describe <source> --role <r> |
Show AllowedSchema (columns / dimensions / metrics / row filter) for an identity. |
gibran catalog --role <r> |
List sources the identity can see, with column/dim/metric counts. |
gibran register |
Generate a sample JWT for local dev. |
Project layout
src/gibran/
catalog/ # docstrings — schema is in migrations/
governance/ # identity, policies, ast, evaluate, redaction
observability/ # quality/freshness types + runner
dsl/ # QueryIntent, validate, compile, run
execution/ # parse → govern → rewrite → execute
sync/ # YAML schema, loader, applier, migration runner
cli/ # typer entrypoint
_sql.py # qident, render_literal
_source_dispatch.py # source_type -> FROM-clause snippet
migrations/ # 0001 catalog -> 0007 time_bound_policies
tests/ # 334 tests across 17 files
prompts/
architect_layer.md # refined architect prompt with fixed constraints
HANDOFF.md # forward-looking priority list
STATUS.md # current per-layer state
Run the suite
python -m pytest tests
Intentionally out of scope (V1)
- LLM in any emission path. Any approach where the system can
invent a metric or column name not in
AllowedSchemais out. Pattern templates and local-embedding retrieval are in-scope for a future NL layer; constrained-LLM emission is not. - Cross-source metrics. Composition is single-source in V1; the dependency DAG is structured so V2 can relax this without a migration.
- Multi-process / server mode. DuckDB is single-writer per file.
- Cohort + funnel + multi-stage CTE primitives. Tier 3 work; needs
the compiler to emit
WITH a AS (...), b AS (...) SELECT ...shape.
See HANDOFF.md for the full prioritized roadmap.
License
MIT — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gibran-0.0.1.tar.gz.
File metadata
- Download URL: gibran-0.0.1.tar.gz
- Upload date:
- Size: 108.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c23fbc0a6adf23cc4c976dcfc2b9907c845997813223ba01d75d952ca3f8837e
|
|
| MD5 |
f85b5578470d23d376aa1c0785f59f70
|
|
| BLAKE2b-256 |
32fd09e067869a64809385137adcf5a38d8dd3ed26eeb3cf291dca2fe2c27c29
|
File details
Details for the file gibran-0.0.1-py3-none-any.whl.
File metadata
- Download URL: gibran-0.0.1-py3-none-any.whl
- Upload date:
- Size: 66.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
882e86fc2f32c6cb29bcdde6af34410a905a79870a4c5e9c6a5aa2b1f61a0321
|
|
| MD5 |
4b54fb2dafb23630cfc30e827e35628a
|
|
| BLAKE2b-256 |
cc3092a8ce302ce09fe4fffb9284caf6cb71c80db3820622f64754ede04369cf
|