Skip to main content

DataJunction client library for connecting to a DataJunction server

Project description

DataJunction Python Client

Installation

To install:

pip install datajunction

Examples

To initialize the client:

from datajunction import DJClient

dj = DJClient("http://dj-endpoint:8000")

Catalogs and Engines

To list available catalogs for the DJ host:

dj.catalogs()

To list available engines for the DJ host:

dj.engines()

To create a catalog:

from datajunction import Catalog

catalog = Catalog(
    name="prod"
)
catalog.publish()

To create an engine:

from datajunction import Engine

engine = Engine(
    name="spark",
    version="3.2.2",
    uri="..."
)
engine.publish()

To attach an engine to a catalog:

catalog.add_engine(engine)

Nodes

All nodes for a given namespace can be found with:

dj.namespace("default").nodes()

Specific node types can be retrieved with:

dj.namespace("default").sources()
dj.namespace("default").dimensions()
dj.namespace("default").metrics()
dj.namespace("default").transforms()
dj.namespace("default").cubes()

To create a source node:

from datajunction import NodeMode

repair_orders = dj.new_source(
    name="repair_orders",
    display_name="Repair Orders",
    description="Repair orders",
    catalog="dj",
    schema_="roads",
    table="repair_orders",
)
repair_orders.save(mode=NodeMode.PUBLISHED)

Nodes can also be created as drafts with:

repair_orders.save(mode=NodeMode.DRAFT)

To create a dimension node:

repair_order = dj.new_dimension(
    name="repair_order",
    query="""
    SELECT
      repair_order_id,
      municipality_id,
      hard_hat_id,
      dispatcher_id
    FROM repair_orders
    """,
    description="Repair order dimension",
    primary_key=["repair_order_id"],
)
repair_order.save()

To create a transform node:

large_revenue_payments_only = dj.new_transform(
    name="large_revenue_payments_only",
    query="""
    SELECT
      payment_id,
      payment_amount,
      customer_id,
      account_type
    FROM revenue
    WHERE payment_amount > 1000000
    """,
    description="Only large revenue payments",
)
large_revenue_payments_only.save()

To create a metric:

num_repair_orders = dj.new_metric(
    name="num_repair_orders",
    query="""
    SELECT
      count(repair_order_id) AS num_repair_orders
    FROM repair_orders
    """,
    description="Number of repair orders",
)
num_repair_orders.save()

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datajunction-0.0.1a6.tar.gz (7.4 kB view hashes)

Uploaded Source

Built Distribution

datajunction-0.0.1a6-py3-none-any.whl (7.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page