Data Access Platform client library

Reason this release was yanked:

Version is deprecated.

Data Access Platform Client Library

Data Access Platform (DAP) acts as a single source of data for analytics at Instructure. It provides efficient access to data collected across various educational products in bulk with high fidelity and low latency, adhering to a canonical data model.

The outgoing interface for DAP is the Query API, which is an HTTP REST service. Users initiate asynchronous queries to retrieve data associated with their account. This client library is a Python wrapper around the DAP API.

Each DAP user acts as a data administrator for the organization they represent. They have full read access to the top-level account and all descendant sub-accounts. For example, in Canvas, the top of the organization hierarchy is uniquely identified by a root account ID, and each data record is associated with a root account ID. A DAP user with Canvas access can query data that are assigned the user's root account ID.

DAP API requires authentication. The client library takes care of authentication behind the scenes provided you have the appropriate API key, and passes the token to each API operation it invokes. Refer to the documentation of Instructure API Gateway Service to learn more about the authentication process.

Under the hood, API users must first obtain a JSON Web Token (JWT) from the authentication endpoint of Instructure API Gateway Service, and then pass the JWT to every subsequent call to DAP API.

Major features

  • List the names of tables available for querying
  • Download the JSON schema of a selected table
  • Fetch a full table snapshot
  • Fetch incremental updates since a specific point in time
  • Save data in several output formats: CSV, TSV, JSON, Parquet
  • Download output to a local directory

Getting started

Accessing DAP API requires the URL of an endpoint and an API key. Once obtained, they can be set as environment variables (recommended) or passed as command-line arguments.

Use environment variables for authentication

First, configure the environment with what you have in your setup instructions:

export DAP_API_URL=https://api-gateway.instructure.com
export DAP_API_KEY=aCBd3V...U1aaaa

With environment variables set, you can issue dap commands directly:

dap incremental --namespace canvas --table accounts --since 2022-07-13T09:30:00+02:00
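The --since value above is an ISO 8601 timestamp with an explicit UTC offset. If you generate such values from a script, the following minimal sketch shows one way to do it with the Python standard library. The helper name to_since_argument is hypothetical and is not part of the client library.

```python
from datetime import datetime, timedelta, timezone


def to_since_argument(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 with a UTC offset.

    Hypothetical helper for illustration; not part of the DAP client library.
    """
    # Reject naive datetimes, which would be rendered without an offset.
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.isoformat(timespec="seconds")


# Reproduce the timestamp used in the command above (UTC+02:00).
utc_plus_two = timezone(timedelta(hours=2))
since = to_since_argument(datetime(2022, 7, 13, 9, 30, tzinfo=utc_plus_two))
print(since)
```

Rejecting naive datetimes avoids passing a timestamp whose time zone is ambiguous.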

Use command-line for authentication

Unless you set environment variables, you need to pass the endpoint URL and API key to the dap command explicitly:

dap --base-url https://api-gateway.instructure.com --api-key aCBd3V...U1aaaa incremental --namespace canvas --table accounts --since 2022-07-13T09:30:00+02:00
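When dap is driven from a script, the two authentication styles can be handled in one place. The sketch below assembles the argument list using only the flags and environment variable names shown above. The function build_dap_command is a hypothetical helper, and the API key is a placeholder.

```python
import os
from typing import List, Mapping, Optional


def build_dap_command(
    subcommand_args: List[str],
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build an argument list for the dap command (hypothetical helper).

    Credentials passed explicitly become --base-url / --api-key flags.
    Otherwise the matching environment variable must already be set.
    """
    env = os.environ if environ is None else environ
    command = ["dap"]
    if base_url is not None:
        command += ["--base-url", base_url]
    elif "DAP_API_URL" not in env:
        raise ValueError("set DAP_API_URL or pass base_url")
    if api_key is not None:
        command += ["--api-key", api_key]
    elif "DAP_API_KEY" not in env:
        raise ValueError("set DAP_API_KEY or pass api_key")
    return command + list(subcommand_args)


query = ["incremental", "--namespace", "canvas", "--table", "accounts",
         "--since", "2022-07-13T09:30:00+02:00"]

# Explicit credentials: the flags are placed before the subcommand.
explicit = build_dap_command(
    query,
    base_url="https://api-gateway.instructure.com",
    api_key="<your-api-key>",  # placeholder, not a real key
    environ={},
)
print(explicit)
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where the dap command is installed.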

Command-line usage

Invoking the command-line utility with --help shows usage, required and optional arguments:

dap --help
dap incremental --help
dap snapshot --help
dap list --help
dap schema --help

Common use cases

Chain a snapshot query with an incremental query

When you start using DAP, you will typically begin by downloading a snapshot of each table you need. The snapshot query response body contains a field called at, the point in time whose data lake state the snapshot captures. Copy this timestamp into the since field of an incremental query request. Chaining the two queries this way ensures that you do not miss any data.

Note that if a table has not received updates for a while (e.g. user profiles have not changed over the weekend), the value of at might be well behind the current time.
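The hand-off from a snapshot to an incremental query can be sketched as follows. Only the field names at and since come from the description above. The response dictionary, its timestamp value, and the helper name are assumptions made for illustration, and a real response carries additional fields.

```python
from typing import Any, Dict


def incremental_request_after_snapshot(snapshot_response: Dict[str, Any]) -> Dict[str, Any]:
    """Derive the 'since' of the first incremental query from a snapshot response.

    Hypothetical helper; the dictionaries only model the fields named in the text.
    """
    at = snapshot_response.get("at")
    if not at:
        raise KeyError("snapshot response has no 'at' field")
    # Use 'at' verbatim, not the current time: 'at' may lag behind the clock
    # when the table has not received updates for a while.
    return {"since": at}


# Made-up response body used only to exercise the helper.
snapshot_response = {"at": "2022-07-13T07:30:00Z"}
incremental_request = incremental_request_after_snapshot(snapshot_response)
print(incremental_request)
```

Using the current time instead of at would risk skipping changes recorded between at and the moment the script runs.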

Chain an incremental query with another

To fetch the most recent changes since a previous incremental query, chain the next request to the previous response using since and until: the until of the previous response becomes the since of the next request. The until of the next request should typically be omitted, because DAP API populates it automatically with the latest point in time for which data is available. If a table has not received updates for a while, the timestamps in the response may lag behind the current time.

For example, suppose you submit an incremental query job #82, and receive a response whose until is 2021-07-28T19:00. You can then pass 2021-07-28T19:00 as the value for since in your next incremental query job #83. Job #83 would then return 2021-07-28T19:00 as the value of since (the exact value you submitted), and might return 2021-07-28T21:00 as until (the latest point in time for which data is available).

If you choose to fill in until in a request (which is not necessary in most cases), its value must be in the time range DAP has data for. Otherwise, your request is rejected.
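The chaining rule can be sketched as a small loop. The submit function below is a stand-in that imitates the behaviour described above (it echoes since and fills in until) and does not call DAP API. All names and the starting timestamp 2021-07-28T17:00 are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Request = Dict[str, str]
Response = Dict[str, str]


def chain_incremental(
    submit: Callable[[Request], Response], first_since: str, rounds: int
) -> List[Response]:
    """Run incremental queries back to back (illustrative sketch).

    Each request omits 'until'; the 'until' of a response becomes the
    'since' of the next request.
    """
    responses = []
    since = first_since
    for _ in range(rounds):
        response = submit({"since": since})
        responses.append(response)
        since = response["until"]
    return responses


def make_fake_submit(available_until: Iterable[str]) -> Callable[[Request], Response]:
    """Stand-in for the service: echoes 'since' and supplies the next 'until'."""
    pending = iter(available_until)

    def submit(request: Request) -> Response:
        return {"since": request["since"], "until": next(pending)}

    return submit


# Mirrors jobs #82 and #83 from the example above.
submit = make_fake_submit(["2021-07-28T19:00", "2021-07-28T21:00"])
responses = chain_incremental(submit, "2021-07-28T17:00", rounds=2)
for response in responses:
    print(response)
```

In a real pipeline the last until would be persisted between runs, so that the next run resumes exactly where the previous one stopped.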

Get the list of tables available for querying

The list command returns the names of all tables in the specified namespace.

Download the latest schema for a table

The schema endpoint returns the latest schema of a table as a JSON Schema document. The schema command enables you to download the schema of a specified table as a JSON file.
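Once a schema has been saved as a JSON file, it can be inspected with the standard json module. The sketch below lists the top-level properties of a JSON Schema document and the subset marked as required. The sample schema is made up for illustration, and the layout of a real DAP table schema may differ (for example, by nesting properties more deeply).

```python
import json
from typing import Any, Dict, List


def list_properties(schema: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level property names of a JSON Schema document."""
    return sorted(schema.get("properties", {}))


def list_required(schema: Dict[str, Any]) -> List[str]:
    """Return the sorted names listed under the schema's 'required' keyword."""
    return sorted(schema.get("required", []))


# Made-up schema text; in practice, read the file written by the schema command.
sample_text = """
{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"}
  },
  "required": ["id"]
}
"""
sample_schema = json.loads(sample_text)
print(list_properties(sample_schema))
print(list_required(sample_schema))
```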
