
atd-knack-services

Integration services for ATD's Knack applications.

Design

ATD Knack Services consists of a Python library (/services) and scripts (/scripts) that automate the flow of data from ATD's Knack applications to downstream systems.

These utilities are designed to:

  • incrementally offload Knack application records and metadata as JSON documents in a collection of S3 data stores
  • incrementally fetch records and publish them to external systems such as Socrata and ArcGIS Online
  • lay the groundwork for further integration with a data lake and/or a data warehouse
  • be deployed in Airflow or similar task management frameworks

[Figure: basic data flow]

Configuration

S3 Data Store

Data is stored in an S3 bucket (s3://atd-knack-services), with one subdirectory per Knack application per environment. Each app subdirectory contains a subdirectory for each container, which holds individual records stored as JSON files, with each record's id serving as the filename. As such, each store follows the naming pattern s3://atd-knack-services/<app-name>-<environment>/<container ID>.

Application metadata is also stored as a JSON file at the root of each app's subdirectory.

. s3://atd-knack-services
|-- data-tracker-prod
|   |-- 2x22pl1f7a63815efqx33p90.json   # app metadata
|   |-- view_1
|   |   |-- 5f31673f7a63820015ef4c85.json
|   |   |-- 5b34fbc85295dx37f1402543.json
|   |   |-- 5b34fbc85295de37y1402337.json
|   |   |-- ...
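For reference, here is a minimal sketch of how a record's key can be composed under this layout. The helper name and the boto3 upload are illustrative, not part of the package's documented API.

import json

import boto3

def record_key(app_name, env, container_id, record_id):
    """Compose the S3 key for one Knack record under the documented layout."""
    return f"{app_name}-{env}/{container_id}/{record_id}.json"

# Example: store a single record (assumes AWS credentials are configured)
s3 = boto3.client("s3")
record = {"id": "5f31673f7a63820015ef4c85"}
s3.put_object(
    Bucket="atd-knack-services",
    Key=record_key("data-tracker", "prod", "view_1", record["id"]),
    Body=json.dumps(record),
)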

Scripts (/scripts)

Get the most recent successful DAG run

most_recent_dag_run.py is meant to be run as the initial task in an Airflow DAG: it fetches the date of that DAG's most recent successful run. The date can then be passed to subsequent tasks as a filter parameter to support incremental record processing.

$ python most_recent_dag_run.py --dag atd_signals_socrata  

CLI arguments

  • --dag (str, required): the DAG ID of the DAG run to be fetched.
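
For context, the lookup can be approximated with Airflow's DagRun model; this is a sketch, not the script's actual implementation, and assumes it runs inside an Airflow environment.

from airflow.models import DagRun
from airflow.utils.state import State

def most_recent_successful_run(dag_id):
    """Return the most recent successful run of the given DAG, or None."""
    runs = DagRun.find(dag_id=dag_id, state=State.SUCCESS)
    return max(runs, key=lambda run: run.execution_date) if runs else None

run = most_recent_successful_run("atd_signals_socrata")
if run:
    # POSIX timestamp suitable for the --date argument of downstream scripts
    print(int(run.execution_date.timestamp()))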

Load App Metadata to S3

Use upload_metadata.py to load an application's metadata to S3.

$ python upload_metadata.py \
    --app-name data-tracker \
    --env prod

CLI arguments

  • --app-name (str, required): the name of the source Knack application
  • --env (str, required): The application environment. Must be prod or dev.
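
A rough sketch of what this step does, assuming Knack's public application metadata endpoint and the S3 layout described above (the helper and key naming are illustrative):

import json

import boto3
import requests

def upload_metadata(app_id, app_name, env, bucket="atd-knack-services"):
    """Fetch a Knack app's metadata and store it in the app's S3 subdirectory."""
    # Knack serves application metadata from a public endpoint keyed by app ID
    url = f"https://api.knack.com/v1/applications/{app_id}"
    metadata = requests.get(url, timeout=30).json()
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"{app_name}-{env}/{app_id}.json",
        Body=json.dumps(metadata),
    )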

Load Knack Records to S3

Use knack_container_to_s3.py to incrementally load data from a Knack container (an object or view) to an S3 bucket.

$ python knack_container_to_s3.py \
    --app-name data-tracker \
    --container view_197 \
    --env prod \
    --date 1598387119
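
Its CLI arguments mirror those listed for upsert_knack_container_to_socrata.py below. As a sketch of the incremental load, here is one way it could work with knackpy (the Knack client library), using Knack's standard filter format and a placeholder modified-date field; the script's actual implementation may differ.

import json

import boto3
import knackpy

def load_container_to_s3(app_id, api_key, app_name, env, container, since_iso):
    """Copy records modified at or after `since_iso` to the S3 data store."""
    # field_123 is a placeholder; the modified-date field key varies per app
    filters = {
        "match": "and",
        "rules": [{"field": "field_123", "operator": "is after", "value": since_iso}],
    }
    app = knackpy.App(app_id=app_id, api_key=api_key)
    s3 = boto3.client("s3")
    for record in app.get(container, filters=filters):
        data = record.format()  # knackpy Record -> plain dict
        s3.put_object(
            Bucket="atd-knack-services",
            Key=f"{app_name}-{env}/{container}/{data['id']}.json",
            Body=json.dumps(data, default=str),
        )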

Publish Records to the Open Data Portal

Use upsert_knack_container_to_socrata.py to publish a Knack container to the Open Data Portal (aka Socrata).

$ python upsert_knack_container_to_socrata.py \
    --app-name data-tracker \
    --container view_197 \
    --env prod \
    --date 1598387119

CLI arguments

  • --app-name (str, required): the name of the source Knack application
  • --container (str, required): the object or view key of the source container
  • --env (str, required): The application environment. Must be prod or dev.
  • --date (int, required): a POSIX timestamp. Only records modified at or after this date will be processed.
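
For context, a minimal sketch of the Socrata upsert using sodapy, a common Socrata client (the dataset ID and credentials are placeholders, and the script's actual implementation may differ):

from sodapy import Socrata

def upsert_to_socrata(records, dataset_id):
    """Upsert a batch of record dicts to the Open Data Portal."""
    client = Socrata(
        "data.austintexas.gov",
        "YOUR_APP_TOKEN",          # placeholder credentials
        username="user@example.com",
        password="...",
    )
    # Socrata matches rows on the dataset's row identifier to insert or update
    return client.upsert(dataset_id, records)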

Services (/services)

The services package contains utilities for fetching and pushing data between Knack applications and AWS S3.

It is designed as a free-standing Python package that can be installed with pip:

$ pip install atd-knack-services

and imported as services:

import services

services.s3.upload

Multi-threaded uploading of file-like objects to S3.

services.s3.download

Multi-threaded downloading of file objects from S3.
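
The package does not document these signatures here; as a rough illustration, a multi-threaded upload of file-like objects can be built on concurrent.futures and boto3 (the signature below is assumed, not the package's actual API):

import concurrent.futures

import boto3

def upload(bucket, keys_and_fileobjs, max_workers=8):
    """Upload file-like objects to S3 concurrently.

    keys_and_fileobjs: iterable of (key, file-like object) pairs.
    """
    s3 = boto3.client("s3")  # boto3 clients are thread-safe
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(s3.upload_fileobj, fileobj, bucket, key)
            for key, fileobj in keys_and_fileobjs
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any upload exception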

How To

  • Create bucket(s)
  • Add Knack app credentials to auth configuration file
  • Add container configuration file to /services/config
  • Create DAGs

An end-to-end ETL process will involve creating at least three Airflow tasks (see the sketch after this list):

  • Load app metadata to S3
  • Load Knack records to S3
  • Publish Knack records to destination system
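
A sketch of such a DAG, in the Airflow 2 style; the DAG ID, schedule, and the mechanism for passing the incremental date are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="atd_data_tracker_socrata",   # placeholder DAG ID
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
) as dag:
    load_metadata = BashOperator(
        task_id="load_metadata",
        bash_command="python upload_metadata.py --app-name data-tracker --env prod",
    )
    load_records = BashOperator(
        task_id="load_records",
        bash_command=(
            "python knack_container_to_s3.py --app-name data-tracker "
            "--container view_197 --env prod --date {{ execution_date.int_timestamp }}"
        ),
    )
    publish_records = BashOperator(
        task_id="publish_records",
        bash_command=(
            "python upsert_knack_container_to_socrata.py --app-name data-tracker "
            "--container view_197 --env prod --date {{ execution_date.int_timestamp }}"
        ),
    )
    load_metadata >> load_records >> publish_records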

