atd-knack-services
Integration services for ATD's Knack applications.
Design
ATD Knack Services comprises a Python library (/services) and scripts (/scripts) that automate the flow of data from ATD's Knack applications to downstream systems.
These utilities are designed to:
- incrementally offload Knack application records and metadata as JSON documents in a collection of S3 data stores
- incrementally fetch records and publish them to external systems such as Socrata and ArcGIS Online
- lay the groundwork for further integration with a data lake and/or a data warehouse
- be deployed in Airflow or similar task management frameworks
Configuration
S3 Data Store
Data is stored in an S3 bucket (s3://atd-knack-services), with one subdirectory per Knack application per environment. Each app subdirectory contains a subdirectory for each container, which holds individual records stored as JSON files, with each record's id serving as the filename. As such, each store follows the naming pattern s3://atd-knack-services/<app-name>-<environment>/<container ID>.
Application metadata is also stored as a JSON file at the root of each app subdirectory.
s3://atd-knack-services
|- data-tracker-prod
|  |- 2x22pl1f7a63815efqx33p90.json  # app metadata
|  |- view_1
|  |  |- 5f31673f7a63820015ef4c85.json
|  |  |- 5b34fbc85295dx37f1402543.json
|  |  |- 5b34fbc85295de37y1402337.json
|  |  |- ...
Scripts (/scripts)
Get the most recent successful DAG run
most_recent_dag_run.py
is meant to be run as an initial Airflow task which fetches the most recent successful run of itself. The date can then be passed to subsequent tasks as a filter parameter to support incremental record processing.
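Conceptually, the task reduces to picking the latest successful run out of a list of DAG runs. A minimal sketch under that assumption (the real script presumably queries Airflow's metadata; the run-record shape here is invented for illustration):

```python
def most_recent_success(dag_runs):
    """Return the execution date (POSIX timestamp) of the most recent
    successful run, or None if there has never been one."""
    successes = [r["execution_date"] for r in dag_runs if r["state"] == "success"]
    return max(successes) if successes else None


runs = [
    {"state": "success", "execution_date": 1598300000},
    {"state": "failed", "execution_date": 1598387119},
    {"state": "success", "execution_date": 1598380000},
]
# picks the most recent *successful* run, ignoring the newer failed one
latest = most_recent_success(runs)  # 1598380000
```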
$ python most_recent_dag_run.py --dag atd_signals_socrata
CLI arguments
--dag (str, required): the DAG ID of the DAG run to be fetched.
Load App Metadata to S3
Use upload_metadata.py to load an application's metadata to S3.
$ python upload_metadata.py \
  --app-name data-tracker \
  --env prod
CLI arguments
--app-name (str, required): the name of the source Knack application
--env (str, required): the application environment. Must be prod or dev.
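A sketch of how the argument handling might look with argparse, using choices to enforce the prod/dev constraint (assumed for illustration; the script's actual implementation may differ):

```python
import argparse

# Hypothetical parser mirroring the documented CLI arguments
parser = argparse.ArgumentParser(description="Load Knack app metadata to S3")
parser.add_argument("--app-name", required=True, help="name of the source Knack application")
parser.add_argument("--env", required=True, choices=["prod", "dev"], help="application environment")

# parse the example invocation from above
args = parser.parse_args(["--app-name", "data-tracker", "--env", "prod"])
```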
Load Knack Records to S3
Use knack_container_to_s3.py to incrementally load data from a Knack container (an object or view) to an S3 bucket.
$ python knack_container_to_s3.py \
--app-name data-tracker \
--container view_197 \
--env prod \
  --date 1598387119
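The --date parameter is what makes the load incremental: only records modified at or after the given POSIX timestamp are processed. A sketch of that filter (the modified_date field name is an assumption; the actual Knack field is configured per container):

```python
def filter_modified_since(records, since_timestamp):
    """Keep only records modified at or after the cutoff.

    Assumes each record carries a POSIX-timestamp 'modified_date' field
    (hypothetical field name, for illustration only)."""
    return [r for r in records if r["modified_date"] >= since_timestamp]


records = [
    {"id": "a", "modified_date": 1598000000},
    {"id": "b", "modified_date": 1598387119},
    {"id": "c", "modified_date": 1599000000},
]
# "b" is kept because the cutoff is inclusive ("at or after")
recent = filter_modified_since(records, 1598387119)
```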
Publish Records to the Open Data Portal
Use upsert_knack_container_to_socrata.py to publish a Knack container to the Open Data Portal (aka Socrata).
$ python upsert_knack_container_to_socrata.py \
--app-name data-tracker \
--container view_197 \
--env prod \
  --date 1598387119
CLI arguments
--app-name (str, required): the name of the source Knack application
--container (str, required): the object or view key of the source container
--env (str, required): the application environment. Must be prod or dev.
--date (int, required): a POSIX timestamp. Only records modified at or after this date will be processed.
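A Socrata upsert is typically a list of JSON rows whose keys match the destination dataset's field names, so Knack records must be reshaped before publishing. A sketch under that assumption (the field mapping below is invented; the script's real transformation logic is not shown in this README):

```python
def to_socrata_rows(records, field_map):
    """Rename Knack field keys to Socrata dataset field names,
    dropping any fields absent from the mapping."""
    return [
        {socrata_name: record[knack_key]
         for knack_key, socrata_name in field_map.items()
         if knack_key in record}
        for record in records
    ]


# hypothetical Knack-key -> Socrata-field mapping
field_map = {"id": "signal_id", "field_12": "location_name"}
rows = to_socrata_rows(
    [{"id": "5f31673f", "field_12": "LAMAR & 5TH", "field_99": "ignored"}],
    field_map,
)
# rows -> [{"signal_id": "5f31673f", "location_name": "LAMAR & 5TH"}]
```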
Services (/services)
The services package contains utilities for fetching and pushing data between Knack applications and AWS S3.
It is designed as a free-standing Python package that can be installed with pip:
$ pip install atd-knack-services
and imported as services:

import services
services.s3.upload
Multi-threaded uploading of file-like objects to S3.
services.s3.download
Multi-threaded downloading of file objects from S3.
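The multi-threading in both helpers can be sketched with a ThreadPoolExecutor fanning file operations out across worker threads. This is a simplified stand-in for the package's actual implementation, with the S3 transfer stubbed out:

```python
from concurrent.futures import ThreadPoolExecutor


def upload_one(key_and_body):
    """Stand-in for a single S3 PUT (real code would call boto3 here)."""
    key, body = key_and_body
    return key, len(body)


def upload_all(objects, max_workers=8):
    """Upload many file-like objects concurrently; returns (key, size) pairs.

    pool.map preserves input order even though uploads run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_one, objects))


results = upload_all([("view_1/a.json", b"{}"), ("view_1/b.json", b'{"x":1}')])
```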
How To
- Create bucket(s)
- Add Knack app credentials to auth configuration file
- Add container configuration file to /services/config
- Create DAGs
An end-to-end ETL process will involve creating at least three Airflow tasks:
- Load app metadata to S3
- Load Knack records to S3
- Publish Knack records to destination system
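Strung together, the three tasks are just the scripts above invoked in order with shared parameters. A sketch that builds those command lines (script names come from this README; how the commands are actually wrapped as Airflow tasks is left to the DAG author):

```python
def build_pipeline(app_name, env, container, date):
    """Return the three ETL commands, in execution order."""
    common = ["--app-name", app_name, "--env", env]
    return [
        ["python", "upload_metadata.py", *common],
        ["python", "knack_container_to_s3.py", *common,
         "--container", container, "--date", str(date)],
        ["python", "upsert_knack_container_to_socrata.py", *common,
         "--container", container, "--date", str(date)],
    ]


cmds = build_pipeline("data-tracker", "prod", "view_197", 1598387119)
```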