datcat

Simple Data Catalogue API

DatCat

Please note this is an alpha version and still in active development. Naturally all feedback is welcome.

Datcat is a simple, lightweight data catalogue API for BigQuery. It loads your .json schema files into memory for use with either your own synchronisation service or catasyn, its sibling application. Look into the example_catalogue directory or here to find out how to define your BigQuery schemas. Here's a quick snippet if you are as lazy as I am:

[
  {
    "description": "Unique Identifier",
    "mode": "REQUIRED",
    "name": "MY_UNIQUE_ID",
    "type": "INT64"
  },
  {
    "description": "Favourite Colour",
    "mode": "REQUIRED",
    "name": "MY_FAVOURITE_COLOUR",
    "type": "STRING"
  }
]
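The idea of loading schema files into memory can be sketched roughly as follows. This is not datcat's own loader: the function names `load_catalogue` and `required_columns`, the file name `login_v1.json`, and the choice to key each schema by its file stem are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_catalogue(directory):
    """Read every .json file in `directory` into a dict keyed by file stem (assumed convention)."""
    catalogue = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            catalogue[path.stem] = json.load(handle)
    return catalogue


def required_columns(schema):
    """Return the names of the columns whose mode is REQUIRED."""
    return [column["name"] for column in schema if column.get("mode") == "REQUIRED"]


if __name__ == "__main__":
    # The schema from the snippet above, written to a throwaway directory.
    sample = [
        {"description": "Unique Identifier", "mode": "REQUIRED",
         "name": "MY_UNIQUE_ID", "type": "INT64"},
        {"description": "Favourite Colour", "mode": "REQUIRED",
         "name": "MY_FAVOURITE_COLOUR", "type": "STRING"},
    ]
    with tempfile.TemporaryDirectory() as directory:
        # "login_v1.json" is a hypothetical file name.
        Path(directory, "login_v1.json").write_text(json.dumps(sample), encoding="utf-8")
        catalogue = load_catalogue(directory)
        print(sorted(catalogue))
        print(required_columns(catalogue["login_v1"]))
```

Keying by file stem keeps lookups simple, but any stable identifier would do.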

Currently, datcat supports partition generation and PII identification: tag the relevant column's description with {"partition": true} and/or {"pii": true}.

[
  {
    "description": "{\"pii\": true}",
    "mode": "REQUIRED",
    "name": "col_4",
    "type": "STRING"
  },
  {
    "description": "{\"partition\": true}",
    "mode": "REQUIRED",
    "name": "date",
    "type": "DATE"
  }
]
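If you consume these schemas in your own synchronisation service, the tags can be read back by parsing each description as JSON. The sketch below is an assumption about how a consumer might do that, not datcat's internal code; `column_tags`, `pii_columns` and `partition_column` are hypothetical helper names. Descriptions that are plain text simply yield no tags.

```python
import json


def column_tags(column):
    """Return the tag dict encoded in a column's description, or {} if there is none."""
    try:
        tags = json.loads(column.get("description", ""))
    except (TypeError, ValueError):
        # Plain-text descriptions such as "Unique Identifier" are not JSON.
        return {}
    return tags if isinstance(tags, dict) else {}


def pii_columns(schema):
    """Names of all columns tagged {"pii": true}."""
    return [c["name"] for c in schema if column_tags(c).get("pii") is True]


def partition_column(schema):
    """Name of the first column tagged {"partition": true}, or None."""
    for column in schema:
        if column_tags(column).get("partition") is True:
            return column["name"]
    return None


schema = [
    {"description": "{\"pii\": true}", "mode": "REQUIRED", "name": "col_4", "type": "STRING"},
    {"description": "{\"partition\": true}", "mode": "REQUIRED", "name": "date", "type": "DATE"},
]
print(pii_columns(schema), partition_column(schema))
```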

In addition to serving schema definitions via its API, datcat creates a basic schema - topic - subscriber mapping that is later used to create the relevant infrastructure [1] from the schema definition. After the schemas are defined, run python -m datcat.service_layer.mappings to create those mappings. The naming conventions are basic: each topic contains all versions of an event, and each topic has only one subscriber, for the sole purpose of data lake ingestion.

//schema_topic_subscription.json
{
  "login_v1": {
    "schema_class_name": "login",
    "topic_name": "login_topic",
    "subscription_name": "login_subscription"
  }
}
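The naming convention in this example can be reproduced with a few lines of Python. This is a sketch inferred from the single entry above, not the code behind python -m datcat.service_layer.mappings: the `_v<number>` suffix rule and the `build_mappings` name are assumptions.

```python
import json
import re

# Assumption: schema names end in "_v<number>", as in "login_v1".
VERSION_SUFFIX = re.compile(r"_v\d+$")


def build_mappings(schema_names):
    """Map each versioned schema name to its class, topic and subscription names."""
    mappings = {}
    for name in schema_names:
        class_name = VERSION_SUFFIX.sub("", name)
        mappings[name] = {
            "schema_class_name": class_name,
            # All versions of an event share one topic and one subscription.
            "topic_name": f"{class_name}_topic",
            "subscription_name": f"{class_name}_subscription",
        }
    return mappings


print(json.dumps(build_mappings(["login_v1"]), indent=2))
```

Under this rule a hypothetical `login_v2` schema would point at the same `login_topic`, which matches the statement that each topic contains all versions of an event.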

CI/CD is your gig, but if you fancy seeing datcat in action in your local Docker, run ./docker-docker-build.sh and go to http://0.0.0.0:50000

Footnote 1

IAM and general permissions are out of scope for this project. It's up to you to ensure your service account has all the necessary roles/permissions to create BigQuery tables and topics/subscribers. Check this for a reminder.

Release history

0.1.4 (this version): Mar 26, 2021

0.1.3: Feb 2, 2021

Download files

Source Distribution

datcat-0.1.4.tar.gz (7.0 kB), uploaded Mar 26, 2021, Source

Built Distribution

datcat-0.1.4-py3-none-any.whl (8.1 kB), uploaded Mar 26, 2021, Python 3

File details

Details for the file datcat-0.1.4.tar.gz.

File metadata

Download URL: datcat-0.1.4.tar.gz
Upload date: Mar 26, 2021
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.5 CPython/3.9.2 Darwin/20.3.0

File hashes

Hashes for datcat-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`dd04e5951ad2d5374b0f89fa7d501d4764014fe3115e84316a0c9dac14db48e0`
MD5	`c721da1b08cdeffa3705490c835d7926`
BLAKE2b-256	`d97bfd77d6759444971b7f92ddea2eb25cb6ffe0d2d6c7471d3728dbd01f01f9`
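To check a downloaded copy of datcat-0.1.4.tar.gz against the SHA256 digest listed above, something like the following sketch works with only the standard library. The helper names `sha256_of_file` and `matches_published_hash` are made up for illustration, and the file is assumed to sit in the current directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    published = "dd04e5951ad2d5374b0f89fa7d501d4764014fe3115e84316a0c9dac14db48e0"
    archive = "datcat-0.1.4.tar.gz"  # assumed local path
    if os.path.exists(archive):
        print(matches_published_hash(archive, published))
```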

File details

Details for the file datcat-0.1.4-py3-none-any.whl.

File metadata

Download URL: datcat-0.1.4-py3-none-any.whl
Upload date: Mar 26, 2021
Size: 8.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.5 CPython/3.9.2 Darwin/20.3.0

File hashes

Hashes for datcat-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`75537f966a1ec9b18b8a4cbac7ecd00b95db6fed37ce11f31856f4c0269fdca8`
MD5	`221eaee934c60b60878804da4631d5f8`
BLAKE2b-256	`09e434ae293095f51c384ace9fee40e26e7df0ac96bd36f2b5b07cfcb1f9c94b`
