Library to integrate the MoJ data platform with the catalogue component.
Data platform catalogue
This library is part of the Ministry of Justice data platform.
It publishes object metadata to a data catalogue, so that the metadata can be made discoverable by consumers.
Broadly speaking, a catalogue stores a metadata graph, consisting of data assets. Data assets could be tables, schemas or databases.
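To make the idea of a metadata graph concrete, here is a minimal sketch that does not use this library. The `Asset` class and the `walk` helper are hypothetical names invented for illustration; the real catalogue models may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical node in a metadata graph (not a class from this library)."""

    name: str
    kind: str  # "database", "schema" or "table"
    children: list = field(default_factory=list)


def walk(asset, depth=0):
    """Yield (depth, asset) pairs in depth-first order."""
    yield depth, asset
    for child in asset.children:
        yield from walk(child, depth + 1)


# A database containing one schema, which contains one table.
database = Asset(
    "my_database",
    "database",
    [Asset("my_schema", "schema", [Asset("my_table", "table")])],
)

for depth, asset in walk(database):
    print("  " * depth + f"{asset.kind}: {asset.name}")
```

Each asset is a node and the parent/child links are the edges, which is roughly what a consumer browses when exploring the catalogue.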
How to install
To install the package using pip, run:
pip install ministryofjustice-data-platform-catalogue
Terminology
- Data assets - any databases, tables, or schemas within the metadata graph.
- Domains - groupings of metadata into service areas that have their own governance, such as HMCTS, HMPPS and OPG.
Example usage
from data_platform_catalogue import (
    BaseCatalogueClient,
    CatalogueError,
    CatalogueMetadata,
    DataHubCatalogueClient,
    DataLocation,
    DataProductMetadata,
    TableMetadata,
)

# jwt_token and api_url must already be defined with your catalogue
# credentials and endpoint.
client: BaseCatalogueClient = DataHubCatalogueClient(jwt_token=jwt_token, api_url=api_url)

# Metadata describing the data product that owns the table.
data_product = DataProductMetadata(
    name="my_data_product",
    description="bla bla",
    version="v1.0.0",
    owner="7804c127-d677-4900-82f9-83517e51bb94",
    email="justice@justice.gov.uk",
    retention_period_in_days=365,
    domain="LAA",
    subdomain="Legal Aid",
    dpia_required=False,
)

# Metadata describing the table itself, including its columns.
table = TableMetadata(
    name="my_table",
    description="bla bla",
    column_details=[
        {"name": "foo", "type": "string", "description": "a"},
        {"name": "bar", "type": "int", "description": "b"},
    ],
    retention_period_in_days=365,
    major_version=1,
)

# Create or update the table in the catalogue.
try:
    table_fqn = client.upsert_table(
        metadata=table,
        data_product_metadata=data_product,
        location=DataLocation("test_data_product_v1"),
    )
except CatalogueError as error:
    print(f"Failed to upsert table: {error}")
Search example
response = client.search()

# Total results across all pages
print(response.total_results)

# Iterate over search results
for item in response.page_results:
    print(item)

# Iterate over facet options
for option in response.facets.options('domains'):
    print(option.label)
    print(option.value)
    print(option.count)

# Include a filter and sort.
# MultiSelectFilter and SortOption must be imported before use;
# the import is not shown in this example.
client.search(
    filters=[MultiSelectFilter("domains", [response.facets['domains'][0].value])],
    sort=SortOption(field="name", ascending=False),
)
Search filters
DataHub
Basic filters:
- urn
- customProperties
- browsePaths / browsePathsV2
- deprecated (boolean)
- removed (boolean)
- typeNames
- name, qualifiedName
- description, hasDescription
Timestamps:
- lastOperationTime (datetime)
- createdAt (timestamp)
- lastModifiedAt (timestamp)
URNs:
- platform / platformInstance
- tags, hasTags
- glossaryTerms, hasGlossaryTerms
- domains, hasDomain
- siblings
- owners, hasOwners
- roles, hasRoles
- container
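Because filter names are plain strings, a typo is easy to miss. The sketch below shows one way to check names against the fields listed above before building a search. The helper `unknown_filter_fields` and the constant names are hypothetical and not part of this library; the field names themselves are copied from the lists above.

```python
# Filter field names copied from the lists above.
BASIC_FILTERS = {
    "urn", "customProperties", "browsePaths", "browsePathsV2",
    "deprecated", "removed", "typeNames", "name", "qualifiedName",
    "description", "hasDescription",
}
TIMESTAMP_FILTERS = {"lastOperationTime", "createdAt", "lastModifiedAt"}
URN_FILTERS = {
    "platform", "platformInstance", "tags", "hasTags",
    "glossaryTerms", "hasGlossaryTerms", "domains", "hasDomain",
    "siblings", "owners", "hasOwners", "roles", "hasRoles", "container",
}
KNOWN_FILTERS = BASIC_FILTERS | TIMESTAMP_FILTERS | URN_FILTERS


def unknown_filter_fields(fields):
    """Return the requested field names that are not in the lists above, sorted."""
    return sorted(set(fields) - KNOWN_FILTERS)


# "owner" is not listed (the listed field is "owners"), so it is reported.
print(unknown_filter_fields(["domains", "owner", "tags"]))
```

A check like this only guards against misspelled names; it makes no claim about which value types each field accepts.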
Catalogue Implementations
DataHub
- Each table is created as a dataset in DataHub.
- Tables that reside in the same Athena database (for example, data_product_v1) should be placed within the same DataHub container.
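The container rule can be pictured as grouping tables by their database name. The following is a library-independent sketch under that assumption; `group_by_database` and the sample table names are hypothetical, and the actual client may assign containers differently.

```python
from collections import defaultdict


def group_by_database(tables):
    """Group (database_name, table_name) pairs into one list per database."""
    containers = defaultdict(list)
    for database, table in tables:
        containers[database].append(table)
    return dict(containers)


# Hypothetical tables: two share a database, one lives elsewhere.
tables = [
    ("data_product_v1", "my_table"),
    ("data_product_v1", "other_table"),
    ("another_product_v1", "third_table"),
]

containers = group_by_database(tables)
for database, names in containers.items():
    print(f"container {database}: {names}")
```

Here each key stands in for one DataHub container and each value for the datasets placed inside it.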