19 projects
data-engineering-pulumi-components
Reusable components for use in Pulumi Python projects
splink
Fast probabilistic data linkage at scale
iam-builder
A small Python package to generate IAM policies
pydbtools
A Python package to query data via Amazon Athena and load it into a pandas DataFrame using aws-wrangler.
arrow-pd-parser
MoJ arrow-pd-parser
mojap-metadata
A Python package to manage metadata
form-tools
mojap-airflow-tools
A few wrappers and tools to use Airflow on the Analytical Platform
etl_manager
A Python package to manage ETL processes on AWS
data-linter
data linter
database-testing-tools
A package to test our databases
s3_data_packer
dataengineeringutils3
Data engineering utils Python 3 version
splink-graph
A small set of graph functions for use from PySpark, built on top of NetworkX and GraphFrames
athena-tools
A set of useful Athena database creation tools
splink-data-generation
Generate synthetic data with a specified data generating process
splink-data-standardisation
gluejobutils
Python 2.7 utils for AWS Glue jobs
pdf2embeddings
An NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text, and finding sentences semantically similar to a given search query.