Library for generating and bootstrapping mock data
Project description
Knockoff Factory
A library for generating mock data and creating database fixtures that can be used for unit testing.
Run poetry install/update with psycopg2
Requirements:
- postgresql (
brew install postrgresql
)
Run the following command:
pg_config | grep "LDFLAGS ="
Output:
LDFLAGS = -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs
Take the value of LDFLAGS
and set that environment variable. E.g.:
export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs"
You should now be able to run poetry install and/or update commands without failing on psycopg2.
Local Postgres Setup
The following steps can be used to setup a local postgres instance for testing.
Requirements
- docker
- poetry (
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
) - postgresql (
brew install postgresql
) or pgcli (brew install pgcli
)
Run Postgres
- Pull docker image
docker pull postgres:11.7
- Run docker container:
docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.7
- Note: you can see the running container with
docker ps
and can terminate it withdocker kill pg-docker
- Note: you can see the running container with
You can now access the shell with PGPASSWORD=docker pgcli -h localhost -U postgres
or
PGPASSWORD=docker psql -h localhost -U postgres
.
Tests
Run unit tests
poetry run pytest
The unit tests depend on fixtures using ephemeral postgres databases
and/or instances. By default it will attempt to connect to an existing
instance at postgresql://postgres@localhost:5432/postgres
and will
create and destroy databases per fixture. This postgres location can
be overridden with the KNOCKOFF_TEST_DB_URI
environment variable.
If no external postgres instance is available for testing, but postgresql is
installed, the TEST_USE_EXTERNAL_DB
environment variable can be set to 0
.
The fixtures will then rely on the testing.postgresql
library to create
ephemeral postgres instances per fixture.
If postgres is not available, dependent tests can be disabled with the following environment variable
export TEST_POSTGRES_ENABLED=0
Knockoff Python Based Configuration
We will be providing documentation along with example repo's for the python based configuration using the knockoff.sdk modules.
Knockoff Legacy Yaml Based Configuration
Note: This yaml based configuration has been moved under the legacy subcommand for knockoff and a new yaml based configuration will be introduced that relies on the same objects in python based configuartion (knockoff.sdk).
Creating Databases
Knockoff will start by creating any specified databases. This section
is optional if you do not need a database created. You can also configure
an engine builder to use by providing the config
parameter for each
configured database (factory function used is knockoff.utilities.orm.sql.EngineBuilder.from_config)
otherwise the default engine based on knockoff environment variables will be used.
The following yaml will result in the following sql queries:
create database mydb;
create user myuser with encrypted password 'MYUSER_PASSWORD';
MYUSER_PASSWORD
is replaced with the value of the corresponding environment variable
grant all privileges on database mydb to myuser;
create-databases:
- name: mydb
type: postgres
users:
- user: myuser
password_env: MYUSER_PASSWORD
Load existing table definitions
Knockoff uses Pyrseas's yamltodb tool to load existing table definitions into a database.
Example:
In this example there is a table films
with the following definition:
+----------+-------------------+-------------+
| Column | Type | Modifiers |
|----------+-------------------+-------------|
| title | character varying | not null |
| director | character varying | |
| year | character varying | |
+----------+-------------------+-------------+
Indexes:
"films_pkey" PRIMARY KEY, btree (title)
Executing dbtoyaml mydb
results in the following yaml that knockoff
can be configured to use to load with yamltodb
.
schema public:
description: standard public schema
owner: postgres
privileges:
- PUBLIC:
- all
- postgres:
- all
table films:
columns:
- title:
not_null: true
type: character varying
- director:
type: character varying
- year:
type: character varying
owner: myuser
primary_key:
films_pkey:
columns:
- title
Note: If you are running the local postgres setup described above and running from within a docker container on your mac, you can use the following: PGPASSWORD=docker dbtoyaml -H docker.for.mac.host.internal -U postgres mydb
Loading data into tables
Data can be loaded into new or existing tables.
Examples
The following example loads data into an existing table from a provided csv.
knockoff:
dag:
- name: films # arbitrary name of node in dag
type: table # table | prototype | component | part
table: films # defaults to the name of the node if not provided
source:
strategy: io
reader: pandas.read_csv
kwargs:
filepath_or_buffer: example/films.csv # local or s3:// path
sep: "|"
sink:
database: mydb
kwargs:
if_exists: append # defaults to fail
index: false # Data is loaded into a pandas DataFrame this option ignores the index
The following example loads data into a new table from data defined in the yaml.
knockoff:
dag:
- name: films2 # Note: "table" key not specified, so defaults to "film2"
type: table
source:
strategy: io
reader: inline
kwargs:
sep: ","
data: |
title,director,year
t5,d1,2020
t6,d2,2020
t7,d1,2020
sink:
database: mydb
user: myuser
password_env: MYUSER_PASSWORD
kwargs:
index: false
Generating fake retail data
Knockoff uses faker to help generate fake retail data that can be used for testing. Hierarchical relationships with various dependencies can be also be modelled with knockoff. This example generates the following tables (in addition to the above examples).
postgres@localhost:mydb> select * from location;
+-------------------------------+---------------+-----------+
| address | location_id | channel |
|-------------------------------+---------------+-----------|
| 07528 Fischer Track Suite 779 | 1 | nfs |
| Melissaview, MD 90363 | | |
| 1535 Kelly Canyon | 2 | nso |
| Rhodesborough, CA 43893 | | |
| 216 Kayla Lake Apt. 126 | 3 | nso |
| South Matthewmouth, OH 36332 | | |
| 561 Jones Burg Suite 382 | 4 | nso |
| Hugheschester, DE 21908 | | |
| 042 Robinson Fort Suite 945 | 5 | nfs |
| Pattersonshire, NC 96317 | | |
| 2332 Watkins Road | 0 | digital |
| Davidfort, MS 71411 | | |
+-------------------------------+---------------+-----------+
postgres@localhost:mydb> select * from product;
+------------+----------+-----------------------+---------------+------------+
| division | gender | category | color | sku |
|------------+----------+-----------------------+---------------+------------|
| apparel | men | shorts | PaleGoldenRod | 6357812379 |
| apparel | women | pants & tights | NavajoWhite | 8332320303 |
| apparel | men | tops & t-shirts | Lavender | 9243289077 |
| shoes | men | lifestyle | PaleGoldenRod | 7270972977 |
| apparel | women | pants & tights | PaleGoldenRod | 4443641793 |
| apparel | women | hoodies & sweatshirts | NavajoWhite | 6130018459 |
| shoes | women | jordan | Lavender | 3791231041 |
| apparel | men | pants & tights | PaleGoldenRod | 3899370297 |
| apparel | men | shorts | Lavender | 7557742055 |
| apparel | men | pants & tights | SkyBlue | 9785957221 |
| apparel | women | shorts | SkyBlue | 9979359561 |
| apparel | women | tops & t-shirts | Lavender | 7006056836 |
| shoes | women | jordan | Lavender | 4853474331 |
| shoes | women | jordan | NavajoWhite | 6589395336 |
| apparel | men | pants & tights | Beige | 7168664719 |
| apparel | men | hoodies & sweatshirts | Beige | 7525844204 |
| apparel | men | shorts | SkyBlue | 9735336861 |
| shoes | men | skateboarding | SkyBlue | 6385212885 |
| apparel | men | tops & t-shirts | Beige | 9735107927 |
| apparel | women | pants & tights | SkyBlue | 2633853831 |
| apparel | women | jackets & vests | NavajoWhite | 2758275877 |
| apparel | men | shorts | Lavender | 1330756304 |
| apparel | women | tops & t-shirts | NavajoWhite | 9334676293 |
| shoes | men | skateboarding | Lavender | 6735393792 |
| apparel | men | jackets & vests | Lavender | 2907811814 |
+------------+----------+-----------------------+---------------+------------+
postgres@localhost:mydb> select * from transactions limit 10;
+---------------+------------+-----------+------------+------------+------------+
| location_id | sku | line_id | order_id | quantity | date |
|---------------+------------+-----------+------------+------------+------------|
| 5 | 7557742055 | 2 | 2957859949 | 0 | 2018-05-06 |
| 1 | 1330756304 | 1 | 3920316859 | 0 | 2018-07-19 |
| 4 | 9243289077 | 3 | 1875617688 | 0 | 2019-10-14 |
| 3 | 9334676293 | 2 | 7317451987 | 0 | 2018-06-09 |
| 0 | 9979359561 | 3 | 1236640244 | 2 | 2019-07-17 |
| 0 | 9735107927 | 1 | 9030486883 | 3 | 2018-04-27 |
| 1 | 9735336861 | 2 | 6196902209 | 2 | 2020-01-21 |
| 3 | 7006056836 | 4 | 1432537227 | 0 | 2019-04-22 |
| 2 | 2907811814 | 1 | 4039132536 | 0 | 2019-09-23 |
| 0 | 4443641793 | 5 | 8648324533 | 2 | 2018-09-06 |
+---------------+------------+-----------+------------+------------+------------+
Run the example:
- Run postgres:
docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.9
- Terminate existing container with
docker kill pg-docker
- Note: The example assumes you're running with POSTGRES_PASSWORD=docker and on port 5432
- Terminate existing container with
- Checkout the repo or download the examples folder
- Pull knockoff docker image:
docker pull knockoff-factory
docker run --rm -v $PWD/examples:/examples \
-e KNOCKOFF_DB_HOST='docker.for.mac.host.internal' \
-e KNOCKOFF_DB_USER='postgres' \
-e KNOCKOFF_DB_PASSWORD='docker' \
-e KNOCKOFF_DB_NAME='knockoff' \
-e KNOCKOFF_CONFIG=/examples/knockoff.yaml knockoff-factory:latest knockoff legacy
Note: if you are loading data from an s3 bucket you have access to, you can enable your docker
container access to those credentials by adding -v ~/.aws:/root/.aws
to the docker run
command.
Future work
- Add documentation and examples for SDK
- Add yaml based configuration for SDK
- Make extensible generic output for KnockffDB.insert (csv, parquet, etc)
- Enable append option for KnockoffDB.insert
- Autodiscover and populate all tables by using reflection and building dependency graph with foreign key relationships
- Documentation / use-cases
- Parallelize execution of dag. (e.g. https://ipython.org/ipython-doc/stable/parallel/dag_dependencies.html)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file knockoff-4.0.0.tar.gz
.
File metadata
- Download URL: knockoff-4.0.0.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.8.11 Linux/5.8.0-1036-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b680cd32c4e1f351bf73f15e874f080d0eac9bcc551734631ee469c5f1b74b62 |
|
MD5 | 0c54e7a38d204801d3f6eb710a2dc372 |
|
BLAKE2b-256 | a4a47427dae10532ee4b0b21bd0cfe3ddffb222698520a925ccee920bcafb022 |
Provenance
File details
Details for the file knockoff-4.0.0-py3-none-any.whl
.
File metadata
- Download URL: knockoff-4.0.0-py3-none-any.whl
- Upload date:
- Size: 52.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.8.11 Linux/5.8.0-1036-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f92edacc69b677399db1d038cec716671c807e9482cd9ce1c051ad44c09d6af2 |
|
MD5 | 11f16d4d85c397ce65146087ef3fab1a |
|
BLAKE2b-256 | 94b9d5476aec4c2a1c1c0dee71d0fb472c1a3e50ad16491e7855030451f067f0 |