Library for generating and bootstrapping mock data

Knockoff Factory

A library for generating mock data and creating database fixtures that can be used for unit testing.

Run poetry install/update with psycopg2

Requirements:

  • postgresql (brew install postgresql)

Run the following command:

pg_config | grep "LDFLAGS ="

Output:

LDFLAGS = -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs

Take the value of LDFLAGS and set that environment variable. E.g.:

export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs"

You should now be able to run poetry install and/or update commands without failing on psycopg2.
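
The lookup above can also be scripted. The following is a minimal sketch, assuming pg_config prints one "NAME = value" pair per line as in the output shown; the helper name extract_ldflags is hypothetical and not part of knockoff.

```python
import shlex


def extract_ldflags(pg_config_output: str) -> str:
    """Return the value of the LDFLAGS line from pg_config output."""
    for line in pg_config_output.splitlines():
        name, sep, value = line.partition(" = ")
        # Match the exact name so LDFLAGS_EX and LDFLAGS_SL are skipped.
        if sep and name.strip() == "LDFLAGS":
            return value.strip()
    raise ValueError("no LDFLAGS line found in pg_config output")


# Sample text copied from the output above, plus one unrelated line.
# In real use, the text could come from
# subprocess.run(["pg_config"], capture_output=True, text=True).stdout
sample = (
    "LDFLAGS = -L/usr/local/opt/openssl@1.1/lib "
    "-L/usr/local/opt/readline/lib -Wl,-dead_strip_dylibs\n"
    "LDFLAGS_EX = \n"
)
ldflags = extract_ldflags(sample)

# Print a shell line that can be pasted into the terminal.
print(f"export LDFLAGS={shlex.quote(ldflags)}")
```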

Local Postgres Setup

The following steps can be used to set up a local postgres instance for testing.

Requirements

  • docker
  • poetry (curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python)
  • postgresql (brew install postgresql) or pgcli (brew install pgcli)

Run Postgres

  1. Pull docker image: docker pull postgres:11.7
  2. Run docker container: docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.7
    • Note: you can see the running container with docker ps and can terminate it with docker kill pg-docker

You can now access the shell with PGPASSWORD=docker pgcli -h localhost -U postgres or PGPASSWORD=docker psql -h localhost -U postgres.

Tests

Run unit tests

poetry run pytest

The unit tests depend on fixtures that use ephemeral postgres databases and/or instances. By default, the fixtures attempt to connect to an existing instance at postgresql://postgres@localhost:5432/postgres and create and destroy a database per fixture. This postgres location can be overridden with the KNOCKOFF_TEST_DB_URI environment variable.

If no external postgres instance is available for testing, but postgresql is installed, the TEST_USE_EXTERNAL_DB environment variable can be set to 0. The fixtures will then rely on the testing.postgresql library to create ephemeral postgres instances per fixture.

If postgres is not available, dependent tests can be disabled with the following environment variable

export TEST_POSTGRES_ENABLED=0
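
The interplay of the three environment variables can be summarized in code. This is only an illustrative sketch, not knockoff's actual fixture logic: the names PostgresTestSettings and resolve_test_db_settings are hypothetical, and it assumes that both flags default to enabled and that only the value 0 disables them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default location stated in the text above.
DEFAULT_TEST_DB_URI = "postgresql://postgres@localhost:5432/postgres"


@dataclass(frozen=True)
class PostgresTestSettings:
    enabled: bool            # False: postgres-dependent tests are disabled
    use_external_db: bool    # False: rely on testing.postgresql instances
    uri: Optional[str]       # only set when an external instance is used


def resolve_test_db_settings(
    env: Mapping[str, str] = os.environ,
) -> PostgresTestSettings:
    # Assumption: a missing variable means "enabled"; only "0" disables.
    enabled = env.get("TEST_POSTGRES_ENABLED", "1") != "0"
    use_external = env.get("TEST_USE_EXTERNAL_DB", "1") != "0"
    uri = None
    if enabled and use_external:
        uri = env.get("KNOCKOFF_TEST_DB_URI", DEFAULT_TEST_DB_URI)
    return PostgresTestSettings(enabled, use_external, uri)


print(resolve_test_db_settings({}))
print(resolve_test_db_settings({"TEST_USE_EXTERNAL_DB": "0"}))
print(resolve_test_db_settings({"TEST_POSTGRES_ENABLED": "0"}))
```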

Knockoff Python Based Configuration

We will be providing documentation along with example repos for the python based configuration using the knockoff.sdk modules.

Knockoff Legacy Yaml Based Configuration

Note: This yaml based configuration has been moved under the legacy subcommand for knockoff, and a new yaml based configuration will be introduced that relies on the same objects as the python based configuration (knockoff.sdk).

Creating Databases

Knockoff starts by creating any specified databases. This section is optional; omit it if you do not need a database created. You can also configure an engine builder for each configured database by providing the config parameter (the factory function used is knockoff.utilities.orm.sql.EngineBuilder.from_config); otherwise, the default engine based on knockoff environment variables is used. The yaml below results in the following sql queries:

  • create database mydb;
  • create user myuser with encrypted password 'MYUSER_PASSWORD';
    • MYUSER_PASSWORD is replaced with the value of the corresponding environment variable
  • grant all privileges on database mydb to myuser;

create-databases:
  - name: mydb
    type: postgres
    users:
      - user: myuser
        password_env: MYUSER_PASSWORD
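
The mapping from this configuration to the three queries can be sketched as follows. This is an illustration of the mapping only, not knockoff's implementation: the function create_database_statements is hypothetical, the configuration is written as a plain dict instead of being parsed from yaml, and the password value is made up for the demo.

```python
import os
from typing import List, Mapping


def create_database_statements(
    database: dict, env: Mapping[str, str] = os.environ
) -> List[str]:
    """Build the sql statements for one entry under create-databases."""
    name = database["name"]
    statements = [f"create database {name};"]
    for user in database.get("users", []):
        # password_env names an environment variable; its value is used.
        password = env[user["password_env"]]
        statements.append(
            f"create user {user['user']} with encrypted password '{password}';"
        )
        statements.append(
            f"grant all privileges on database {name} to {user['user']};"
        )
    return statements


# Dict equivalent of the yaml above.
config = {
    "name": "mydb",
    "type": "postgres",
    "users": [{"user": "myuser", "password_env": "MYUSER_PASSWORD"}],
}

# Hypothetical password value, passed explicitly instead of via os.environ.
for statement in create_database_statements(config, {"MYUSER_PASSWORD": "s3cret"}):
    print(statement)
```

Plain string interpolation is used here only to mirror the queries listed above; it is not safe for untrusted names or passwords.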

Load existing table definitions

Knockoff uses Pyrseas's yamltodb tool to load existing table definitions into a database.

Example:

In this example there is a table films with the following definition:

+----------+-------------------+-------------+
| Column   | Type              | Modifiers   |
|----------+-------------------+-------------|
| title    | character varying |  not null   |
| director | character varying |             |
| year     | character varying |             |
+----------+-------------------+-------------+
Indexes:
    "films_pkey" PRIMARY KEY, btree (title)

Executing dbtoyaml mydb results in the following yaml that knockoff can be configured to use to load with yamltodb.

schema public:
  description: standard public schema
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
  table films:
    columns:
    - title:
        not_null: true
        type: character varying
    - director:
        type: character varying
    - year:
        type: character varying
    owner: myuser
    primary_key:
      films_pkey:
        columns:
        - title

Note: If you are running the local postgres setup described above and running from within a docker container on your mac, you can use the following: PGPASSWORD=docker dbtoyaml -H docker.for.mac.host.internal -U postgres mydb

Loading data into tables

Data can be loaded into new or existing tables.

Examples

The following example loads data into an existing table from a provided csv.

knockoff:
  dag:
    - name: films # arbitrary name of node in dag
      type: table # table | prototype | component | part
      table: films # defaults to the name of the node if not provided 
      source:
        strategy: io
        reader: pandas.read_csv
        kwargs:
          filepath_or_buffer: example/films.csv # local or s3:// path
          sep: "|"
      sink:
        database: mydb
        kwargs:
          if_exists: append # defaults to fail
          index: false # data is loaded into a pandas DataFrame; this option leaves the DataFrame index out of the table
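
The reader value is a dotted path to a callable, and kwargs holds its keyword arguments. One way such a path could be resolved is sketched below; resolve_callable and read_source are hypothetical helpers, and knockoff's own lookup may differ. The demo uses json.loads from the standard library so that it runs without pandas installed.

```python
import importlib
from typing import Any, Callable


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import "package.module.attr" and return the attribute."""
    module_name, _, attribute = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def read_source(source: dict) -> Any:
    """Call the configured reader with the configured kwargs."""
    reader = resolve_callable(source["reader"])
    return reader(**source.get("kwargs", {}))


# Same shape as the "source" section above, with a stdlib reader.
source = {
    "strategy": "io",
    "reader": "json.loads",
    "kwargs": {"s": '{"title": "t1", "director": "d1"}'},
}
print(read_source(source))
```

With pandas installed, the same helper would resolve "pandas.read_csv" and pass filepath_or_buffer and sep through unchanged.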

The following example loads data into a new table from data defined in the yaml.

knockoff:
  dag:
    - name: films2 # Note: "table" key not specified, so defaults to "films2"
      type: table
      source:
        strategy: io
        reader: inline
        kwargs:
          sep: ","
          data: |
            title,director,year
            t5,d1,2020
            t6,d2,2020
            t7,d1,2020
      sink:
        database: mydb
        user: myuser
        password_env: MYUSER_PASSWORD
        kwargs:
          index: false
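
To show how the inline block maps to rows, here is a minimal sketch that parses the same data with the standard csv module. The function read_inline is hypothetical; how knockoff's inline reader is implemented is not described here.

```python
import csv
import io
from typing import Dict, List


def read_inline(data: str, sep: str = ",") -> List[Dict[str, str]]:
    """Parse delimited text whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(data), delimiter=sep))


# The text of the "data" key from the yaml above.
data = "title,director,year\nt5,d1,2020\nt6,d2,2020\nt7,d1,2020\n"

rows = read_inline(data, sep=",")
for row in rows:
    print(row["title"], row["director"], row["year"])
```

The csv module returns every value as a string, so year stays "2020" in this sketch.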

Generating fake retail data

Knockoff uses faker to help generate fake retail data that can be used for testing. Hierarchical relationships with various dependencies can also be modelled with knockoff. This example generates the following tables (in addition to the above examples).

postgres@localhost:mydb> select * from location;
+-------------------------------+---------------+-----------+
| address                       | location_id   | channel   |
|-------------------------------+---------------+-----------|
| 07528 Fischer Track Suite 779 | 1             | nfs       |
| Melissaview, MD 90363         |               |           |
| 1535 Kelly Canyon             | 2             | nso       |
| Rhodesborough, CA 43893       |               |           |
| 216 Kayla Lake Apt. 126       | 3             | nso       |
| South Matthewmouth, OH 36332  |               |           |
| 561 Jones Burg Suite 382      | 4             | nso       |
| Hugheschester, DE 21908       |               |           |
| 042 Robinson Fort Suite 945   | 5             | nfs       |
| Pattersonshire, NC 96317      |               |           |
| 2332 Watkins Road             | 0             | digital   |
| Davidfort, MS 71411           |               |           |
+-------------------------------+---------------+-----------+

postgres@localhost:mydb> select * from product;
+------------+----------+-----------------------+---------------+------------+
| division   | gender   | category              | color         | sku        |
|------------+----------+-----------------------+---------------+------------|
| apparel    | men      | shorts                | PaleGoldenRod | 6357812379 |
| apparel    | women    | pants & tights        | NavajoWhite   | 8332320303 |
| apparel    | men      | tops & t-shirts       | Lavender      | 9243289077 |
| shoes      | men      | lifestyle             | PaleGoldenRod | 7270972977 |
| apparel    | women    | pants & tights        | PaleGoldenRod | 4443641793 |
| apparel    | women    | hoodies & sweatshirts | NavajoWhite   | 6130018459 |
| shoes      | women    | jordan                | Lavender      | 3791231041 |
| apparel    | men      | pants & tights        | PaleGoldenRod | 3899370297 |
| apparel    | men      | shorts                | Lavender      | 7557742055 |
| apparel    | men      | pants & tights        | SkyBlue       | 9785957221 |
| apparel    | women    | shorts                | SkyBlue       | 9979359561 |
| apparel    | women    | tops & t-shirts       | Lavender      | 7006056836 |
| shoes      | women    | jordan                | Lavender      | 4853474331 |
| shoes      | women    | jordan                | NavajoWhite   | 6589395336 |
| apparel    | men      | pants & tights        | Beige         | 7168664719 |
| apparel    | men      | hoodies & sweatshirts | Beige         | 7525844204 |
| apparel    | men      | shorts                | SkyBlue       | 9735336861 |
| shoes      | men      | skateboarding         | SkyBlue       | 6385212885 |
| apparel    | men      | tops & t-shirts       | Beige         | 9735107927 |
| apparel    | women    | pants & tights        | SkyBlue       | 2633853831 |
| apparel    | women    | jackets & vests       | NavajoWhite   | 2758275877 |
| apparel    | men      | shorts                | Lavender      | 1330756304 |
| apparel    | women    | tops & t-shirts       | NavajoWhite   | 9334676293 |
| shoes      | men      | skateboarding         | Lavender      | 6735393792 |
| apparel    | men      | jackets & vests       | Lavender      | 2907811814 |
+------------+----------+-----------------------+---------------+------------+

postgres@localhost:mydb> select * from transactions limit 10;
+---------------+------------+-----------+------------+------------+------------+
| location_id   | sku        | line_id   | order_id   | quantity   | date       |
|---------------+------------+-----------+------------+------------+------------|
| 5             | 7557742055 | 2         | 2957859949 | 0          | 2018-05-06 |
| 1             | 1330756304 | 1         | 3920316859 | 0          | 2018-07-19 |
| 4             | 9243289077 | 3         | 1875617688 | 0          | 2019-10-14 |
| 3             | 9334676293 | 2         | 7317451987 | 0          | 2018-06-09 |
| 0             | 9979359561 | 3         | 1236640244 | 2          | 2019-07-17 |
| 0             | 9735107927 | 1         | 9030486883 | 3          | 2018-04-27 |
| 1             | 9735336861 | 2         | 6196902209 | 2          | 2020-01-21 |
| 3             | 7006056836 | 4         | 1432537227 | 0          | 2019-04-22 |
| 2             | 2907811814 | 1         | 4039132536 | 0          | 2019-09-23 |
| 0             | 4443641793 | 5         | 8648324533 | 2          | 2018-09-06 |
+---------------+------------+-----------+------------+------------+------------+
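
The dependencies between these tables can be illustrated with a small standard-library sketch. It is not how knockoff or faker generate the data: all function names are hypothetical, only a subset of the columns is produced, and the category lists are taken from the sample rows above. The point is that category depends on division, and that each transaction draws its sku and location_id from rows that already exist.

```python
import random

# Categories observed per division in the sample product table.
CATEGORIES = {
    "apparel": ["shorts", "pants & tights", "tops & t-shirts",
                "hoodies & sweatshirts", "jackets & vests"],
    "shoes": ["lifestyle", "jordan", "skateboarding"],
}
CHANNELS = ["digital", "nfs", "nso"]


def make_products(rng: random.Random, count: int) -> list:
    products = []
    for _ in range(count):
        division = rng.choice(sorted(CATEGORIES))
        products.append({
            "division": division,
            "gender": rng.choice(["men", "women"]),
            # The category is chosen from the parent division's list.
            "category": rng.choice(CATEGORIES[division]),
            "sku": str(rng.randrange(10**9, 10**10)),
        })
    return products


def make_locations(rng: random.Random, count: int) -> list:
    return [{"location_id": i, "channel": rng.choice(CHANNELS)}
            for i in range(count)]


def make_transactions(rng, products, locations, count) -> list:
    # Keys are sampled from the parent tables so every reference resolves.
    return [{
        "location_id": rng.choice(locations)["location_id"],
        "sku": rng.choice(products)["sku"],
        "order_id": str(rng.randrange(10**9, 10**10)),
        "quantity": rng.randrange(0, 4),
    } for _ in range(count)]


rng = random.Random(0)  # fixed seed keeps the sketch reproducible
products = make_products(rng, 25)
locations = make_locations(rng, 6)
transactions = make_transactions(rng, products, locations, 10)
print(transactions[0])
```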

Run the example:

  1. Run postgres: docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres:11.9
    • Terminate existing container with docker kill pg-docker
    • Note: The example assumes you're running with POSTGRES_PASSWORD=docker and on port 5432
  2. Checkout the repo or download the examples folder
  3. Pull knockoff docker image: docker pull knockoff-factory
  4. Run knockoff against the example configuration:
docker run --rm -v $PWD/examples:/examples \
-e KNOCKOFF_DB_HOST='docker.for.mac.host.internal' \
-e KNOCKOFF_DB_USER='postgres' \
-e KNOCKOFF_DB_PASSWORD='docker' \
-e KNOCKOFF_DB_NAME='knockoff' \
-e KNOCKOFF_CONFIG=/examples/knockoff.yaml knockoff-factory:latest knockoff legacy

Note: if you are loading data from an s3 bucket you have access to, you can enable your docker container access to those credentials by adding -v ~/.aws:/root/.aws to the docker run command.
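
For reference, the KNOCKOFF_DB_* variables passed to the container carry everything needed for a postgres connection URI. The sketch below shows one way to assemble such a URI; build_db_uri is a hypothetical helper, the port is taken from the note above (5432), and how knockoff itself builds its default engine from these variables is not shown here.

```python
import os
from typing import Mapping
from urllib.parse import quote


def build_db_uri(env: Mapping[str, str] = os.environ, port: int = 5432) -> str:
    """Assemble a postgres URI from the KNOCKOFF_DB_* variables."""
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    user = quote(env["KNOCKOFF_DB_USER"], safe="")
    password = quote(env["KNOCKOFF_DB_PASSWORD"], safe="")
    host = env["KNOCKOFF_DB_HOST"]
    name = env["KNOCKOFF_DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Values from the docker run command above.
example_env = {
    "KNOCKOFF_DB_HOST": "docker.for.mac.host.internal",
    "KNOCKOFF_DB_USER": "postgres",
    "KNOCKOFF_DB_PASSWORD": "docker",
    "KNOCKOFF_DB_NAME": "knockoff",
}
uri = build_db_uri(example_env)
print(uri)
```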

Future work

  • Add documentation and examples for SDK
  • Add yaml based configuration for SDK
  • Make extensible generic output for KnockoffDB.insert (csv, parquet, etc)
  • Enable append option for KnockoffDB.insert
  • Autodiscover and populate all tables by using reflection and building dependency graph with foreign key relationships
  • Documentation / use-cases
  • Parallelize execution of dag. (e.g. https://ipython.org/ipython-doc/stable/parallel/dag_dependencies.html)
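
As a rough illustration of the dependency-graph and parallel-execution items, the sketch below orders tables with graphlib from the standard library (Python 3.9+). It is not part of knockoff: the helper names are hypothetical, and the foreign keys are an assumption based on the location_id and sku columns of the transactions table in the retail example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Assumed foreign keys: each table maps to the tables it references.
foreign_keys: Dict[str, Set[str]] = {
    "location": set(),
    "product": set(),
    "transactions": {"location", "product"},
}


def load_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return tables in an order where referenced tables come first."""
    return list(TopologicalSorter(dependencies).static_order())


def parallel_batches(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tables into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()     # nodes whose dependencies are done
        batches.append(sorted(ready))  # each batch could run in parallel
        sorter.done(*ready)
    return batches


print(load_order(foreign_keys))
print(parallel_batches(foreign_keys))
```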
