Skip to main content

Framework for declarative database creation and management.

Project description

DBD - framework for declarative database definition

DBD framework allows you to define your database schema and content declaratively. Database is represented by a hierarchy of directories and files stored in DBD model directory.

TLDR: Whetting Your Appetite

  1. dbd init test
  2. cd test
  3. Check out the model directory.
  4. dbd validate .
  5. dbd run .
  6. Connect to the newly created states.db database and review area, population, and state tables that have been created from the files in the model directory.

Now you can delete the example files from the model directory, copy your Excel, JSON, or CSV files there instead. Then execute dbd run . again. Your files should be loaded in the states.db database.

You can create a YAML configuration files for your data (Excel, JSON, or CSV) files to specify individual column's data types, indexes or constraints (e.g. primary key, foreign key, or check). See below for more details.

You can also add an SQL file that performs insert-from-select SQL statement to create database tables with transformed data.

Install DBD

DBD requires Python 3.7.1 or higher.

PyPI

pip3 install dbd

OR

git clone https://github.com/zsvoboda/dbd.git
cd dbd
pip3 install .

Poetry

git clone https://github.com/zsvoboda/dbd.git
cd dbd
poetry install

Generate a new DBD project

You can generate DBD project initial layout by executing init command:

dbd init <new-project-name>

The init command generates a new DBD project directory with the following content:

  • model directory that contains the content files. dbd supports files with .sql, .ddl, .csv, .json, .xlsx and other extensions.
  • dbd.profile configuration file that specifies database connections
  • dbd.project project configuration file

DBD profile configuration file

DBD stores database connections in the dbd.profile configuration file. DBD searches for dbd.profile file in current or in your home directory. You can always specify a custom profile file location using the --profile option of the dbd command.

The profile file is YAML file with the following structure:

databases:
  states:
    db.url: <sql-alchemy-database-url>

Read more about SQL Alchemy database URLs here.

The profile file can contain Jinja2 macros that substitute your environment variables. For example, you can reference database password stored in a SQLITE_PASSWORD environment variable via {{ SQLITE_PASSWORD }} in your DBD profile.

DBD project configuration file

DBD stores project configuration in a project configuration file that is usually stored in your DBD project directory. DBD searches for dbd.project file in your project's directory root. You can also use the --project option of the dbd command to specify a custom project configuration file.

The project configuration file also uses YAML format and references the DBD model directory with the .sql, .csv and other supported files. It also references the database configuration from the profile config file. For example:

model: model
database: states

Similarly like the profile file, you can use the environment variables substitution in the project config file too (e.g. {{ SQLITE_DB_NAME }}).

Model directory

Model directory contains directories and DBD files. Each subdirectory of the model directory represents a database schema. For example, this model directory structure

dbd-project-directory
+- schema1
 +-- us_states.csv
+- schema2
 +-- us_counties.csv

creates two database schemas: schema1 and schema2 and creates two database tables: us_states in schema1 and us_counties in schema2. Both tables are populated with the data from the CSV files.

DBD supports following files located in the model directory:

  • DATA files: .csv, .json, .xls, .xlsx, .parquet files are loaded to the database as tables
  • SQL files: with SQL SELECT statements are executed using insert-from-select SQL construct. The INSERT command is generated (the SQL file only contains the SQL SELECT statement)
  • DDL files: contain a sequence of SQL statements separated by semicolon. The DDL files can be named prolog.ddl and epilog.ddl. The prolog.ddl is executed before all other files in a specific schema. The epilog.ddl is executed last. The prolog.ddl and epilog.ddl in the top-level model directory are executed as the very first and tne very last files in the model.
  • YAML files: specify additional configuration to the DATA and SQL files.

YAML model files

YAML file specify additional configuration for a corresponding DATA or SQL file with the same base file name. Here is an area.csv YAML configuration example:

table:
  columns:
    state_name:
      nullable: false
      index: true
      primary_key: true
      foreign_keys:
        - state.state_name
      type: VARCHAR(50)
    area_sq_mi:
      nullable: false
      index: true
      type: INTEGER
process:
  materialization: table
  mode: drop

Table section

YAML file's columns are mapped to the area.csv data file columns by column names. The following column parameters are supported:

  • type: column's SQL type.
  • primary_key: is the column part of table's primary key (true|false)?
  • foreign_keys: all other database table columns that are referenced from a column in table.column format
  • nullable: does column allow null values (true|false)?
  • index: is column indexed (true|false)?
  • unique: does column store unique values (true|false)?

Process section

The process section specifies the following processing options:

  • materialization: specifies whether DBD creates a physical table or a view when processing a SQL file.
  • mode: specifies how DBD works with a table. You can specify values drop, truncate, or keep. The mode option is ignored for views.

Table section

YAML file's columns are mapped to the area.csv data file columns by column names. The following column parameters are supported:

  • type: column's SQL type.
  • primary_key: is the column part of table's primary key (true|false)?
  • foreign_keys: all other database table columns that are referenced from a column in table.column format
  • nullable: does column allow null values (true|false)?
  • index: is column indexed (true|false)?
  • unique: does column store unique values (true|false)?

License

DBD code is open-sourced under BSD 3-clause license.

Resources and References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbd-0.5.6.tar.gz (101.8 kB view hashes)

Uploaded Source

Built Distribution

dbd-0.5.6-py3-none-any.whl (110.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page