Skip to main content

YAML based data loader

Project description

author:

Lele Gaifax

contact:

lele@metapensiero.it

license:

GNU General Public License version 3 or later

Load new instances in the database, or update/delete existing ones, given a data structure represented by a YAML stream, as the following:

- entity: gam.model.Fascicolo
  key: descrizione
  # no data, just "declare" the entity

- entity: gam.model.TipologiaFornitore
  key: tipologiafornitore
  data:
    - &tf_onesto
      tipologiafornitore: Test fornitori onesti

- entity: gam.model.ClienteFornitore
  key: descrizione
  data:
    - descrizione: Test altro fornitore onesto
      tipologiafornitore: *tf_onesto
      partitaiva: 01234567890
    - &cf_lele
      codicefiscale: GFSMNL68C18H612V
      descrizione: Dipendente A

- entity: gam.model.Dipendente
  key: codicefiscale
  data:
    - &lele
      codicefiscale: GFSMNL68C18H612V
      nome: Emanuele
      cognome: Gaifas
      clientefornitore: *cf_lele
      foto: !File {path: ../img/lele.jpg}

- entity: gam.model.Attrezzature
  key: descrizione
  data:
    - &macchina
      descrizione: Fiat 500

- entity: gam.model.Prestiti
  key:
    - dipendente
    - attrezzatura
  data:
    - dipendente: *lele
    - attrezzatura: *macchina

The key entry may be either a single attribute name or a list of them, not necessarily corresponding to the primary key of the entity. To handle the simplest case of structured values (for example, when a field is backed by a PostgreSQL HSTORE), the key attribute name may be in the form name->slot:

- entity: model.Product
  key: description->en
  data:
    - &cage
      description:
        en: "Roadrunner cage"
        it: "Gabbia per struzzi"

When all (or most of) the instances share the same fields, a more compact representation may be used:

- entity: model.Values
  key:
    - product
    - attribute
  fields: [ product, attribute, value ]
  data:
    - [ *cage, *size, 110cm x 110cm x 120cm ]
    - [ *cage, *weight, 230kg ]

where fields contains a list of field names and data is a sequence of lists, each containing the values of a single instance. The two sintaxes may be mixed though, so you can say:

- entity: model.Person
  key: [ lastname, firstname ]
  fields: [ lastname, firstname, password ]
  data:
    - [ gaifax, lele, "123456" ]
    - [ foobar, john, "abcdef" ]
    - lastname: rossi
      firstname: paolo
      birthdate: 1950-02-03

Test fixture facility

With the option --save-new-instances newly created instances will be written (actually added) to the given file in YAML format, so that at some point they can be deleted using the option --delete on that file. Ideally

dbloady -u postgresql://localhost/test -s new.yaml fixture.yaml
dbloady -u postgresql://localhost/test -D new.yaml

should remove fixture’s traces from the database, if it contains only new data.

Pre and post load scripts

The option --preload may be used to execute an arbitrary Python script before any load happens. This is useful either to tweak the YAML context or to alter the set of file names specified on the command line (received as the fnames global variable).

The following script registers a custom costructor that recognizes the tag !time or a value like T12:34 as a datetime.time value:

import datetime, re
import yaml

def time_constructor(loader, node):
    value = loader.construct_scalar(node)
    if value.startswith('T'):
        value = value[1:]
    parts = map(int, value.split(':'))
    return datetime.time(*parts)

yaml.add_constructor('!time', time_constructor)
yaml.add_implicit_resolver('!time', re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$'), ['T'])

As another example, the following script handles input files with a .gpg suffix decrypting them on the fly to a temporary file that will be deleted when the program exits:

import atexit, os, subprocess, tempfile

def decipher(fname):
    print("Input file %s is encrypted, please enter passphrase" % fname)
    with tempfile.NamedTemporaryFile(suffix='.yaml') as f:
        tmpfname = f.name
    subprocess.run(['gpg', '-q', '-o', tmpfname, fname], check=True)
    atexit.register(lambda n=tmpfname: os.unlink(n))
    return tmpfname

fnames = [decipher(fname) if fname.endswith('.gpg') else fname for fname in fnames]

Then you have:

dbloady -u postgresql://localhost/test -p preload.py data.yaml.gpg
Input file data.yaml.gpg is encrypted, please enter passphrase
/tmp/tmpfhjrdqgf.yaml: ......
Committing changes

The option --postload may be used to perform additional steps after all YAML files have been loaded but before the DB transaction is committed.

The pre/post load scripts are executed with a context containing the following variables:

session

the SQLAlchemy session

dry_run

the value of the --dry-run option

fnames

the list of file names specified on the command line

Changes

1.4 (2016-02-10)

  • Data files and preload/postload scripts may be specified also as package relative resources

1.3 (2016-01-14)

  • New –preload and –postload options to execute arbitrary Python scripts before or after the load

1.2 (2016-01-09)

  • Fix source distribution

1.1 (2016-01-09)

  • Fix data refs when loading from compact representation

1.0 (2016-01-07)

  • Allow more compact representation when all instances share the same fields

  • Extract dbloady from metapensiero.sphinx.patchdb 1.4.2 into a standalone package

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

metapensiero.sqlalchemy.dbloady-1.4.tar.gz (15.2 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page