YAML based data loader
Project description
- author:
Lele Gaifax
- contact:
- license:
GNU General Public License version 3 or later
Load new instances in the database, or update/delete existing ones, given a data structure represented by a YAML stream, as the following:
- entity: gam.model.Fascicolo key: descrizione # no data, just "declare" the entity - entity: gam.model.TipologiaFornitore key: tipologiafornitore data: - &tf_onesto tipologiafornitore: Test fornitori onesti - entity: gam.model.ClienteFornitore key: descrizione data: - descrizione: Test altro fornitore onesto tipologiafornitore: *tf_onesto partitaiva: 01234567890 - &cf_lele codicefiscale: GFSMNL68C18H612V descrizione: Dipendente A - entity: gam.model.Dipendente key: codicefiscale data: - &lele codicefiscale: GFSMNL68C18H612V nome: Emanuele cognome: Gaifas clientefornitore: *cf_lele foto: !File {path: ../img/lele.jpg} - entity: gam.model.Attrezzature key: descrizione data: - &macchina descrizione: Fiat 500 - entity: gam.model.Prestiti key: - dipendente - attrezzatura data: - dipendente: *lele - attrezzatura: *macchina
The key entry may be either a single attribute name or a list of them, not necessarily corresponding to the primary key of the entity. To handle the simplest case of structured values (for example, when a field is backed by a PostgreSQL HSTORE), the key attribute name may be in the form name->slot:
- entity: model.Product key: description->en data: - &cage description: en: "Roadrunner cage" it: "Gabbia per struzzi"
When all (or most of) the instances share the same fields, a more compact representation may be used:
- entity: model.Values key: - product - attribute fields: [ product, attribute, value ] data: - [ *cage, *size, 110cm x 110cm x 120cm ] - [ *cage, *weight, 230kg ]
where fields contains a list of field names and data is a sequence of lists, each containing the values of a single instance. The two sintaxes may be mixed though, so you can say:
- entity: model.Person key: [ lastname, firstname ] fields: [ lastname, firstname, password ] data: - [ gaifax, lele, "123456" ] - [ foobar, john, "abcdef" ] - lastname: rossi firstname: paolo birthdate: 1950-02-03
Test fixture facility
With the option --save-new-instances newly created instances will be written (actually added) to the given file in YAML format, so that at some point they can be deleted using the option --delete on that file. Ideally
dbloady -u postgresql://localhost/test -s new.yaml fixture.yaml dbloady -u postgresql://localhost/test -D new.yaml
should remove fixture’s traces from the database, if it contains only new data.
Pre and post load scripts
The option --preload may be used to execute an arbitrary Python script before any load happens. This is useful either to tweak the YAML context or to alter the set of file names specified on the command line (received as the fnames global variable).
The following script registers a custom costructor that recognizes the tag !time or a value like T12:34 as a datetime.time value:
import datetime, re import yaml def time_constructor(loader, node): value = loader.construct_scalar(node) if value.startswith('T'): value = value[1:] parts = map(int, value.split(':')) return datetime.time(*parts) yaml.add_constructor('!time', time_constructor) yaml.add_implicit_resolver('!time', re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$'), ['T'])
As another example, the following script handles input files with a .gpg suffix decrypting them on the fly to a temporary file that will be deleted when the program exits:
import atexit, os, subprocess, tempfile def decipher(fname): print("Input file %s is encrypted, please enter passphrase" % fname) with tempfile.NamedTemporaryFile(suffix='.yaml') as f: tmpfname = f.name subprocess.run(['gpg', '-q', '-o', tmpfname, fname], check=True) atexit.register(lambda n=tmpfname: os.unlink(n)) return tmpfname fnames = [decipher(fname) if fname.endswith('.gpg') else fname for fname in fnames]
Then you have:
dbloady -u postgresql://localhost/test -p preload.py data.yaml.gpg Input file data.yaml.gpg is encrypted, please enter passphrase /tmp/tmpfhjrdqgf.yaml: ...... Committing changes
The option --postload may be used to perform additional steps after all YAML files have been loaded but before the DB transaction is committed.
The pre/post load scripts are executed with a context containing the following variables:
- session
the SQLAlchemy session
- dry_run
the value of the --dry-run option
- fnames
the list of file names specified on the command line
Changes
1.3 (2016-01-14)
New –preload and –postload options to execute arbitrary Python scripts before or after the load
1.2 (2016-01-09)
Fix source distribution
1.1 (2016-01-09)
Fix data refs when loading from compact representation
1.0 (2016-01-07)
Allow more compact representation when all instances share the same fields
Extract dbloady from metapensiero.sphinx.patchdb 1.4.2 into a standalone package
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for metapensiero.sqlalchemy.dbloady-1.3.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 218facb4bd22a9dcea101ec0de110302b4247f20740c93bb76012b97fa6026cf |
|
MD5 | 20806802fefe7f6334fb00eb5a531081 |
|
BLAKE2b-256 | b71f18290c4c54beb96faa2f935832a98d14594a6e00c64bae833d52cdf4c4f9 |