YAML based data loader
Project description
- author:
Lele Gaifax
- contact:
- license:
GNU General Public License version 3 or later
Data loader
Load new instances in the database, or update/delete existing ones, given a data structure represented by a YAML stream, as the following:
- entity: gam.model.Fascicolo
key: descrizione
# no data, just "declare" the entity
- entity: gam.model.TipologiaFornitore
key: tipologiafornitore
rows:
- &tf_onesto
tipologiafornitore: Test fornitori onesti
- entity: gam.model.ClienteFornitore
key: descrizione
rows:
- descrizione: Test altro fornitore onesto
tipologiafornitore: *tf_onesto
partitaiva: 01234567890
- &cf_lele
codicefiscale: GFSMNL68C18H612V
descrizione: Dipendente A
- entity: gam.model.Dipendente
key: codicefiscale
rows:
- &lele
codicefiscale: GFSMNL68C18H612V
nome: Emanuele
cognome: Gaifas
clientefornitore: *cf_lele
foto: !File {path: ../img/lele.jpg}
- entity: gam.model.Attrezzature
key: descrizione
rows:
- &macchina
descrizione: Fiat 500
foto: !File
compressor: lzma
content: !!binary |
/Td6WFoAAATm1rRGAgAhA...
- entity: gam.model.Prestiti
key:
- dipendente
- attrezzatura
rows:
- dipendente: *lele
attrezzatura: *macchina
As you can see, the YAML document is a sequence of entries, each one defining the content of a set of instances of a particular entity.
The entity must be the fully qualified dotted name of the SQLAlchemy mapped class.
The key entry may be either a single attribute name or a list of them, not necessarily corresponding to the primary key of the entity, provided that it uniquely identifies a single instance. To handle the simplest case of structured values (for example, when a field is backed by a PostgreSQL HSTORE), the key attribute name may be in the form name->slot:
- entity: model.Product
key: description->en
rows:
- &cage
description:
en: "Roadrunner cage"
it: "Gabbia per struzzi"
The rows (or data) may be either a single item or a list of them, each containing the data of a single instance, usually a dictionary.
When all (or most of) the instances share the same fields, a more compact representation may be used:
- entity: model.Values
key:
- product
- attribute
fields: [ product, attribute, value ]
rows:
- [ *cage, *size, 110cm x 110cm x 120cm ]
- [ *cage, *weight, 230kg ]
where fields contains a list of field names and rows is a sequence of lists, each containing the values of a single instance. The two sintaxes may be mixed though, so you can say:
- entity: model.Person
key: [ lastname, firstname ]
fields: [ lastname, firstname, password ]
rows:
- [ gaifax, lele, "123456" ]
- [ foobar, john, "abcdef" ]
- lastname: rossi
firstname: paolo
birthdate: 1950-02-03
If you have a tab-separated-values file, you may say:
- entity: model.Cities
key:
- name
- country
fields: [ name, country ]
rows: !TSV {path: ../data/cities.txt, encoding: utf-8}
and if the field names are included in the the first row of the file, simply omit the fields slot:
- entity: model.Countries
key:
- code
rows: !TSV {path: ../data/countries.txt, encoding: utf-8}
The dbloady tool iterates over all the entities, and for each instance it determines if it already exists querying the database with the given key: if it’s there, it updates it otherwise it creates a new one and initializes it with its data.
Test fixture facility
With the option --save-new-instances newly created instances will be written (actually added) to the given file in YAML format, so that at some point they can be deleted using the option --delete on that file. Ideally
dbloady -u postgresql://localhost/test -s new.yaml fixture.yaml dbloady -u postgresql://localhost/test -D new.yaml
should remove fixture’s traces from the database, if it contains only new data.
Pre and post load scripts
The option --preload may be used to execute an arbitrary Python script before any load happens. This is useful either to tweak the YAML context or to alter the set of file names specified on the command line (received as the fnames global variable).
The following script registers a custom costructor that recognizes the tag !time or a value like T12:34 as a datetime.time value:
import datetime, re
from ruamel import yaml
def time_constructor(loader, node):
value = loader.construct_scalar(node)
if value.startswith('T'):
value = value[1:]
parts = map(int, value.split(':'))
return datetime.time(*parts)
yaml.add_constructor('!time', time_constructor)
yaml.add_implicit_resolver('!time', re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$'), ['T'])
As another example, the following script handles input files with a .gpg suffix decrypting them on the fly to a temporary file that will be deleted when the program exits:
import atexit, os, subprocess, tempfile
def decipher(fname):
print("Input file %s is encrypted, please enter passphrase" % fname)
with tempfile.NamedTemporaryFile(suffix='.yaml') as f:
tmpfname = f.name
subprocess.run(['gpg', '-q', '-o', tmpfname, fname], check=True)
atexit.register(lambda n=tmpfname: os.unlink(n))
return tmpfname
fnames = [decipher(fname) if fname.endswith('.gpg') else fname for fname in fnames]
Then you have:
dbloady -u postgresql://localhost/test -p preload.py data.yaml.gpg Input file data.yaml.gpg is encrypted, please enter passphrase /tmp/tmpfhjrdqgf.yaml: ...... Committing changes
The option --postload may be used to perform additional steps after all YAML files have been loaded but before the DB transaction is committed.
The pre/post load scripts are executed with a context containing the following variables:
- session
the SQLAlchemy session
- dry_run
the value of the --dry-run option
- fnames
the list of file names specified on the command line
Generic foreign keys
Version 1.6 introduced rudimentary and experimental support for the generic foreign keys trick. It assumes that they are implemented with a hybrid property that exposes a custom comparator. See tests/generic_fk/model.py for an example.
With a proper configuration, the following works:
- entity: model.Customer
key: name
data:
- &customer
name: Best customer
- entity: model.Supplier
key: company_name
data:
- &supplier
company_name: ACME
- entity: model.Address
key:
- related_object
- street
data:
- related_object: *customer
street: 123 anywhere street
- related_object: *supplier
street: 321 long winding road
Direct assignment of primary keys
When the attribute does not correspond to a relationship property, assignment of an instance reference will set the attribute to the instance’s primary key:
- entity: model.Person
key:
- lastname
- firstname
fields:
- lastname
- firstname
data:
- &johndoe [ Doe, John ]
- entity: model.CannedFilter
key: description
data:
- &onlyjohndoe
description: "Only John Doe"
- entity: model.Condition
key:
- filter
- fieldname
data:
- filter: *onlyjohndoe
fieldname: "persons.id"
fieldvalue: *johndoe
Raw SQL values
Sometime a value requires executing an arbitrary query on the database, maybe because it is computed by a trigger or more generally because it cannot be determined by the YAML content:
- entity: model.Number
key:
id
data:
- id: 1
absolute: !SQL {query: "SELECT abs(:num)", params: {num: -1}}
- id: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
absolute: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
- id: 3
absolute: !SQL {query: "SELECT count(*) FROM numbers"}
The specified query must return a single value, as it is executed with session.scalar().
Data dumper
With the complementary tool, dbdumpy, you can obtain a YAML representation out of a database in the same format used by dbloady. It’s rather simple and in particular it does not handle reference cycles.
The tool is driven by a specs file, a YAML document composed by two parts: the first defines the pivots instances (that is, the entry points), the second describes how each entity must be serialized and in which order.
Consider the following document:
- entity: model.Result
---
- entity: model.Person
key:
- lastname
- firstname
- entity: model.Exam
fields: description
key: description
- entity: model.Result
key:
- person
- exam
other:
- vote
It tells dbdumpy to consider all instances of model.Result as the pivots, then defines how each entity must be serialized, simply by listing the key attribute(s) and any further other field. Alternatively, you can specify a list of fields names, to obtain the more compact form described above.
Executing
dbdumpy -u sqlite:////foo/bar.sqlite spec.yaml
will emit the following on stdout:
- entity: model.Person
key:
- lastname
- firstname
rows:
- &id002
firstname: John
lastname: Doe
- &id003
firstname: Bar
lastname: Foo
- entity: model.Exam
fields: description
key: description
rows:
- &id001
- Drive license
- entity: model.Result
key:
- person
- exam
rows:
- exam: *id001
person: *id002
vote: 10
- exam: *id001
person: *id003
vote: 5
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file metapensiero_sqlalchemy_dbloady-3.1.tar.gz.
File metadata
- Download URL: metapensiero_sqlalchemy_dbloady-3.1.tar.gz
- Upload date:
- Size: 33.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9d8c702e46b75122c229a349c784db4b69812e44487c8650f2d8a48ff0fa60f7
|
|
| MD5 |
e1130aeca85e07a896611803595b8f2d
|
|
| BLAKE2b-256 |
59af98dfed25854bcc62759c5b2616096d7f25097ddb51f40dde607723394593
|
File details
Details for the file metapensiero_sqlalchemy_dbloady-3.1-py3-none-any.whl.
File metadata
- Download URL: metapensiero_sqlalchemy_dbloady-3.1-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
11a2060880be0c1540427c2bf398910e7a1c1244c1956fa9037a8e3f679095e9
|
|
| MD5 |
e5ac54e8f6ccc9f783d2e84c34fe092e
|
|
| BLAKE2b-256 |
8bf8c3e306152f1b3ee8ec4c6032840647ea675e71ff5b4dfa1f10137f59c0c1
|