A tool for linking two DataJoint tables located on different database servers
Project description
:link: datajoint-link
A tool for convenient and integrity-preserving data sharing between database servers.
:floppy_disk: Installation
Only users interacting with the destination of the data need to install the datajoint-link package:
pip install datajoint-link
:wrench: Setup
Source
Datajoint-link requires access to the database server from which data will be pulled. It is recommended to create a new user for this purpose:
CREATE USER 'djlink'@'%' IDENTIFIED BY 'secret-password';
The user needs to have certain privileges on the table from which data will be pulled:
GRANT SELECT, REFERENCES ON `source\_schema`.`source\_table` TO 'djlink'@'%';
Each table from which data will be pulled also needs an additional helper table:
GRANT ALL PRIVILEGES ON `helper\_schema`.`helper\_table` TO 'djlink'@'%';
In order to preserve data integrity across the link regular users must not have any privileges on this helper table.
Destination
Datajoint-link needs to be configured with the username and password of the user created in the previous section. This is accomplished via environment variables:
LINK_USER=djlink
LINK_PASS=secret-password
:computer: Usage
The destination table is created by passing information about where to find the source table to the link
decorator:
from dj_link import link
@link(
"databaseserver.com",
"source_schema",
"helper_schema",
"helper_table",
"destination_schema"
)
class Table:
"""Some table present in the source schema on the source database server."""
Note that the name of the declared class must match the name of the table from which the data will be pulled.
The class returned by the decorator behaves like a regular table with some added functionality. For one it allows the browsing of data present in the source table:
Table().source
All the data can be pulled like so:
Table().source.pull()
That said usually we only want to pull some rows that match a certain criteria:
(Table().source & "foo = 1").pull()
The deletion of already pulled rows works the same as for any other table:
(Table() & "foo = 1").delete()
The deletion of certain rows from the destination can also be requested by flagging them in the corresponding helper table:
row = (Helper() & "foo = 1").fetch1()
(Helper() & row).delete()
row["is_flagged"] = "TRUE"
Helper().insert1(row)
The flagged
attribute makes the deletion of flagged rows from the destination table convenient:
(Table() & Table().source.flagged).delete()
Deleting a flagged row automatically updates its corresponding row in the helper table:
assert (Helper() & "foo = 1").fetch1("is_deprecated") == "TRUE" # No error!
Now it is save to delete the row from the source table as well!
:package: External Storage
Data stored in a source table that refers to one (or more) external stores can be stored in different store(s) after pulling:
@link(
...,
stores={"source_store": "destination_store"}
)
class Table:
...
Note that all stores mentioned in the dictionary need to be configured via dj.config
.
:white_check_mark: Tests
Clone this repository and run the following command from within the cloned repository to run all tests:
docker compose run functional_tests tests
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for datajoint_link-0.5-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9d3c6b2adb741c88ba926f501cb11b30dc67743e03a34030634ffd18c9559c08 |
|
MD5 | 449c546deb21568a0c2a9c9b767bda4d |
|
BLAKE2b-256 | db38bf98a7d0c76661578b92161693184be4f57594af5b691184a53d0d13349e |