A tool for linking two DataJoint tables located on different database servers
Project description
DataJoint-Link
DataJoint-Link is a tool for linking two DataJoint tables located on different database servers together and exchanging entities between them. It preserves referential integrity across the linked tables and supports useful DataJoint features such as part tables and externally stored files.
How It Works
A link involves three tables. A table from which entities are pulled, called the source table, a table into which the pulled entities are inserted, called the local table, and a third table used to enforce referential integrity between the two. This third table is called the outbound table.
Source Table
The source table is the table from which entities will be pulled and it should be located on a different database server than the local table to justify the usage of this tool. For example, the source table could be located on a database server belonging to a lab that wants to share its data with a collaborator or the public.
Local Table
The local table stores entities that were pulled from the source table. It is created when the link is first established and has primary and non-primary attributes that are identical to the ones of the source table.
Outbound Table
The outbound table is located on the same database server as the source table. Its job is to enforce referential integrity and it does so by depending on the source table and containing all entities that were pulled into the local table. As a result these entities can not be deleted from the source table without also being deleted from the outbound table. This fact in combination with the procedure of deleting entities from the local table before deleting them from the outbound and source tables enforces referential integrity.
Pulling Entities
A pull consists of three consecutive steps. First the entities are fetched from the source table. Next their primary keys are inserted into the outbound table and, lastly, the complete entities are inserted into the local table. If the source table has part tables, their entities will also be fetched and inserted into the corresponding part tables of the local table.
Deleting Entities
A deletion request can be sent from the source to the local side by enabling a flag on the entities that should be deleted in the outbound table. The local side can view these requests and delete the corresponding entities. This will trigger the enablement of another flag in the outbound table letting the source side know the entities have been locally deleted. Now the entities can be safely deleted from the outbound and source tables.
The local side is always free to delete any entities from the local table that had their deletion not requested by the source side. Doing so will simply delete the locally deleted entities from the outbound table.
Install
pip3 install datajoint-link
Older versions of DataJoint-Link that were installed via pip
can be upgraded with
pip3 install --upgrade datajoint-link
Setup
A small one-time setup is necessary before two tables can be linked.
DataJoint-Link requires access to a user on the source database server to create and manage the outbound table and to fetch entities from the local table. This user will be called link-user here and its username and password must be set via the environment variables LINK_USER
and LINK_PASS
, respectively.
Creating a new user with the minimally required set of privileges on the source database to be the link-user is recommended to limit the access of the tool to the database. These privileges include SELECT
and REFERENCE
on the schema containing the source table to be able to create the outbound table and to fetch entities from the source table.
Furthermore the name of the schema containing the outbound table must be specified via the environment variable LINK_OUTBOUND
and the link-user must be granted full privileges on the outbound schema. It is recommended to create a new schema to function as the outbound schema.
Regular database users should have no privileges on the outbound schema to prevent them from accidentally deleting entities that are still present in the local table from the source table, thus breaking referential integrity.
Environment variables example:
LINK_USER=link_user
LINK_PASS=password
LINK_OUTBOUND=outbound_schema
Usage
The definition of the local table looks like this:
from dj_link import LazySchema, Link
local_schema = LazySchema('local_schema')
source_schema = LazySchema('source_schema', host='source_database')
@Link(local_schema, source_schema)
class Table:
"""The local table."""
LazySchema
is a modified version of DataJoint's Schema
class. It establishes the connection to the database when needed and, unlike Schema
, not when the instance is created. The host
keyword argument must be set to the host name of the database server that contains the source schema when creating the corresponding LazySchema
instance.
Two instances of the LazySchema
class must be passed when creating an instance of the Link
class. The first one must correspond to the schema which will contain the local table and the second one to the schema on the source database server containing the source table. Note that the name of the class that gets decorated by the Link
instance must be the same as the name of the source table.
A mapping of store names can be passed as a keyword argument to the constructor of the Link
class:
Link(local_schema, source_schema, stores={'local_store': 'source_store'})
This is necessary if the source table contains references to externally stored files. Note that the local side needs to be able to access the files stored in the source store. Currently it is not possible to pull entities without creating a copy of the external files.
Pulling
The contents of the source table can be viewed using the source
attribute:
Table().source()
All entities that are not present in the local table and whose deletion has not been requested can be pulled using the pull
method:
Table().pull()
DataJoint restrictions can be passed to the pull
method if only specific entities are to be pulled:
Table().pull(*restrictions)
Deleting
The primary keys of entities that should be deleted from the local table can be inserted into the DeletionRequested
part table of the outbound table. The local side can check which entities had their deletion requested by viewing their DeletionRequested
part table:
Table().DeletionRequested()
Deleting the aforementioned entities from the local table can be accomplished by appropriately restricting the local table and calling the delete
method:
(Table() & restriction).delete()
The source side can then see that entities have been deleted from the local table by checking if the corresponding primary keys are present in the DeletionApproved
part table of the outbound table.
Refreshing
The refresh
method can be used to update the DeletionRequested
part table of the local table with new entries from the outbound table:
Table().refresh()
Running Tests
Clone this repository and run the following command from within the cloned repository to run all tests:
docker-compose run pytest tests
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for datajoint_link-0.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3896f29b2979d0da69dea97bfa0190046f9b39de94ae012059d84be9e342eb41 |
|
MD5 | 54fd904ea5231ba15c81ed1596a7c3a1 |
|
BLAKE2b-256 | 6b218f65da8d69d812fb10b9fffa79f601b8f1b4a10156e141d84c2908fb9d42 |