Library to easily sync/diff/update 2 different data sources

DiffSync

DiffSync is a utility library that can be used to compare and synchronize different datasets.

For example, it can be used to compare a list of devices from two inventory systems and, if required, synchronize them in either direction.

A = DiffSyncSystemA()
B = DiffSyncSystemB()

A.load()
B.load()

# it will show the difference between both systems
diff_a_b = A.diff_from(B)
print(diff_a_b.str())

# it will update System A to align with the current status of system B
A.sync_from(B)

# it will update System B to align with the current status of system A
A.sync_to(B)

Getting Started

To be able to properly compare different datasets, DiffSync relies on a shared datamodel that both systems must use.

Define your model with DiffSyncModel

DiffSyncModel is based on Pydantic and uses Python type hints to define the format of each attribute. Each DiffSyncModel class supports the following class-level attributes:

  • _modelname (str): Defines the type of the model; it is used to store the data internally. (Mandatory)
  • _identifiers (List[str]): List of instance field names used as primary keys for this object. (Mandatory)
  • _shortname (List[str]): List of instance field names to use for a shorter name. (Optional)
  • _attributes (List[str]): List of additional instance field names for this object. (Optional)
  • _children (Dict): Dict of {<modelname>: field_name} indicating how child objects should be stored. (Optional)

Each DiffSyncModel instance must be uniquely identified by its unique ID, which is composed of all fields defined in _identifiers. DiffSyncModel does not support incremental IDs as primary keys.
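To illustrate what an ID composed of several identifier fields looks like, here is a minimal plain-Python sketch that does not use DiffSync itself. The Interface class, its fields, the unique_id() helper, and the "__" separator are all assumptions made for illustration; the ID format DiffSync actually produces may differ.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple


@dataclass
class Interface:
    """Hypothetical record identified by two fields instead of a counter."""

    # Names of the fields that together act as the primary key.
    identifiers: ClassVar[Tuple[str, ...]] = ("device_name", "name")

    device_name: str
    name: str
    description: str = ""

    def unique_id(self) -> str:
        # Join the identifier values in declaration order.
        # The "__" separator is an assumption made for this sketch.
        return "__".join(str(getattr(self, key)) for key in self.identifiers)


a = Interface(device_name="rtr-nyc", name="eth0", description="uplink")
b = Interface(device_name="rtr-nyc", name="eth0", description="changed")
c = Interface(device_name="rtr-sfo", name="eth0")

print(a.unique_id())
# Same identifier values give the same ID, whatever the other attributes are.
print(a.unique_id() == b.unique_id())
# A different identifier value gives a different ID.
print(a.unique_id() == c.unique_id())
```

Because the ID is derived from the data rather than from an auto-incrementing counter, the same object can be recognized in two systems that were populated independently.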

from typing import List

from diffsync import DiffSyncModel

class Site(DiffSyncModel):
    _modelname = "site"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = ("contact_phone",)
    _children = {"device": "devices"}

    name: str
    contact_phone: str
    devices: List = list()

Relationships between models

Currently the relationships between models are very loose by design. Instead of storing a related object directly, it's recommended to store its unique ID (uid) and retrieve the object from the store as needed.
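The pattern of storing an ID and resolving it later can be sketched in plain Python as follows. This is not DiffSync's store API: the store dictionary and the two dataclasses are hypothetical stand-ins used only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Device:
    name: str
    role: str


@dataclass
class Site:
    name: str
    # Holds device IDs (plain strings), not Device objects.
    devices: List[str] = field(default_factory=list)


# Hypothetical stand-in for the store: one dict per model name, keyed by ID.
store: Dict[str, Dict[str, Any]] = {"site": {}, "device": {}}

site = Site(name="nyc")
device = Device(name="rtr-nyc", role="router")
store["site"][site.name] = site
store["device"][device.name] = device

# Record the relationship by ID only.
site.devices.append(device.name)

# Resolve the IDs back to objects when they are needed.
resolved = [store["device"][uid] for uid in site.devices]
print(resolved[0].role)
```

Keeping only IDs in the parent means each object lives in exactly one place, so two datasets can be compared field by field without following nested object graphs.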

DiffSync

A DiffSync object must reference each available model as a class-level attribute named after its modelname, and must define a top_level attribute to indicate where the diff and the synchronization should start. In the example below, "site" is the only top-level object, so the synchronization engine will check all sites and all children of each site (devices).

from diffsync import DiffSync

class BackendA(DiffSync):

    site = Site
    device = Device

    top_level = ["site"]
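The idea of starting from the top-level models and descending into their children can be sketched in plain Python. This is only an illustration of the traversal, not DiffSync's engine: the nested-dict data layout, the walk() and diff() helpers, and the "+", "-", "~" markers are all assumptions made for this sketch.

```python
from typing import Dict, List

# Assumed layout: {modelname: {uid: {"attrs": {...}, "children": {modelname: [uids]}}}}
Store = Dict[str, Dict[str, dict]]


def walk(src: Store, dst: Store, model: str,
         src_ids: List[str], dst_ids: List[str], out: List[str]) -> None:
    """Compare the objects of one model, then recurse into their children."""
    for uid in sorted(set(src_ids) | set(dst_ids)):
        in_src, in_dst = uid in src_ids, uid in dst_ids
        if in_src and not in_dst:
            out.append(f"+ {model} {uid}")   # missing from the destination
        elif in_dst and not in_src:
            out.append(f"- {model} {uid}")   # only in the destination
        elif src[model][uid]["attrs"] != dst[model][uid]["attrs"]:
            out.append(f"~ {model} {uid}")   # present in both, attributes differ
        src_children = src[model][uid].get("children", {}) if in_src else {}
        dst_children = dst[model][uid].get("children", {}) if in_dst else {}
        for child_model in sorted(set(src_children) | set(dst_children)):
            walk(src, dst, child_model,
                 src_children.get(child_model, []),
                 dst_children.get(child_model, []), out)


def diff(src: Store, dst: Store, top_level: List[str]) -> List[str]:
    """Start only from the top-level models; children are reached through parents."""
    out: List[str] = []
    for model in top_level:
        walk(src, dst, model, list(src.get(model, {})), list(dst.get(model, {})), out)
    return out


backend_a: Store = {
    "site": {"nyc": {"attrs": {"contact_phone": "555-0100"},
                     "children": {"device": ["rtr-nyc"]}}},
    "device": {"rtr-nyc": {"attrs": {"role": "router"}}},
}
backend_b: Store = {
    "site": {"nyc": {"attrs": {"contact_phone": "555-0100"},
                     "children": {"device": ["rtr-nyc", "sw-nyc"]}}},
    "device": {"rtr-nyc": {"attrs": {"role": "spine"}},
               "sw-nyc": {"attrs": {"role": "switch"}},
               # Not attached to any site, so this sketch never visits it.
               "orphan": {"attrs": {"role": "unknown"}}},
}

changes = diff(src=backend_b, dst=backend_a, top_level=["site"])
print(changes)
```

In this sketch, a device that is not listed as a child of any site is never compared, which is why the choice of top-level models matters.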

It's up to the user to populate the internal cache with the appropriate data. The example below uses a load() method to populate the cache, but this is not mandatory; the cache can be populated in other ways.

Store data in a DiffSync object

To add a site to the local cache/store, you need to pass a valid DiffSyncModel object to the add() method.

class BackendA(DiffSync):
    [...]

    def load(self):
        # Store an individual object
        site = self.site(name="nyc")
        self.add(site)

        # Store an object and define it as a child of another object
        device = self.device(name="rtr-nyc", role="router", site_name="nyc")
        self.add(device)
        site.add_child(device)

Update Remote system on Sync

To update a remote system, you need to extend your DiffSyncModel class(es) to define your own create, update and/or delete methods for each model. A DiffSyncModel instance stores a reference to its parent DiffSync class in case you need to use it to look up other model instances from the DiffSync's cache.

class Device(DiffSyncModel):
    [...]

    @classmethod
    def create(cls, diffsync, ids, attrs):
        ## TODO add your own logic here to create the device on the remote system
        return super().create(ids=ids, diffsync=diffsync, attrs=attrs)

    def update(self, attrs):
        ## TODO add your own logic here to update the device on the remote system
        return super().update(attrs)

    def delete(self):
        ## TODO add your own logic here to delete the device on the remote system
        super().delete()
        return self
