A native Python implementation of the rdiff tool from librsync

RDIFF

This project is a Python implementation of librsync's rdiff tool, which is used to find the difference between a local file and a version of that file on a remote machine (https://github.com/librsync/librsync/blob/master/doc/rdiff.md).

Note:

This tool isn't concerned with how the signature and delta files are sent or received over the network. It is only concerned with creating these files and applying the patch.

Usage:

Assume two machines, machine A and machine B, have different versions of the same file. Machine A wants to synchronise its file with the file present on machine B.
The following example shows how machine A uses rdiff:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# create an instance of the signature class, passing in the checksum object
# Optionally, a blockSize can be specified; it defaults to 1024 bytes
signature = rdiff.signature.Signature(checksum=checksum)

# create the signature file
# basisFilePath -> Path to the file for which signature must be generated
# sigFilePath -> Where to store the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_signature_file")

# This machine sends over the signature file to the remote machine
# The remote machine responds back with a delta file
# rdiff isn't concerned with how this communication over the network takes place

patcher = rdiff.patch.Patch()
# Perform a patch operation
# How the delta file is obtained from the remote machine is not a concern of rdiff
# deltaFilePath -> Path to the delta file obtained from the remote machine
# basisFilePath -> Path to the file which is to be updated
# outFilePath -> Path to store the updated file
# Note: The original file isn't modified, instead a new file is created.
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file", outFilePath="path_to_updated_file"
)

The following example shows how machine B uses rdiff:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# create an instance of the delta class
delta = rdiff.delta.Delta()

# The createDeltaFile method creates a delta file for a given file
# against a signature file obtained from a remote machine.
# inFilePath -> Path to the updated file, i.e. the version the remote machine wants to synchronise with
# deltaFilePath -> Path to store the delta file
# sigFielPath -> Path to the signature file obtained from the remote machine.
# rdiff is not concerned with how this signature file is obtained.
# blockSize -> This should be identical to the blockSize used by the remote machine to generate the
# signature file.
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

# This machine now sends the delta file to the remote machine.
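To make the roles of the three files more concrete, here is a toy, in-memory sketch of the signature, delta and patch steps. It is an illustration only: the function names (`make_signature`, `make_delta`, `apply_delta`), the tiny block size and the use of SHA-256 as the only block hash are assumptions made for this example, and it does not reflect how this package computes checksums or lays out its files.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to trace by hand


def make_signature(basis: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map the digest of each basis block to that block's index."""
    signature = {}
    for index, start in enumerate(range(0, len(basis), block_size)):
        block = basis[start:start + block_size]
        # setdefault keeps the first index when two blocks are identical
        signature.setdefault(hashlib.sha256(block).hexdigest(), index)
    return signature


def make_delta(updated: bytes, signature: dict, block_size: int = BLOCK_SIZE) -> list:
    """Describe `updated` as ("copy", block_index) and ("literal", bytes) steps."""
    delta = []
    literal = bytearray()
    pos = 0
    while pos < len(updated):
        window = updated[pos:pos + block_size]
        index = signature.get(hashlib.sha256(window).hexdigest())
        if index is not None:
            # Flush any pending literal bytes before emitting the copy step
            if literal:
                delta.append(("literal", bytes(literal)))
                literal.clear()
            delta.append(("copy", index))
            pos += len(window)
        else:
            # No basis block matches here: keep this byte and slide by one
            literal.append(updated[pos])
            pos += 1
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def apply_delta(basis: bytes, delta: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the updated content from the basis and the delta steps."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            start = value * block_size
            out += basis[start:start + block_size]
        else:
            out += value
    return bytes(out)


# Machine A holds `basis`, machine B holds `updated`
basis = b"the quick brown fox"
updated = b"the quick red fox jumps"
delta = make_delta(updated, make_signature(basis))
assert apply_delta(basis, delta) == updated
```

Hashing every window from scratch, as this sketch does, is slow on large files. The rsync algorithm avoids that cost with a cheap rolling checksum that is checked before the strong hash.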

The following is a combined example that runs on a single machine. The same code can be split across two machines connected over a network.

"""
Assume there are two machines, machine A and machine B.
Machine A and machine B both have different versions of test.txt
Machine A wants to sync its file to have the same content as the file on machine B.

Machine A creates a signature file and sends it over to machine B.
Machine B uses this signature file and generates a delta file against the signature file
and sends the delta file back to machine A.

Machine A now uses this delta file to patch its file, thereby synchronising it to have
the same content as the file on machine B.

The following is an example on a single machine. The same example can be extended to two
different machines connected over a network.
"""
import rdiff

checksum = rdiff.signature.Checksum()
signature = rdiff.signature.Signature(checksum=checksum, blockSize=1024)

# Machine A making the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_sig_file")

delta = rdiff.delta.Delta()
# Machine B creates the delta file using the signature file generated by Machine A
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

patcher = rdiff.patch.Patch()
# Machine A patches its file using the delta file generated by Machine B.
# The result is written to a new file. A separate output path is used here so that,
# on a single machine, the patch output does not overwrite path_to_updated_file.
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file",
    outFilePath="path_to_patched_file"
)
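When trying the combined example on a single machine, one way to check the result is to compare the patched output against the file the delta was generated from. The sketch below uses only the standard library. `file_digest` and `files_match` are hypothetical helper names for this example and are not part of rdiff.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(first_path: str, second_path: str) -> bool:
    """Report whether two files have byte-for-byte identical content."""
    return file_digest(first_path) == file_digest(second_path)
```

Calling `files_match` with the path passed as `inFilePath` to `createDeltaFile` and the path passed as `outFilePath` to `patchFile` should return `True` if the patch reproduced the updated file.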

File Formats

Information on the format/structure of the different files involved can be found at: https://github.com/MohitPanchariya/rdiff/blob/master/file_formats.md

Further Reading

More about the rsync algorithm can be found at: https://rsync.samba.org/tech_report/tech_report.html
A detailed explanation of the rsync algorithm is present in the PhD thesis of Andrew Tridgell, specifically chapter 3 of the thesis: https://www.samba.org/~tridge/phd_thesis.pdf
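
As a taste of what the technical report covers, its weak "rolling" checksum can be sketched in a few lines. The code below follows the formulas given in the report, with the modulus set to 2^16. It is not taken from this package's source, and whether rdiff's Checksum class uses the same constants is an assumption that has not been verified here. The function names are made up for this example.

```python
MOD = 1 << 16  # modulus used for both halves of the checksum


def weak_checksum(block: bytes) -> tuple:
    """Compute the (a, b) pair for a block directly from the definition."""
    length = len(block)
    a = sum(block) % MOD
    # The first byte is weighted by `length`, the last byte by 1
    b = sum((length - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, old_byte: int, new_byte: int, length: int) -> tuple:
    """Slide a window of `length` bytes one position to the right in O(1)."""
    a = (a - old_byte + new_byte) % MOD
    b = (b - length * old_byte + a) % MOD
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit halves into a single 32-bit value."""
    return a + (b << 16)


# Rolling across the data gives the same result as recomputing each window
data = bytes(range(200)) * 2
window = 16
a, b = weak_checksum(data[:window])
for start in range(1, len(data) - window + 1):
    a, b = roll(a, b, data[start - 1], data[start + window - 1], window)
    assert (a, b) == weak_checksum(data[start:start + window])
```

Because each step only subtracts the byte leaving the window and adds the byte entering it, a checksum is available at every byte offset of a file at a constant cost per offset.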
