
A native python implementation of the rdiff tool by librsync


RDIFF

This project is an implementation of librsync's rdiff tool, which is used to find the diff of a file held on a remote machine (see https://github.com/librsync/librsync/blob/master/doc/rdiff.md).

Note:

This tool isn't concerned with how the signature and delta files are sent or received over the network. It only works with these files and performs the patch.

Usage:

Assume two machines, machine A and machine B, have different versions of the same file. Machine A wants to synchronise its file with the file present on machine B.
The following example shows the usage of rdiff on machine A:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# Create an instance of the signature class, passing in the checksum object
# Optionally, a blockSize can be specified; it defaults to 1024 bytes
signature = rdiff.signature.Signature(checksum=checksum)

# create the signature file
# basisFilePath -> Path to the file for which signature must be generated
# sigFilePath -> Where to store the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_signature_file")

# This machine sends over the signature file to the remote machine
# The remote machine responds back with a delta file
# rdiff isn't concerned with how this communication over the network takes place

patcher = rdiff.patch.Patch()
# Perform a patch operation
# How the delta file is obtained from the remote machine is not a concern of rdiff
# deltaFilePath -> Path to the delta file obtained from the remote machine
# basisFilePath -> Path to the file which is to be updated
# outFilePath -> Path to store the updated file
# Note: The original file isn't modified, instead a new file is created.
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file", outFilePath="path_to_updated_file"
)
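
Because patchFile writes a new file rather than modifying the original, machine A still has to decide when to swap the patched output in for its basis file. The following is a minimal sketch of one way to do that with the standard library only. The helper names `sha256_of` and `promote_patched_file` are hypothetical and are not part of rdiff, and the optional digest check assumes machine B has shared a SHA-256 digest of its file by some separate means.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def promote_patched_file(basis_path, patched_path, expected_sha256=None):
    """Replace the basis file with the patched output (hypothetical helper).

    If expected_sha256 is given, the patched file is checked against it
    first and the basis file is left untouched on a mismatch.
    """
    if expected_sha256 is not None and sha256_of(patched_path) != expected_sha256:
        raise ValueError("patched file does not match the expected digest")
    # os.replace overwrites the destination if it already exists.
    os.replace(patched_path, basis_path)
```

With the paths from the example above, this would be called as `promote_patched_file("path_to_file", "path_to_updated_file")` once patchFile has returned.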

The following shows the usage of rdiff on machine B:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# create an instance of the delta class
delta = rdiff.delta.Delta()

# The createDeltaFile method is used to create a delta file for a given file
# against a signature file obtained from a remote machine.
# inFilePath -> Path to the updated file, i.e. the file the remote machine wants to synchronise with
# deltaFilePath -> Path to store the delta file
# sigFielPath -> Path to the signature file obtained from the remote machine.
# rdiff is not concerned with how this signature file is obtained.
# blockSize -> This should be identical to the block size used by the remote machine to generate the
# signature file.
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

# This machine now sends the delta file to the remote machine.

The following is a combined example, on a single machine. The code can be extended to two remote machines.

"""
Assume there are two machines, machine A and machine B.
Machine A and machine B both have different versions of test.txt
Machine A wants to sync its file to have the same content as the file on machine B.

Machine A creates a signature file and sends it over to machine B.
Machine B uses this signature file and generates a delta file against the signature file
and sends the delta file back to machine A.

Machine A now uses this delta file to patch its file. Thereby, synchronising its file to have
the same content as the file on machine B.

The following is an example on a single machine. The same example can be extended to two
different machines connected over a network.
"""
import rdiff

checksum = rdiff.signature.Checksum()
signature = rdiff.signature.Signature(checksum=checksum, blockSize=1024)

# Machine A making the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_sig_file")

delta = rdiff.delta.Delta()
# Machine B creates the delta file using the signature file generated by Machine A
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

patcher = rdiff.patch.Patch()
# Machine A patches its file (creates a new version of the file located at path_to_updated_file)
# using the delta file generated by Machine B
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file",
    outFilePath="path_to_updated_file"
)
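
To make the three steps in the example above more concrete, here is a toy in-memory sketch of the signature, delta and patch flow. It is an illustration under simplifying assumptions, not rdiff's implementation or file formats: the function names are hypothetical, the signature is a plain dict keyed by SHA-256 block digests, and every offset of the new data is hashed directly instead of using a rolling checksum.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to trace


def make_signature(basis: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map the digest of each full block of `basis` to its block index."""
    signature = {}
    for index in range(len(basis) // block_size):
        block = basis[index * block_size:(index + 1) * block_size]
        signature.setdefault(hashlib.sha256(block).digest(), index)
    return signature


def make_delta(new: bytes, signature: dict, block_size: int = BLOCK_SIZE) -> list:
    """Describe `new` as ("copy", block_index) and ("literal", bytes) operations."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        index = None
        if len(window) == block_size:
            index = signature.get(hashlib.sha256(window).digest())
        if index is not None:
            # Flush any pending literal bytes before the copy operation.
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", index))
            pos += block_size
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(basis: bytes, ops: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new data from the basis data and the delta operations."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += basis[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


basis = b"abcdefghijkl"       # machine A's version
new = b"abcdXYefghijkl!"      # machine B's version
delta = make_delta(new, make_signature(basis))
patched = apply_delta(basis, delta)
```

In this sketch only the literal bytes `XY` and `!` travel in the delta; the remaining data is expressed as references to blocks machine A already has. It also shows why both sides must agree on the block size: the copy operations are only meaningful relative to the block boundaries used when the signature was made.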

File Formats

Information on the format/structure of the different files involved can be found at: https://github.com/MohitPanchariya/rdiff/blob/master/file_formats.md

Further Reading

More about the rsync algorithm can be found at: https://rsync.samba.org/tech_report/tech_report.html
A detailed explanation of the rsync algorithm is present in the PhD thesis of Andrew Tridgell, specifically chapter 3 of the thesis: https://www.samba.org/~tridge/phd_thesis.pdf
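
As a small taste of what the technical report covers, the following sketch implements the weak rolling checksum it describes, where the checksum of a window can be updated in constant time as the window slides by one byte. This follows the formulas in the report and is not necessarily what rdiff's Checksum class does internally; the function names are hypothetical.

```python
M = 1 << 16  # modulus used for both halves of the checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


def combine(a: int, b: int) -> int:
    """Pack the pair into a single 32-bit value, s = a + 2^16 * b."""
    return a + (b << 16)
```

Rolling the pair forward gives the same result as recomputing it for each window, which is what makes it cheap to test every offset of a file against a signature.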
