A function to get the difference between two files and return a pandas DataFrame

pip install textcompari

Tested against Windows 10 / Python 3.11 / Anaconda

Important!

The module will be compiled when you import it for the first time. Cython and a C/C++ compiler must be installed!
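Because the first import triggers a build, it can help to check the prerequisites beforehand. The following is only a rough pre-flight sketch using the standard library; the helper name `build_prerequisites` and the list of compiler executables are assumptions made for illustration, not part of textcompari. A compiler may still be usable even if it is not found on PATH (MSVC on Windows, for example, is often located by the build tooling rather than through PATH).

```python
import importlib.util
import shutil


def build_prerequisites():
    """Report whether Cython is importable and which common compilers are on PATH.

    Hypothetical helper for illustration; the PATH lookup is only a heuristic.
    """
    # find_spec returns None when the package cannot be located.
    cython_available = importlib.util.find_spec("Cython") is not None
    # Common compiler executable names; this list is an assumption.
    candidates = ("cl", "gcc", "g++", "clang", "cc")
    compilers_on_path = [name for name in candidates if shutil.which(name)]
    return {"cython": cython_available, "compilers_on_path": compilers_on_path}


status = build_prerequisites()
print(status)
```

If `cython` is False, installing Cython with pip before the first import should avoid a failed build.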

from rapidfuzz import fuzz
from textcompari import get_file_diff

"""
A function to get the difference between two files and return a pandas DataFrame.

:param afile: A file to compare (str, bytes, tuple, list, np.ndarray)
:param bfile: Another file to compare (str, bytes, tuple, list, np.ndarray)
:param window_shifts: The number of shifts for the window (default 5)
:param min_fuzz_match: The minimum fuzzy match score (default 80)
:param fuzz_scorer: The fuzzy scorer function (default fuzz.WRatio)
:param cpus: The number of CPUs to use (default 5)
:return: A pandas DataFrame containing the difference between the files
"""


afile = r"C:\Users\hansc\Downloads\difffindertest\test1_1.txt"
bfile = r"C:\Users\hansc\Downloads\difffindertest\test1_2.txt"

df = get_file_diff(
    afile, bfile, window_shifts=300, min_fuzz_match=99, fuzz_scorer=fuzz.WRatio, cpus=5
)
print(df)
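
How get_file_diff matches lines internally is not described here. To give a feel for what parameters such as window_shifts and min_fuzz_match could mean, the sketch below does a windowed fuzzy line comparison with the standard library's difflib instead of rapidfuzz. Everything in it is an assumption made for illustration: the function name `windowed_fuzzy_diff`, the reading of the window as "positions to search before and after the current line", and the returned columns. It is not the package's implementation and its scores will differ from fuzz.WRatio.

```python
from difflib import SequenceMatcher


def windowed_fuzzy_diff(a_lines, b_lines, window=5, min_score=80.0):
    """For each line of a_lines, find the most similar line of b_lines
    within +/- `window` positions and flag whether it reaches `min_score`.

    Hypothetical illustration only; not the algorithm used by textcompari.
    """
    rows = []
    for i, a_text in enumerate(a_lines):
        # Restrict the search to a window of positions around index i.
        lo = max(0, i - window)
        hi = min(len(b_lines), i + window + 1)
        best_j, best_score = None, 0.0
        for j in range(lo, hi):
            # ratio() is in [0, 1]; scale to 0-100 like a fuzzy match score.
            matcher = SequenceMatcher(None, a_text, b_lines[j], autojunk=False)
            score = matcher.ratio() * 100
            if score > best_score:
                best_j, best_score = j, score
        rows.append(
            {
                "a_index": i,
                "a_text": a_text,
                "b_index": best_j,
                "b_text": b_lines[best_j] if best_j is not None else None,
                "score": round(best_score, 1),
                "matched": best_score >= min_score,
            }
        )
    return rows


a = ["alpha", "beta", "gamma"]
b = ["alpha", "inserted", "beta", "gamma!"]

strict = windowed_fuzzy_diff(a, b, window=1, min_score=99)
lenient = windowed_fuzzy_diff(a, b, window=1, min_score=80)
for row in strict:
    print(row)
```

With a strict threshold, "gamma" and "gamma!" are reported as different; with a lower threshold they count as a match. The returned list of dictionaries can be passed to pandas.DataFrame if a tabular view is wanted.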
