Unified diff parsing/metadata extraction library.
Simple Python library to parse and interact with unified diff data.
Installing unidiff
$ pip install unidiff
Quick start
>>> import urllib.request
>>> from unidiff import PatchSet
>>> diff = urllib.request.urlopen('https://github.com/matiasb/python-unidiff/pull/3.diff')
>>> encoding = diff.headers.get_charsets()[0]
>>> patch = PatchSet(diff, encoding=encoding)
>>> patch
<PatchSet: [<PatchedFile: .gitignore>, <PatchedFile: unidiff/patch.py>, <PatchedFile: unidiff/utils.py>]>
>>> patch[0]
<PatchedFile: .gitignore>
>>> patch[0].is_added_file
True
>>> patch[0].added
6
>>> patch[1]
<PatchedFile: unidiff/patch.py>
>>> patch[1].added, patch[1].removed
(20, 11)
>>> len(patch[1])
6
>>> patch[1][2]
<Hunk: @@ 109,14 110,21 @@ def __repr__(self):>
>>> patch[2]
<PatchedFile: unidiff/utils.py>
>>> print(patch[2])
diff --git a/unidiff/utils.py b/unidiff/utils.py
index eae63e6..29c896a 100644
--- a/unidiff/utils.py
+++ b/unidiff/utils.py
@@ -37,4 +37,3 @@
 # - deleted line
 # \ No newline case (ignore)
 RE_HUNK_BODY_LINE = re.compile(r'^([- \+\\])')
-
Load unified diff data by instantiating PatchSet with a file-like object as argument, or use the PatchSet.from_filename class method to read the diff from a file.
A PatchSet is a list of the files updated by the given patch. For each PatchedFile you can get stats (whether it is a new, removed or modified file; the source/target lines; etc.) and access each hunk (a PatchedFile also behaves like a list), along with the hunk's own info.
At any point you can take the string representation of the current object, which returns the unified diff data for it.
For a quick example of what can be done, see the bin/unidiff file.
Also, once installed, unidiff provides a command-line program that displays information from diff data (a file, or stdin). For example:
$ git diff | unidiff
Summary
-------
README.md: +6 additions, -0 deletions
1 modified file(s), 0 added file(s), 0 removed file(s)
Total: 6 addition(s), 0 deletion(s)
Load a local diff file
To instantiate PatchSet from a local file, you can use:
>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8')
>>> patch
<PatchSet: [<PatchedFile: added_file>, <PatchedFile: modified_file>, <PatchedFile: removed_file>]>
Note the (optional) encoding parameter. If it is not specified, unicode input is expected. Alternatively, open the file yourself and pass the file object:
>>> import codecs
>>> from unidiff import PatchSet
>>> with codecs.open('tests/samples/bzr.diff', 'r', encoding='utf-8') as diff:
...     patch = PatchSet(diff)
...
>>> patch
<PatchSet: [<PatchedFile: added_file>, <PatchedFile: modified_file>, <PatchedFile: removed_file>]>
Finally, you can also instantiate PatchSet passing any iterable (and encoding, if needed):
>>> from unidiff import PatchSet
>>> with open('tests/samples/bzr.diff', 'r') as diff:
...     data = diff.readlines()
...
>>> patch = PatchSet(data)
>>> patch
<PatchSet: [<PatchedFile: added_file>, <PatchedFile: modified_file>, <PatchedFile: removed_file>]>
If you don't need to rebuild the original unified diff input, you can pass metadata_only=True (defaults to False), which should make parsing more efficient:
>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8', metadata_only=True)