Compares ordered lists, xml and csv application
Project description
Detailed Documentation
XML and CSV comparisons
Two scripts are provided: xml_cmp and csv_cmp. Both compare two files and output the delta as file_suppr, file_addon and file_changes.
The extension is forced to xml or csv respectively.
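The scripts' command-line arguments and internals are not described here. As a rough illustration only, the sketch below shows how CSV rows could be read into lists of lists and compared with a naive membership test. The helper names `read_rows` and `naive_delta` are hypothetical, and the membership test ignores ordering and duplicates, so it is not a substitute for Comparator.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of rows, each row a list of strings."""
    return [row for row in csv.reader(io.StringIO(csv_text))]


def naive_delta(old_rows, new_rows):
    """Return (added, removed) rows using plain membership tests.

    Assumption for illustration: ordering and duplicates are ignored.
    """
    added = [row for row in new_rows if row not in old_rows]
    removed = [row for row in old_rows if row not in new_rows]
    return added, removed


old_rows = read_rows("62145,azerty\n1234,qwerty\n9876,ipsum\n")
new_rows = read_rows("62145,azerty\n1234,qwertw\n4865,lorem\n")
added, removed = naive_delta(old_rows, new_rows)
print(added)    # [['1234', 'qwertw'], ['4865', 'lorem']]
print(removed)  # [['1234', 'qwerty'], ['9876', 'ipsum']]
```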
List comparison
listcomparator provides a Comparator object that finds the differences between two lists, provided the elements of the lists appear in the same order.
>>> old = [1, 2, 3, 4, 5, 6]
>>> new = [1, 3, 4, 7, 6]
>>> from listcomparator.comparator import Comparator
Let’s create a Comparator object
>>> comp = Comparator(old,new)
The check method populates the additions and deletions attributes
>>> comp.check()
>>> comp.additions
[7]
>>> comp.deletions
[2, 5]
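How Comparator computes these attributes is not shown in this document. As a minimal sketch of an order-aware comparison, the standard library's `difflib.SequenceMatcher` gives the same result for this example. The function name `ordered_diff` is hypothetical, and this is a stand-in technique, not listcomparator's own code.

```python
from difflib import SequenceMatcher


def ordered_diff(old, new):
    """Return (additions, deletions) between two ordered sequences.

    Illustrative stand-in only: this is NOT listcomparator's code.
    Elements must be hashable (convert inner lists to tuples first).
    """
    additions, deletions = [], []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both sides of the delta.
        if tag in ("delete", "replace"):
            deletions.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            additions.extend(new[j1:j2])
    return additions, deletions


additions, deletions = ordered_diff([1, 2, 3, 4, 5, 6], [1, 3, 4, 7, 6])
print(additions)  # [7]
print(deletions)  # [2, 5]
```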
We can also use lists of lists
>>> old_list = [['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> new_list = [['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp = Comparator(old_list, new_list)
>>> comp.check()
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]
A problem arises when a modification (in our case “qwerty” became “qwertw”) appears in both outputs, comp.additions and comp.deletions. You might want to treat this as a change instead. Comparator can filter out such cases if you provide a function that tells it how to recognize them. In our example, we consider two elements to be the same if their first item, a kind of id, is the same.
>>> def my_key(x):
...     return x[0]
...
The getChanges method then provides a new attribute: changes
>>> comp.getChanges(my_key)
>>> comp.changes
[['1234', 'qwertw']]
Of course, additions and deletions stay unchanged
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]
You might want to keep only ‘pure’ additions and deletions. getChanges accepts a keyword argument ‘purge’ that does just that
>>> comp.getChanges(my_key, purge=True)
>>> comp.changes
[['1234', 'qwertw']]
>>> comp.additions
[['4865', 'lorem']]
>>> comp.deletions
[['9876', 'ipsum']]
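The key-based filtering described above can be sketched as follows. This is an assumption about the logic, written to reproduce the documented outputs; the helper `split_changes` is hypothetical and is not part of listcomparator.

```python
def split_changes(additions, deletions, key):
    """Separate changed items from pure additions and deletions.

    Hypothetical helper, not part of listcomparator: an item counts as
    changed when its key shows up on both sides of the delta.
    """
    added_keys = {key(item) for item in additions}
    deleted_keys = {key(item) for item in deletions}
    changes = [item for item in additions if key(item) in deleted_keys]
    pure_additions = [item for item in additions
                      if key(item) not in deleted_keys]
    pure_deletions = [item for item in deletions
                      if key(item) not in added_keys]
    return changes, pure_additions, pure_deletions


additions = [['1234', 'qwertw'], ['4865', 'lorem']]
deletions = [['1234', 'qwerty'], ['9876', 'ipsum']]
changes, pure_additions, pure_deletions = split_changes(
    additions, deletions, key=lambda row: row[0])
print(changes)         # [['1234', 'qwertw']]
print(pure_additions)  # [['4865', 'lorem']]
print(pure_deletions)  # [['9876', 'ipsum']]
```

Reporting the new version of a changed item (rather than the old one) mirrors the documented value of comp.changes.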
The old and new attributes store the lists being compared. If you want to reset them, Comparator provides a purgeOldNew method to free up memory
>>> comp.old
[['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> comp.new
[['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.purgeOldNew()
>>> comp.old
>>> comp.new
Compare XML files
Comparator can be used to compare XML files. Let’s make two XML documents describing books
>>> old='''<?xml version="1.0" ?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABEL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Dark pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Greves</title>
... <para>SNCF 82 23 44 12</para>
... </chapter>
... </book>
... </infos>
... '''
>>> new='''<?xml version="1.0"?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABIL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Blue pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Mer 82 23 44 12</para>
... <para>Ciel 82 67 23 12</para>
... </chapter>
... </book>
... </infos>
... '''
The elementtree package is required to parse XML
>>> from elementtree import ElementTree as ET
For this test we’ll use cStringIO rather than a file
>>> import cStringIO
>>> ex_old = cStringIO.StringIO(old)
>>> ex_new = cStringIO.StringIO(new)
We parse the contents
>>> root_old = ET.parse(ex_old).getroot()
>>> root_new = ET.parse(ex_new).getroot()
The “book” tag identifies the objects we want
>>> objects_old = root_old.findall('book')
>>> objects_new = root_new.findall('book')
As we can’t compare the element objects directly, we stringify them
>>> objects_old = [ET.tostring(o) for o in objects_old]
>>> objects_new = [ET.tostring(o) for o in objects_new]
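The doctests in this section rely on Python 2 idioms (cStringIO, the print statement) and on the standalone elementtree package. As a hedged sketch only, the same parse-and-stringify step could look like this on Python 3 with the standard library's `xml.etree.ElementTree`. The helper name `book_strings` and the tiny sample document are illustrative assumptions, not part of listcomparator.

```python
import xml.etree.ElementTree as ET


def book_strings(xml_text, tag="book"):
    """Parse XML text and serialize each direct child named `tag`."""
    root = ET.fromstring(xml_text)
    # encoding="unicode" makes tostring return str instead of bytes.
    return [ET.tostring(element, encoding="unicode")
            for element in root.findall(tag)]


sample = ("<infos>"
          "<book><title>A</title></book>"
          "<book><title>B</title></book>"
          "</infos>")
books = book_strings(sample)
print(books)
# ['<book><title>A</title></book>', '<book><title>B</title></book>']
```

Serialized strings compare by value, which is what makes them suitable input for a list comparison.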
From there, Comparator is useful
>>> my_comp = Comparator(objects_old, objects_new)
>>> my_comp.check()
>>> for e in my_comp.additions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
>>> for e in my_comp.deletions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABEL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>
We need to know which tag uniquely identifies an object. Here we choose the “title” tag
>>> def item_signature(xml_element):
...     title = xml_element.find('title')
...     return title.text
...
We build our custom key function for use by the Comparator
>>> def my_key(str):
...     file_like = cStringIO.StringIO(str)
...     root = ET.parse(file_like)
...     return item_signature(root)
...
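A key function of this kind can be sketched on Python 3 as follows, assuming the standard library's `xml.etree.ElementTree` in place of the elementtree package. The name `title_key` and the shortened sample strings are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET


def title_key(xml_string):
    """Return the text of the first <title> child of a serialized element."""
    return ET.fromstring(xml_string).findtext("title")


old_book = ("<book><title>White pages 1995</title>"
            "<para>ABEL Antoine</para></book>")
new_book = ("<book><title>White pages 1995</title>"
            "<para>ABIL Antoine</para></book>")

# The serialized forms differ, but both map to the same key,
# which is what lets a comparison treat the pair as one changed item.
print(old_book == new_book)                        # False
print(title_key(old_book) == title_key(new_book))  # True
```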
Then the getChanges method of the Comparator can be used
>>> my_comp.getChanges(my_key, purge=True)
Which books have been exclusively added?
>>> for e in my_comp.additions:
...     print e
...
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
Which books have been exclusively removed?
>>> for e in my_comp.deletions:
...     print e
...
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>
Which books have changed? That is, which have the same title but different other values
>>> for e in my_comp.changes:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
Then we can write those results back to an XML file
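This document does not show how the results are written back. One possible sketch, assuming Python 3 and the standard library's `xml.etree.ElementTree`, wraps the serialized elements in a new root element. The helper `wrap_in_root` is hypothetical; the default root tag `infos` is borrowed from the sample documents above.

```python
import xml.etree.ElementTree as ET


def wrap_in_root(xml_strings, root_tag="infos"):
    """Re-parse serialized elements and wrap them in a new root element.

    Hypothetical helper for illustration; not part of listcomparator.
    """
    root = ET.Element(root_tag)
    for xml_string in xml_strings:
        root.append(ET.fromstring(xml_string))
    return ET.tostring(root, encoding="unicode")


changed = ["<book><title>White pages 1995</title></book>"]
document = wrap_in_root(changed)
print(document)
# <infos><book><title>White pages 1995</title></book></infos>
```

The returned string can then be written to a file with an ordinary `open(path, "w")` call.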
This code conforms to PEP8
It is fully tested, 100% coverage
A Buildbot runs tests at each commit
Contributors
Main developers
Nicolas Laurance <nlaurance at zindep dot com>
With contributions from
Yves Mahe <ymahe at zindep dot com>
Change history
New in 0.1
First Release
Download files
Source Distribution
File details
Details for the file ListComparator-0.1.tar.gz.
File metadata
- Download URL: ListComparator-0.1.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1b17dad959d0963261a8e8e08c06018329d413461d0f0facd77fb206b66074fb
MD5 | e8b3b57781101ab6eeeda7e9fb807e27
BLAKE2b-256 | 6ec629c3bbc181c6b24dcfb8b46a76d66bf2c83f3802f727de727bbb41841688