CaboCha output-XML accessor
Parses XML output from CaboCha and provides simple tree accessors.
Usage
Typical usage focuses on chunk surfaces and dependency links:
>>> import xmlpumpkin
>>> aisansan = xmlpumpkin.parse_to_tree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... )
>>> len(aisansan.chunks)
8
>>> print(aisansan.root.surface)
流したりして
>>> print(aisansan.root.func_surface)
て
>>> for dep in aisansan.root.linked:
...     print(dep.surface)
...
降って
涙を
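The chunk accessors can also be combined to list every dependency edge in a sentence. The following is a minimal sketch, not part of xmlpumpkin: `dependency_pairs` is a hypothetical helper, and it assumes that `dep` is `None` for the root chunk, which this description does not confirm. `FakeChunk` and `FakeTree` are stand-ins so that the snippet runs without CaboCha installed.

```python
class FakeChunk(object):
    """Hypothetical stand-in exposing only `surface` and `dep`."""

    def __init__(self, surface, dep=None):
        self.surface = surface
        self.dep = dep


class FakeTree(object):
    """Hypothetical stand-in exposing only `chunks`."""

    def __init__(self, chunks):
        self.chunks = tuple(chunks)


def dependency_pairs(tree):
    """Return (dependent surface, head surface) pairs for a tree.

    Assumes `chunk.dep` is None for the root chunk.
    """
    return [
        (chunk.surface, chunk.dep.surface)
        for chunk in tree.chunks
        if chunk.dep is not None
    ]


# Hand-built three-chunk example: both leading chunks depend on the last one.
root = FakeChunk(u'流したりして')
demo_tree = FakeTree([
    FakeChunk(u'降って', dep=root),
    FakeChunk(u'涙を', dep=root),
    root,
])

for dependent, head in dependency_pairs(demo_tree):
    print(u'%s -> %s' % (dependent, head))
```

If the assumption about the root holds, the same helper should accept a tree returned by `parse_to_tree`.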
parse_to_tree requires CaboCha to be on your PATH. If you already have CaboCha's XML output, build the tree from it directly:
>>> tree = xmlpumpkin.Tree(xml_as_unicode)
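For reference, the prepared XML is what CaboCha prints in its XML output format (`cabocha -f3`). The sketch below parses a small hand-written sample with the standard library's `xml.etree.ElementTree` rather than with xmlpumpkin, purely to show how chunk surfaces and links can be read from that markup. The sample sentence and its attribute values are illustrative, not real CaboCha output, and `chunk_surfaces` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of CaboCha's XML output (assumption:
# <chunk> carries `id` and `link`, the root has link="-1", tokens are <tok>).
SAMPLE_XML = u"""
<sentence>
 <chunk id="0" link="1" rel="D" head="0" func="1">
  <tok id="0">涙</tok>
  <tok id="1">を</tok>
 </chunk>
 <chunk id="1" link="-1" rel="D" head="2" func="2">
  <tok id="2">流す</tok>
 </chunk>
</sentence>
"""


def chunk_surfaces(xml_text):
    """Map chunk id -> (surface, id of the chunk it depends on)."""
    # Parse from UTF-8 bytes so the same call works on Python 2 and 3.
    sentence = ET.fromstring(xml_text.strip().encode('utf-8'))
    result = {}
    for chunk in sentence.findall('chunk'):
        surface = u''.join(tok.text or u'' for tok in chunk.findall('tok'))
        result[int(chunk.get('id'))] = (surface, int(chunk.get('link')))
    return result


chunks = chunk_surfaces(SAMPLE_XML)
# The root is the chunk whose link is -1.
root_ids = [cid for cid, (_, link) in chunks.items() if link == -1]
```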
If you need a simple interface from Python to CaboCha itself:
>>> from xmlpumpkin import cabocha
>>> print(cabocha.txttree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
    愛燦々と-----D
          この-D |
            身に-D
            降って-------D
            心密かな---D |
              うれしい-D |
                    涙を-D
              流したりして
EOS
>>> print(cabocha.as_xml(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
<sentence>
...
</sentence>
All inputs and outputs are unicode strings. If you prefer an encoding other than UTF-8, modify the following constants directly:
>>> import xmlpumpkin.runner
>>> xmlpumpkin.runner.CABOCHA_ENCODING = 'SJIS'
>>>
>>> import xmlpumpkin.tree
>>> xmlpumpkin.tree.XML_ENCODING = 'SJIS'
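An encoding setting matters because text has to be encoded before it reaches an external process such as CaboCha and decoded again on the way back. The sketch below shows that round trip in general terms; it is not xmlpumpkin's implementation. `run_filter` is a hypothetical helper, and the demonstration uses a small Python echo command as a stand-in for CaboCha so that it runs anywhere.

```python
import subprocess
import sys


def run_filter(command, text, encoding='utf-8'):
    """Send unicode text to a command's stdin and decode its stdout.

    `command` is an argument list. For CaboCha's XML output it would
    presumably be ['cabocha', '-f3'], which is not exercised here.
    """
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    stdout, _ = process.communicate(text.encode(encoding))
    if process.returncode != 0:
        raise RuntimeError('command failed: %r' % (command,))
    return stdout.decode(encoding)


# Stand-in for CaboCha: a child Python process that copies stdin to stdout
# as raw bytes, so the bytes come back in the encoding they were sent in.
ECHO = [
    sys.executable, '-c',
    'import sys; out = getattr(sys.stdout, "buffer", sys.stdout); '
    'inp = getattr(sys.stdin, "buffer", sys.stdin); out.write(inp.read())',
]

echoed = run_filter(ECHO, u'涙を流す\n', encoding='shift_jis')
```

The encoding passed in has to match what the external program is configured to read and write; a mismatch typically shows up as a decode error or garbled surfaces.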
Properties
Tree and Chunk objects provide only a few properties, not a complete set.
- class xmlpumpkin.Tree(cabocha_xml)
chunks - tuple of chunks
root - the root Chunk object (the chunk that does not depend on any other chunk)
chunk_by_id(chunk_id) - get Chunk object by its id generated by CaboCha
_element - the original XML as an lxml Element object
- class xmlpumpkin.Chunk(element, parent)
id - chunk id
link_to_id - id of the chunk this chunk depends on
linked_from_ids - tuple of ids of the chunks that depend on this chunk
func_id - functional token id of this chunk
dep - the Chunk object this chunk depends on
linked - list of all Chunk objects that depend on this chunk
surface - surface of this chunk
func_surface - surface of this chunk’s functional token
_tokens() - the tokens this chunk contains, as lxml Element objects
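Because `dep` points toward the root and `linked` points away from it, the tree can be walked in either direction. Below is a minimal sketch that uses only the documented attribute names. `path_to_root` and `descendants` are hypothetical helpers, `Node` is a stand-in for `Chunk` so that the example runs without CaboCha, and `dep` is assumed to be `None` at the root.

```python
class Node(object):
    """Hypothetical stand-in for Chunk with `surface`, `dep`, `linked`."""

    def __init__(self, surface):
        self.surface = surface
        self.dep = None      # assumed to be None for the root
        self.linked = []

    def depend_on(self, head):
        """Record that this node depends on `head`; return self."""
        self.dep = head
        head.linked.append(self)
        return self


def path_to_root(chunk):
    """Follow `dep` upward and return the surfaces visited, root last."""
    path = []
    while chunk is not None:
        path.append(chunk.surface)
        chunk = chunk.dep
    return path


def descendants(chunk):
    """Collect surfaces of all chunks depending on `chunk`, directly or not."""
    found = []
    for child in chunk.linked:
        found.extend(descendants(child))
        found.append(child.surface)
    return found


# Rebuild part of the sample sentence by hand.
root = Node(u'流したりして')
namida = Node(u'涙を').depend_on(root)
ureshii = Node(u'うれしい').depend_on(namida)
hisoka = Node(u'心密かな').depend_on(namida)
```

Here `path_to_root(ureshii)` walks from うれしい through 涙を to the root, and `descendants(root)` gathers everything below it.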