CaboCha output-XML accessor

Project description

Parses XML output from CaboCha and provides simple tree accessors.

Usage

Typical usage centers on chunk surfaces and dependency links:

>>> aisansan = xmlpumpkin.parse_to_tree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... )
>>> len(aisansan.chunks)
8
>>> print(aisansan.root.surface)
流したりして
>>> print(aisansan.root.func_surface)
て
>>> for dep in aisansan.root.linked:
...     print(dep.surface)
...
降って
涙を

parse_to_tree needs the CaboCha executable on your path. If you already have CaboCha's XML output, you can build the tree from it directly:

>>> tree = xmlpumpkin.Tree(xml_as_unicode)

Should you need an easy interface from Python to CaboCha:

>>> from xmlpumpkin import cabocha
>>> print(cabocha.txttree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
    愛燦々と-----D
          この-D |
            身に-D
            降って-------D
            心密かな---D |
              うれしい-D |
                    涙を-D
              流したりして
EOS
>>> print(cabocha.as_xml(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
<sentence>
  ...
</sentence>
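
For readers curious what such a thin wrapper might look like, here is a minimal sketch using only the standard library. It is not xmlpumpkin's actual implementation: the helper name `run_cabocha` and its parameters are hypothetical, and it assumes the `cabocha` command accepts `-f3` for XML output (and `-f0` for the text tree).

```python
import shutil
import subprocess


def run_cabocha(text, output_format=3, executable="cabocha", encoding="utf-8"):
    """Run CaboCha on `text` and return its decoded output.

    Hypothetical helper for illustration, not part of xmlpumpkin.
    Returns None when the executable cannot be found on the path.
    """
    path = shutil.which(executable)
    if path is None:
        return None  # CaboCha is not installed or not on the path

    completed = subprocess.run(
        [path, "-f{}".format(output_format)],
        # CaboCha reads one sentence per line from stdin.
        input=(text + "\n").encode(encoding),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode(encoding)
```

Calling `run_cabocha(u'涙を流したりして')` would return the XML as a unicode string when CaboCha is available, and None otherwise.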

All inputs and outputs are unicode strings. If you prefer an encoding other than UTF-8, modify the following constants directly:

>>> import xmlpumpkin.runner
>>> xmlpumpkin.runner.CABOCHA_ENCODING = 'SJIS'
>>>
>>> import xmlpumpkin.tree
>>> xmlpumpkin.tree.XML_ENCODING = 'SJIS'
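
As a rough illustration of what these constants imply, the sketch below round-trips a unicode string through Shift_JIS bytes with Python's built-in codecs. It assumes, without confirmation from the package source, that the constants are handed to Python's codec machinery as encoding names.

```python
import codecs

text = u"涙を流したりして"

# Python resolves 'SJIS' as an alias of its 'shift_jis' codec.
codec_name = codecs.lookup("SJIS").name

# Unicode text becomes bytes before it reaches an external process ...
encoded = text.encode("SJIS")
# ... and the bytes read back are decoded to unicode again.
decoded = encoded.decode("SJIS")

print(codec_name)
print(decoded == text)
```

Because the conversion happens at the boundary, the calling code keeps working with unicode strings whichever encoding is configured.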

Properties

Tree and Chunk objects provide a small, not exhaustive, set of properties.

class xmlpumpkin.Tree(cabocha_xml)
  • chunks - tuple of chunks
  • root - root Chunk object (the one that does not depend on any other chunk)
  • chunk_by_id(chunk_id) - get a Chunk object by the id CaboCha assigned to it
  • _element - original XML as an lxml Element object
class xmlpumpkin.Chunk(element, parent)
  • id - chunk id
  • link_to_id - id of the chunk this chunk depends on
  • linked_from_ids - tuple of ids of the chunks that depend on this chunk
  • func_id - id of this chunk's functional token
  • dep - Chunk object this chunk depends on
  • linked - list of all Chunk objects that depend on this chunk
  • surface - surface of this chunk
  • func_surface - surface of this chunk’s functional token
  • _tokens() - tokens contained in this chunk, as lxml Element objects
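
To make the meaning of these properties concrete, here is a minimal sketch that derives similar values from CaboCha-style XML with the standard library only. It is not how xmlpumpkin is implemented (the package uses lxml). The sample XML is hand-written and simplified, and the function names simply mirror the property names above for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the shape of CaboCha's XML output.
# Real output carries more attributes (scores, feature strings on <tok>).
SAMPLE_XML = u"""<sentence>
 <chunk id="0" link="2" rel="D" head="0" func="1">
  <tok id="0">涙</tok>
  <tok id="1">を</tok>
 </chunk>
 <chunk id="1" link="2" rel="D" head="2" func="3">
  <tok id="2">静か</tok>
  <tok id="3">に</tok>
 </chunk>
 <chunk id="2" link="-1" rel="D" head="4" func="5">
  <tok id="4">流し</tok>
  <tok id="5">た</tok>
 </chunk>
</sentence>"""

sentence = ET.fromstring(SAMPLE_XML)
chunks = sentence.findall("chunk")


def surface(chunk):
    """Concatenate the text of every token in the chunk."""
    return u"".join(tok.text for tok in chunk.findall("tok"))


def func_surface(chunk):
    """Return the text of the token whose id matches the chunk's func attribute."""
    func_id = chunk.get("func")
    for tok in chunk.findall("tok"):
        if tok.get("id") == func_id:
            return tok.text
    return None


def linked(chunk):
    """Return the chunks whose link attribute points at this chunk."""
    return [c for c in chunks if c.get("link") == chunk.get("id")]


# The root is the chunk that depends on nothing, marked with link="-1".
root = next(c for c in chunks if c.get("link") == "-1")

print(surface(root))
print(func_surface(root))
print([surface(c) for c in linked(root)])
```

In this sample the root chunk is 流した, its functional token is た, and the chunks 涙を and 静かに both link to it.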

This version

0.1

Files for xmlpumpkin, version 0.1
  • xmlpumpkin-0.1.tar.gz (7.1 kB) - source distribution
