
CaboCha output-XML accessor

Project description

Parses XML output from CaboCha and provides simple tree accessors.

Usage

Expected usage centers on chunk surfaces and dependency links:

>>> import xmlpumpkin
>>> aisansan = xmlpumpkin.parse_to_tree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... )
>>> len(aisansan.chunks)
8
>>> print(aisansan.root.surface)
流したりして
>>> print(aisansan.root.func_surface)
て
>>> for dep in aisansan.root.linked:
...     print(dep.surface)
...
降って
涙を

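The root and linked accessors boil down to following each chunk's link. A minimal sketch in plain Python, with no CaboCha or xmlpumpkin involved: the chunk list is transcribed by hand for the example sentence, assuming the links noted in it, and -1 marks the root as in CaboCha's XML link attribute. SimpleChunk, find_root and linked_to are hypothetical names, not part of xmlpumpkin.

```python
from typing import List, NamedTuple


class SimpleChunk(NamedTuple):
    """Hypothetical stand-in for a chunk: id, surface and the id it links to."""
    id: int
    surface: str
    link: int  # -1 means "depends on nothing", i.e. the root


# Hand-transcribed chunks for the example sentence (assumed parse).
CHUNKS = [
    SimpleChunk(0, '愛燦々と', 3),
    SimpleChunk(1, 'この', 2),
    SimpleChunk(2, '身に', 3),
    SimpleChunk(3, '降って', 7),
    SimpleChunk(4, '心密かな', 6),
    SimpleChunk(5, 'うれしい', 6),
    SimpleChunk(6, '涙を', 7),
    SimpleChunk(7, '流したりして', -1),
]


def find_root(chunks: List[SimpleChunk]) -> SimpleChunk:
    """Return the single chunk that does not depend on any other chunk."""
    roots = [c for c in chunks if c.link == -1]
    if len(roots) != 1:
        raise ValueError(f'expected exactly one root, found {len(roots)}')
    return roots[0]


def linked_to(chunks: List[SimpleChunk], target: SimpleChunk) -> List[SimpleChunk]:
    """Return the chunks whose link points at target, in sentence order."""
    return [c for c in chunks if c.link == target.id]


root = find_root(CHUNKS)
print(root.surface)
for dep in linked_to(CHUNKS, root):
    print(dep.surface)
```

Run as-is, this prints the same three surfaces as the doctest above: 流したりして, then 降って and 涙を.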
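parse_to_tree needs the cabocha command on your PATH. Alternatively, if you already have CaboCha's XML output as a Unicode string, build the tree from it directly: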

>>> tree = xmlpumpkin.Tree(xml_as_unicode)
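Tree does the XML handling for you, but the structure it walks is simple. The sketch below approximates surface and func_surface with the standard library's xml.etree.ElementTree instead of lxml (which xmlpumpkin uses). It assumes CaboCha's XML layout of sentence, chunk (with id, link and func attributes) and tok elements; the sample is hand-written and trimmed, since real output also carries attributes such as rel, score and feature. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written, trimmed sample in the assumed CaboCha XML layout.
SAMPLE_XML = """<sentence>
 <chunk id="0" link="1" head="0" func="1">
  <tok id="0">涙</tok>
  <tok id="1">を</tok>
 </chunk>
 <chunk id="1" link="-1" head="2" func="5">
  <tok id="2">流し</tok>
  <tok id="3">たり</tok>
  <tok id="4">し</tok>
  <tok id="5">て</tok>
 </chunk>
</sentence>"""


def chunk_surface(chunk: ET.Element) -> str:
    """Concatenate the text of every token in the chunk."""
    return ''.join(tok.text or '' for tok in chunk.iter('tok'))


def func_surface(chunk: ET.Element) -> str:
    """Return the text of the token whose id matches the chunk's func attribute."""
    func_id = chunk.get('func')
    for tok in chunk.iter('tok'):
        if tok.get('id') == func_id:
            return tok.text or ''
    return ''


def index_chunks(sentence: ET.Element) -> Dict[int, ET.Element]:
    """Index chunk elements by their integer id."""
    return {int(c.get('id')): c for c in sentence.iter('chunk')}


sentence = ET.fromstring(SAMPLE_XML)
chunks = index_chunks(sentence)
# The root chunk is the one whose link attribute is -1.
root = next(c for c in chunks.values() if c.get('link') == '-1')
print(chunk_surface(root))        # surface of the root chunk
print(func_surface(root))         # its functional token
print(chunk_surface(chunks[0]))   # the chunk that depends on the root
```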

If you just need a simple Python interface to CaboCha itself, the cabocha module provides one:

>>> from xmlpumpkin import cabocha
>>> print(cabocha.txttree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
    愛燦々と-----D
          この-D |
            身に-D
            降って-------D
            心密かな---D |
              うれしい-D |
                    涙を-D
              流したりして
EOS
>>> print(cabocha.as_xml(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
<sentence>
  ...
</sentence>
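An interface like this mostly needs to pipe encoded text through the cabocha command and decode what comes back. A minimal sketch, not xmlpumpkin's actual runner: it assumes CaboCha's -f3 option selects XML output (the default, -f0, prints the tree shown above), and the run_cabocha name and its argv and encoding parameters are hypothetical. Making argv a parameter also lets you substitute another command, which keeps the helper testable without CaboCha installed.

```python
import subprocess
from typing import Sequence


def run_cabocha(text: str,
                argv: Sequence[str] = ('cabocha', '-f3'),
                encoding: str = 'utf-8') -> str:
    """Feed text to CaboCha on stdin and return its decoded stdout.

    Assumption: 'cabocha -f3' prints XML; '-f0' would print the ASCII tree.
    """
    # CaboCha works on bytes, so encode in whatever charset it was built with.
    completed = subprocess.run(
        list(argv),
        input=text.encode(encoding),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode(encoding)
```

For instance, run_cabocha(sentence, argv=('cabocha', '-f0')) should return the plain-text tree instead, again assuming CaboCha's standard -f option.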

All input and output is Unicode. If CaboCha or its XML output uses an encoding other than UTF-8, modify the following constants directly:

>>> import xmlpumpkin.runner
>>> xmlpumpkin.runner.CABOCHA_ENCODING = 'SJIS'
>>>
>>> import xmlpumpkin.tree
>>> xmlpumpkin.tree.XML_ENCODING = 'SJIS'
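This README does not say how xmlpumpkin consumes these constants; assuming they are passed to Python's encode and decode calls, they must be codec names Python recognises. 'SJIS' qualifies, since Python's codec registry treats it (case-insensitively) as an alias of shift_jis. This standalone snippet, independent of xmlpumpkin, shows why the value has to match the encoding CaboCha actually uses:

```python
import codecs

text = '涙を流したりして'

# 'SJIS' is an alias that Python's codec registry resolves to shift_jis.
print(codecs.lookup('SJIS').name)

utf8_bytes = text.encode('utf-8')
sjis_bytes = text.encode('SJIS')
print(len(utf8_bytes), len(sjis_bytes))

# Decoding with the matching codec round-trips cleanly...
print(sjis_bytes.decode('SJIS') == text)
# ...while a mismatched codec garbles the text (or raises without errors=).
print(sjis_bytes.decode('utf-8', errors='replace') == text)
```

The same text is 24 bytes in UTF-8 but 16 in Shift_JIS, so bytes produced under one setting cannot be read under the other.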

Properties

The property set is small, but the following are available on Tree and Chunk objects.

class xmlpumpkin.Tree(cabocha_xml)
  • chunks - tuple of chunks
  • root - the root Chunk object (the one that depends on no other chunk)
  • chunk_by_id(chunk_id) - returns the Chunk object with the given CaboCha-assigned id
  • _element - the original XML as an lxml Element object
class xmlpumpkin.Chunk(element, parent)
  • id - chunk id
  • link_to_id - id of the chunk this chunk depends on
  • linked_from_ids - tuple of ids of the chunks that depend on this chunk
  • func_id - id of this chunk’s functional token
  • dep - the Chunk object this chunk depends on
  • linked - list of all Chunk objects that depend on this chunk
  • surface - surface string of this chunk
  • func_surface - surface of this chunk’s functional token
  • _tokens() - the tokens contained in this chunk, as lxml Element objects
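In the list above, dep follows a chunk's link forward, while linked_from_ids and linked invert it. A sketch of that relationship using a hypothetical ChunkModel class whose attribute names mirror the documented ones; it is not xmlpumpkin's implementation, and the link ids are hand-transcribed for the example sentence.

```python
from typing import Dict, List, Optional, Tuple


class ChunkModel:
    """Hypothetical chunk that resolves links through a shared id table."""

    def __init__(self, chunk_id: int, surface: str, link_to_id: int,
                 table: Dict[int, 'ChunkModel']) -> None:
        self.id = chunk_id
        self.surface = surface
        self.link_to_id = link_to_id  # -1 for the root
        self._table = table

    @property
    def dep(self) -> Optional['ChunkModel']:
        """The chunk this one depends on, or None for the root (-1 is no key)."""
        return self._table.get(self.link_to_id)

    @property
    def linked_from_ids(self) -> Tuple[int, ...]:
        """Ids of chunks whose link points here: the inverted link."""
        return tuple(sorted(c.id for c in self._table.values()
                            if c.link_to_id == self.id))

    @property
    def linked(self) -> List['ChunkModel']:
        """Chunk objects for linked_from_ids, in sentence order."""
        return [self._table[i] for i in self.linked_from_ids]


def path_to_root(chunk: ChunkModel) -> List[str]:
    """Follow dep repeatedly (assumes links form a tree) and collect surfaces."""
    path = []
    current: Optional[ChunkModel] = chunk
    while current is not None:
        path.append(current.surface)
        current = current.dep
    return path


DATA = [(0, '愛燦々と', 3), (1, 'この', 2), (2, '身に', 3), (3, '降って', 7),
        (4, '心密かな', 6), (5, 'うれしい', 6), (6, '涙を', 7), (7, '流したりして', -1)]

table: Dict[int, ChunkModel] = {}
for cid, surface, link in DATA:
    table[cid] = ChunkModel(cid, surface, link, table)

print(path_to_root(table[1]))   # この up to the root
print(table[3].linked_from_ids)  # chunks that depend on 降って
```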

Release

Version 0.1, distributed as the source archive xmlpumpkin-0.1.tar.gz (7.1 kB).