Skip to main content

Python interface to camxes.

Project description

To install, you need a Java runtime environment as a java command on your $PATH, Python 2.6+ (including 3.1) and python-setuptools (or distribute). Then you can simply install this package from PyPI with easy_install or pip, or as a dependency in your own setup.py. The parser itself is bundled with this package so you don’t need to worry about that.

easy_install camxes

Parsing Lojban

The parse() function returns a parse tree of named nodes.

>>> import camxes
>>> print camxes.parse("coi rodo")
text
 `- free
     +- CMAVO
     |   `- COI
     |       `- u'coi'
     `- sumti5
         +- CMAVO
         |   `- PA
         |       `- u'ro'
         `- CMAVO
             `- KOhA
                 `- u'do'

Turn a tree back into Lojban with the lojban property.

>>> camxes.parse("coi rodo!").lojban
u'coi ro do'

This joins the leaf nodes with a space, but you can preserve spaces and punctuation by passing spaces=True to parse().

>>> camxes.parse("coi rodo!", spaces=True).lojban
u'coi rodo!'

Child nodes can be accessed by name as attributes, giving a list of such nodes. If there are no child nodes with that name an exception is raised.

>>> print camxes.parse("coi rodo").free[0].sumti5[0].CMAVO[1]
CMAVO
 `- KOhA
     `- u'do'

You can also access nodes by sequential position without giving the name.

>>> print camxes.parse("coi rodo")[0][1]
sumti5
 +- CMAVO
 |   `- PA
 |       `- u'ro'
 `- CMAVO
     `- KOhA
         `- u'do'

Nodes iterate over their children.

>>> list(camxes.parse("coi rodo")[0][1])
[<CMAVO {ro}>, <CMAVO {do}>]

They also know their name.

>>> camxes.parse("coi rodo")[0][1].name
u'sumti5'

Verifying grammatical validity

parse() is able to parse some ungrammatical input by processing as much as is grammatical. It is therefore unreliable for checking if some text is grammatical. For this purpose, there is the isgrammatical() predicate.

>>> camxes.isgrammatical("coi rodo")
True
>>> camxes.isgrammatical("mupli cu fliba")
False
>>> print camxes.parse("mupli cu fliba")
text
 `- BRIVLA
     `- gismu
         `- u'mupli'

Deconstructing compound words into affixes

decompose() gives you the affixes and hyphens of a compound.

>>> camxes.decompose("genturfa'i")
(u'gen', u'tur', u"fa'i")

It will complain for input that is not a single, valid compound.

>>> camxes.decompose("camxes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid compound 'camxes'

Parsing only morphology

The morphology() function works much like parse().

>>> print camxes.morphology("coi")
text
 `- CMAVO
     `- COI
         +- c
         |   `- u'c'
         +- o
         |   `- u'o'
         `- i
             `- u'i'

Tree traversal

Search for nodes with the find() method. It takes any number of arguments that are wildcard-matched against node names. This operation recurses down each branch until a match is found, but does not search children of matching nodes.

>>> camxes.parse("coi rodo").find('sumti*')
(<sumti5 {ro do}>,)
>>> camxes.parse("coi rodo").find('PA', 'KOhA')
(<PA {ro}>, <KOhA {do}>)

Key access on nodes is a shortcut for the first match of a find.

>>> camxes.parse("la camxes genturfa'i fi la lojban")['cmene']
<cmene {camxes}>

The leafs property is a tuple of all leaf nodes, which should be the unicode lexemes.

>>> camxes.parse("coi rodo").leafs
(u'coi', u'ro', u'do')

The branches() method finds the parents of nodes whose leafs match the arguments. This lets you search for the branches a sequence of lexemes belong to.

>>> camxes.parse("lo ninmu cu klama lo tcadu").branches("lo")
(<sumti6 {lo ninmu}>, <sumti6 {lo tcadu}>)
>>> camxes.parse("lo ninmu cu klama lo tcadu").branches("ninmu")
(<sumti6 {lo ninmu}>,)
>>> camxes.parse("lo ninmu cu klama lo tcadu").branches("klama", "lo", "tcadu")
(<sentence {lo ninmu cu klama lo tcadu}>,)

A generalization of these is called filter() and takes a predicate function that decides if a node should be listed. filter() is a generator so we use list() here to see the results.

>>> leafparent = lambda node: not isinstance(node[0], camxes.Node)
>>> list(camxes.parse("coi rodo").filter(leafparent))
[<COI {coi}>, <PA {ro}>, <KOhA {do}>]

Tree transformation

You can transform a node, recursively, into a tuple of strings, where the first item is the name of the node and the rest are the child nodes. This property is called primitive and can be useful if you’re serializing a parse tree to a more “dumb” format such as JSON.

>>> from pprint import pprint
>>> pprint(camxes.parse("coi rodo").primitive)
(u'text',
 (u'free',
  (u'CMAVO', (u'COI', u'coi')),
  (u'sumti5', (u'CMAVO', (u'PA', u'ro')), (u'CMAVO', (u'KOhA', u'do')))))
>>> import json
>>> print json.dumps(camxes.parse("coi").primitive, indent=2)
[
  "text",
  [
    "CMAVO",
    [
      "COI",
      "coi"
    ]
  ]
]

The generalization of primitive is called map() and takes a transformer function that in turn takes a node. The transformation is then mapped recursively on all nodes and a nested tuple, similar to that of primitive, is returned.

>>> camxes.parse("coi rodo").map(len)
(1, (2, (1, (1, 3)), (2, (1, (1, 2)), (1, (1, 2)))))

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

camxes-0.2.tar.gz (826.2 kB view details)

Uploaded Source

File details

Details for the file camxes-0.2.tar.gz.

File metadata

  • Download URL: camxes-0.2.tar.gz
  • Upload date:
  • Size: 826.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for camxes-0.2.tar.gz
Algorithm Hash digest
SHA256 f056027fdb3f8aa942dd49ada0b33a2ea6bf285c16ae7e7d7c073f60142012e6
MD5 15173336e9e6048fe72aec540c7744db
BLAKE2b-256 6a6c0cfdb2632b82146ac2c44bf4ba11110109e2a8a3d75f8152ca51205fd070

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page