Skip to main content

A python implementation of the GCL region algebra and query language described by Clarke et al.

Project description

pyra - Python Region Algebra

Pyra is a python implementation of the region query algebra described in [1]. Region algebras are used to efficiently query semi-structured text documents. For a quick online introduction to this region algebra, and why it is useful, visit the [Wumpus Search Docs](http://www.wumpus-search.org/docs/gcl.html). In general, region algebras are good for extracting data from documents that have lightweight structure (semi-structured), and are an alternative to more heavyweight solutions like XPath queries.

# Setup the corpus corpus = “the quick brown fox jumps over the lazy dog and the brown dog runs away” tokens = corpus.split()

# List regions starting with ‘brown’ and ending with ‘dog’, containing # the phrase ‘fox jumps over’.

iidx = InvertedIndex(tokens) g = GCL(iidx)

for s in g.Contains( g.BoundedBy( g.Term(‘brown’), g.Term(‘dog’) ), g.Phrase(‘fox’, ‘jumps’, ‘over’) ):
print s print “’%s’” % (tokens[s],)

The above prints:

slice(2,9)

‘brown fox jumps over the lazy dog’

References

[1] Clarke, C. L., Cormack, G. V., & Burkowski, F. J. (1995). An algebra for structured text search
and a framework for its implementation. The Computer Journal, 38(1), 43-56. Chicago

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
pyra-0.2.3dev.tar.gz (6.7 kB) Copy SHA256 hash SHA256 Source None Feb 10, 2014

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page