jk_treetaggerwrapper
====================

Introduction
------------

This Python module provides a wrapper around TreeTagger. It currently uses the `treetaggerwrapper` module, but this dependency will be changed in the future.

Information about this module can be found here:

* [github.com](https://github.com/jkpubsrc/python-module-jk-treetaggerwrapper)
* [pypi.python.org](https://pypi.python.org/pypi/jk_treetaggerwrapper)

How to use this module
----------------------

Example:

```python
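from jk_treetaggerwrapper import PoolOfThreadedTreeTaggers  # assumed import path

# Instantiate the pool of taggers first.
pool = PoolOfThreadedTreeTaggers("/path/to/treetagger")

result = pool.tagText2("en", "The sun is shining and the children are smiling.")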
```

In order to tag a text you first need to instantiate a pool of taggers. Then you can invoke `tagText2()` in order to temporarily allocate an instance of `TreeTagger` in the background and perform the PoS tagging.

NOTE: Invoking `tagText()` is discouraged as it has been replaced with a better implementation. Nevertheless it is still available for compatibility reasons.

Four arguments can be specified:

* langID : A string that contains the ID of the language of the text to tag.
* text : The text to tag.
* bWithConfidence : A boolean value that indicates whether the result is returned with or without confidence values.
* bWithNullsInsteadOfUnknown : A boolean value that indicates whether to convert `<unknown>` to a null value.

The result is always a list of tuples. Each tuple has the following structure:

* The token itself.
* The assigned tag.
* The lemma.
* The confidence value.

The group consisting of tag-lemma-confidence can be returned multiple times. For example:

* The token itself.
* The assigned tag 1.
* The lemma 1.
* The confidence value 1.
* The assigned tag 2 (as an alternative).
* The lemma 2 (as an alternative).
* The confidence value 2 (as an alternative).
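Because the tag-lemma-confidence group can repeat, it can help to split each tuple into its token and a list of alternatives. The following is a minimal sketch, assuming the flat layout described above with confidence values included. The helper `split_alternatives` and the sample tuple are hypothetical and not part of `jk_treetaggerwrapper`.

```python
def split_alternatives(entry):
    """Split a flat result tuple into (token, [(tag, lemma, confidence), ...])."""
    token, rest = entry[0], entry[1:]
    if len(rest) % 3 != 0:
        raise ValueError("expected groups of (tag, lemma, confidence)")
    # Walk the remaining items three at a time.
    groups = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return token, groups


# Hand-written sample data for illustration, not real tagger output.
sample = ("shining", "VVG", "shine", 0.9, "JJ", "shining", 0.1)
token, alternatives = split_alternatives(sample)
for tag, lemma, confidence in alternatives:
    print(token, tag, lemma, confidence)
```

If confidence values are not requested, the group size would presumably differ, so the helper would need to be adjusted accordingly.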

Concurrency
-----------

Please note that this library is based on `treetaggerwrapper`, which follows a thread-based concurrency model. When tagging, `treetaggerwrapper` starts a TreeTagger background process that stays alive for the lifetime of the `treetaggerwrapper` object. The object communicates with this background process and uses threads to do so. Therefore the class `PoolOfThreadedTreeTaggers` provided by `jk_treetaggerwrapper` is bound to this limitation.
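Given the thread-based model, one way to tag several texts at once is to hand them to worker threads. The following is only a sketch: it assumes that `tagText2()` may be called from several threads at the same time, which the pool design suggests but which is not stated explicitly here. The helper `tag_many` and the stand-in `fake_tag` are hypothetical. In real use the callable would be something like `lambda text: pool.tagText2("en", text)`.

```python
from concurrent.futures import ThreadPoolExecutor


def tag_many(tag_fn, texts, max_workers=4):
    """Apply tag_fn to every text using worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(tag_fn, texts))


# Stand-in for the real tagger call so that the sketch runs without TreeTagger.
def fake_tag(text):
    return [(word,) for word in text.split()]


results = tag_many(fake_tag, ["The sun is shining.", "The children are smiling."])
```

Threads rather than processes are used here because the tagging work itself happens in the TreeTagger background processes.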

Contact Information
-------------------

This is Open Source code. This not only lets you use the code freely, it also
allows you to contribute. Feel free to contact the author(s) of this software listed below with
comments, collaboration requests, suggestions for improvement or bug reports:

* Jürgen Knauth: jknauth@uni-goettingen.de, pubsrc@binary-overflow.de

License
-------

This software is provided under the following license:

* Apache Software License 2.0
