jk_treetaggerwrapper
====================
Introduction
------------
This Python module provides a wrapper around TreeTagger. Currently this module makes use of the module `treetaggerwrapper`, but this dependency will be changed in the future.
Information about this module can be found here:
* [github.com](https://github.com/jkpubsrc/python-module-jk-treetaggerwrapper)
* [pypi.python.org](https://pypi.python.org/pypi/jk_treetaggerwrapper)
How to use this module
----------------------
Example:
```python
from jk_treetaggerwrapper import PoolOfThreadedTreeTaggers

pool = PoolOfThreadedTreeTaggers("/path/to/treetagger")
result = pool.tagText2("en", "The sun is shining and the children are smiling.")
```
To tag a text, first instantiate a pool of taggers. Then invoke `tagText2()`, which temporarily allocates an instance of `TreeTagger` in the background and performs the PoS tagging.
NOTE: Invoking `tagText()` is discouraged because it has been replaced by a better implementation. It remains available for compatibility reasons.
Four arguments can be specified:
* langID : A string that contains the ID of the language of the text to tag.
* text : The text to tag.
* bWithConfidence : A boolean value that indicates whether to return the result with or without confidence values.
* bWithNullsInsteadOfUnknown : A boolean value that indicates whether to convert "<unknown>" to a null value.
The result is always a list of tuples. Each tuple has the following structure:
* The token itself.
* The assigned tag.
* The lemma.
* The confidence value.
The group consisting of tag-lemma-confidence can be returned multiple times. For example:
* The token itself.
* The assigned tag 1.
* The lemma 1.
* The confidence value 1.
* The assigned tag 2 (as an alternative).
* The lemma 2 (as an alternative).
* The confidence value 2 (as an alternative).
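The flat layout described above can be regrouped into one entry per alternative. The following is a minimal sketch, not part of this module: the helper `split_result_tuple` and the `Reading` type are hypothetical names, the sample tuple is hand-written rather than real TreeTagger output, and the group size of three assumes that confidence values were requested.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Reading(NamedTuple):
    """One tag-lemma-confidence group (hypothetical helper type)."""
    tag: Optional[str]
    lemma: Optional[str]
    confidence: Optional[float]


def split_result_tuple(entry: Sequence, group_size: int = 3) -> Tuple[str, List[Reading]]:
    """Split a flat result tuple into the token and its list of readings.

    Assumes the layout (token, tag 1, lemma 1, confidence 1, tag 2, ...).
    """
    token, rest = entry[0], entry[1:]
    if len(rest) % group_size != 0:
        raise ValueError("tuple does not consist of complete tag-lemma-confidence groups")
    # Walk over the remaining fields in steps of one group.
    readings = [Reading(*rest[i:i + group_size]) for i in range(0, len(rest), group_size)]
    return token, readings


# Hand-written sample data for illustration only, not real TreeTagger output.
sample = ("shining", "VBG", "shine", 0.9, "JJ", "shining", 0.1)
token, readings = split_result_tuple(sample)
```

With this grouping, `readings[0]` holds the first reading and any further entries hold the alternatives.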
Concurrency
-----------
Please note that this library is based on `treetaggerwrapper`, which follows a thread-based concurrency model. When tagging, `treetaggerwrapper` starts a TreeTagger background process that stays alive for the lifetime of the `treetaggerwrapper` object. That object communicates with the background process and uses threads for this purpose. Therefore the class `PoolOfThreadedTreeTaggers` provided by `jk_treetaggerwrapper` is bound to this limitation.
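Given the thread-based model, one way to tag several texts at once is to call the pool from worker threads. The sketch below assumes that `tagText2()` may be invoked from multiple threads at the same time, which the class name suggests but this description does not state explicitly. `StubPool` and `tag_all` are hypothetical names; the stub only mimics the `tagText2(langID, text)` call shape so that the example runs without TreeTagger installed.

```python
from concurrent.futures import ThreadPoolExecutor


class StubPool:
    """Hypothetical stand-in for PoolOfThreadedTreeTaggers (illustration only)."""

    def tagText2(self, langID, text):
        # Return dummy (token, tag, lemma) tuples instead of real tagging results.
        return [(token, None, None) for token in text.split()]


def tag_all(pool, langID, texts, max_workers=4):
    """Tag several texts from worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda text: pool.tagText2(langID, text), texts))


texts = ["The sun is shining.", "The children are smiling."]
results = tag_all(StubPool(), "en", texts)
```

In real use, the stub would be replaced by an actual `PoolOfThreadedTreeTaggers` instance; whether more worker threads than pooled taggers brings any benefit is not covered here.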
Contact Information
-------------------
This is Open Source code. That not only lets you use this code freely, it also
allows you to contribute. Feel free to contact the author(s) of this software listed below with
comments, collaboration requests, suggestions for improvement, or bug reports:
* Jürgen Knauth: jknauth@uni-goettingen.de, pubsrc@binary-overflow.de
License
-------
This software is provided under the following license:
* Apache Software License 2.0