A multi-purpose sequential tagger wrapped around CRFSuite


_CRFSuiteTagger_ is a sequence tagger based on the [pycrfsuite]( "pycrfsuite") Python wrapper for [CRFSuite]( "CRFSuite"). It is built for chunking, NER, and other text annotation tasks that use BIO (also known as IOB) tags.
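
To make the BIO scheme concrete, here is a minimal, self-contained sketch that groups BIO-tagged tokens into labelled chunks. It is not part of _CRFSuiteTagger_; the function name `bio_to_chunks` and the example sentence are made up for illustration.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (label, text) chunks based on their BIO tags.

    Hypothetical helper for illustration only, not a CRFSuiteTagger API.
    """
    chunks = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        # Continue the open chunk only on an I- tag with the same label.
        if prefix == "I" and tag_label == label:
            words.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if words:
            chunks.append((label, " ".join(words)))
        if prefix in ("B", "I"):
            # B- starts a chunk; a stray I- is treated as a chunk start too.
            label, words = tag_label, [token]
        else:
            # O: the token is outside any chunk.
            label, words = None, []
    if words:
        chunks.append((label, " ".join(words)))
    return chunks


tokens = ["John", "Smith", "lives", "in", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]
chunks = bio_to_chunks(tokens, tags)
print(chunks)
```

`B-` marks the first token of a chunk, `I-` a token inside it, and `O` a token outside any chunk.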

### Why would you need this?

_CRFSuiteTagger_ ships with a wide selection of common features and makes it easy to integrate additional ones. Features are controlled through a simple string-based feature template. New features are added by writing _feature generating functions_ (see `crfsuitetagger.ftex`) and passing them to the `CRFSuiteTagger` constructor.
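
The sketch below illustrates the general idea of template-driven feature extraction. It does not use the real _CRFSuiteTagger_ API: the `name:offset` template syntax, the `(sentence, i)` function signature, and all names in it are assumptions made for this example, so check `crfsuitetagger.ftex` for the actual conventions.

```python
def word(sentence, i):
    """Feature generating function: the token itself."""
    return sentence[i]


def suffix3(sentence, i):
    """Feature generating function: the last three characters of the token."""
    return sentence[i][-3:]


def cap(sentence, i):
    """Feature generating function: whether the token starts with a capital."""
    return str(sentence[i][:1].isupper())


# Hypothetical registry; adding a feature means adding one function here.
FEATURE_FUNCTIONS = {"word": word, "suffix3": suffix3, "cap": cap}


def extract_features(sentence, template, functions=FEATURE_FUNCTIONS):
    """Apply a template such as 'word:-1 word:0 cap:0' to every token."""
    specs = []
    for item in template.split():
        name, _, offset = item.partition(":")
        specs.append((name, int(offset)))

    features = []
    for i in range(len(sentence)):
        token_features = []
        for name, offset in specs:
            j = i + offset
            # Skip offsets that fall outside the sentence.
            if 0 <= j < len(sentence):
                value = functions[name](sentence, j)
                token_features.append(f"{name}[{offset}]={value}")
        features.append(token_features)
    return features


sentence = ["Alice", "visited", "Paris"]
features = extract_features(sentence, "word:-1 word:0 suffix3:0 cap:0")
for token, token_features in zip(sentence, features):
    print(token, token_features)
```

Keeping the template as a plain string means feature sets can be swapped between experiments without touching code, while new feature functions only need to be registered once.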

### Installation

You should be able to install _CRFSuiteTagger_ like any other Python package:

    python setup.py install

### Dependencies

You will need the following Python packages and one of my other libraries:

* [pycrfsuite]( "pycrfsuite") - Python wrapper for CRFSuite
* [numpy]( "NumPy") - you should already have it
* [bioeval]( "bioeval") - my library for evaluating BIO-style annotation, which replaces the Perl script from [CoNLL-2000](
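
For context on what BIO-style evaluation measures: the CoNLL-2000 script scores whole chunks rather than individual tags, reporting precision, recall, and F1. The sketch below shows that computation on sets of chunks. It is not the `bioeval` API; the function name `chunk_prf` and the `(label, start, end)` chunk representation are assumptions for illustration.

```python
def chunk_prf(gold, predicted):
    """Chunk-level precision, recall and F1.

    Chunks are (label, start, end) tuples; a predicted chunk counts as
    correct only if its label and both boundaries match a gold chunk.
    Hypothetical helper for illustration only.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


gold = {("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 5)}
# The last gold NP was wrongly split into two chunks by the tagger.
predicted = {("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 4), ("NP", 4, 5)}

precision, recall, f1 = chunk_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Note that one boundary error costs both precision and recall here, which is why chunk-level scores are usually lower than per-token accuracy.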

### TODO

* command line interface
* migrate data structure to [pandas]( "pandas")
* more examples

### See Also

If you are interested in other sequence taggers, you might want to look at:

* [Stanford NLP]( -- POS tagger
* [ARK]( -- POS tagger for tweets
* [YamCha]( -- BIO tagger/chunker
* [CRF++]( -- BIO tagger/chunker
* [Wapiti]( -- POS & BIO tagger/chunker

