A multi-purpose sequential tagger wrapped around CRFSuite

CRFSuiteTagger
==============

_CRFSuiteTagger_ is a sequence tagger based on [pycrfsuite](https://github.com/tpeng/python-crfsuite "pycrfsuite"), the Python wrapper for [CRFSuite](http://www.chokkan.org/software/crfsuite/ "CRFSuite"). It is built for chunking, NER, and other text annotation tasks that use BIO (also called IOB) tags.
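
For readers new to BIO tags, here is a minimal sketch of how a tag sequence maps to chunks. It uses plain Python only and is not part of _CRFSuiteTagger_; the helper name `bio_to_chunks` and the sample sentence are made up for illustration.

```python
def bio_to_chunks(tags):
    """Collect (label, start, end) spans from a BIO tag sequence.

    `end` is exclusive. This is an illustrative helper, not library code.
    """
    chunks = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open chunk only if the label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the open chunk, if there is one.
        if label is not None:
            chunks.append((label, start, i))
            start, label = None, None
        # B- opens a new chunk; a stray I- is leniently treated the same way.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    # Close a chunk that runs to the end of the sequence.
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks


tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
chunks = bio_to_chunks(tags)
print(chunks)  # [('NP', 0, 1), ('VP', 1, 2), ('NP', 2, 6)]
```

`B-` marks the first token of a chunk, `I-` a token inside one, and `O` a token outside any chunk.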

### Why would you need this?

_CRFSuiteTagger_ ships with a wide selection of common features and makes it easy to add more. Features are selected through a simple string-based feature template. Additional features can be added by writing new _feature generating functions_ (see `crfsuitetagger.ftex`) and passing them to the `CRFSuiteTagger` constructor.
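
To give a feel for the idea, here is a rough, self-contained sketch of template-driven feature extraction. Everything in it is an assumption made for illustration: the `name:offset` template syntax, the function signatures, and the `FEATURE_FUNCTIONS` registry are hypothetical and do not reflect the actual template format or the API in `crfsuitetagger.ftex`.

```python
# Hypothetical feature generating functions: each takes the token list and
# a position, and returns a string value for that position.
def word(tokens, i):
    return tokens[i].lower()


def suffix(tokens, i, n=3):
    return tokens[i][-n:].lower()


def is_cap(tokens, i):
    return str(tokens[i][:1].isupper())


# Hypothetical registry mapping template names to functions.
FEATURE_FUNCTIONS = {"word": word, "suffix": suffix, "cap": is_cap}


def parse_template(template):
    """Parse an assumed 'name:offset name:offset ...' template string."""
    pairs = []
    for item in template.split():
        name, offset = item.split(":")
        pairs.append((name, int(offset)))
    return pairs


def extract(tokens, template):
    """Return one list of 'name[offset]=value' strings per token."""
    pairs = parse_template(template)
    rows = []
    for i in range(len(tokens)):
        row = []
        for name, offset in pairs:
            j = i + offset
            # Skip offsets that fall outside the sentence.
            if 0 <= j < len(tokens):
                value = FEATURE_FUNCTIONS[name](tokens, j)
                row.append(f"{name}[{offset}]={value}")
        rows.append(row)
    return rows


rows = extract(["Rockwell", "said", "the", "agreement"],
               "word:0 word:-1 cap:0 suffix:0")
print(rows[1])
```

In this sketch, adding a feature means writing one more function and registering it under a name that the template can refer to.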

### Installation

You should be able to install _CRFSuiteTagger_ like any other Python package:

    python setup.py install

### Dependencies

You will need the following Python packages, including one of my other libraries:

* [pycrfsuite](https://github.com/tpeng/python-crfsuite "pycrfsuite") - Python wrapper for CRFSuite
* [numpy](http://www.numpy.org/ "NumPy") - you should already have it
* [bioeval](https://github.com/savkov/bioeval "bioeval") - my library for evaluating BIO-style annotation, which replaces the Perl script from [CoNLL-2000](http://ilk.uvt.nl/team/sabine/chunklink/chunklink_2-2-2000_for_conll.pl)

### TODO

* command line interface
* migrate data structure to [pandas](http://pandas.pydata.org/ "pandas")
* more examples

### See Also

If you are interested in other sequence taggers, you might want to look at:

* [Stanford NLP](http://nlp.stanford.edu/software/lex-parser.shtml) -- POS tagger
* [ARK](http://www.ark.cs.cmu.edu/TweetNLP/) -- POS tagger for tweets
* [YamCha](http://chasen.org/~taku/software/yamcha/) -- BIO tagger/chunker
* [CRF++](http://taku910.github.io/crfpp/) -- BIO tagger/chunker
* [Wapiti](https://wapiti.limsi.fr/) -- POS & BIO tagger/chunker
