A multi-purpose sequential tagger wrapped around CRFSuite
CRFSuiteTagger
==============
_CRFSuiteTagger_ is a sequence tagger based on [pycrfsuite](https://github.com/tpeng/python-crfsuite "pycrfsuite"), a Python wrapper for [CRFSuite](http://www.chokkan.org/software/crfsuite/ "CRFSuite"). It is built for chunking, NER, and other text annotation tasks that use BIO (also called IOB) labels.
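
For readers new to BIO labels, the sketch below shows how a tagged sequence maps back to chunks: `B-X` opens a chunk of type `X`, `I-X` continues it, and `O` marks a token outside any chunk. The function name `bio_to_chunks` and the example sentence are illustrative assumptions and are not part of _CRFSuiteTagger_.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (label, [tokens]) chunks based on their BIO tags.

    Illustrative helper only; it is not part of CRFSuiteTagger.
    """
    chunks = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # "B-NP" -> ("B", "-", "NP"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        # A chunk continues only on an I- tag that carries the same label.
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue
        # Any other tag closes the open chunk, if there is one.
        if current_label is not None:
            chunks.append((current_label, current_tokens))
        if prefix in ("B", "I"):
            # A stray I- tag with no matching B- is treated as a chunk start.
            current_label, current_tokens = label, [token]
        else:
            current_label, current_tokens = None, []
    # Flush the chunk that is still open at the end of the sequence.
    if current_label is not None:
        chunks.append((current_label, current_tokens))
    return chunks


if __name__ == "__main__":
    tokens = ["He", "reckons", "the", "current", "account", "deficit", "."]
    tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "O"]
    for label, words in bio_to_chunks(tokens, tags):
        print(label, " ".join(words))
```

Treating a stray `I-` tag as a chunk start is one common convention; stricter evaluators may handle it differently.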
### Why would you need this?
_CRFSuiteTagger_ ships with a wide selection of common features and makes it easy to integrate additional ones. Features are selected using a simple string-based feature template. New features can be added by writing _feature generating functions_ (see `crfsuitetagger.ftex`) and passing them to the `CRFSuiteTagger` constructor.
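
To give a rough idea of what a feature generating function does, here is a minimal, library-independent sketch. It assumes a hypothetical signature `(tokens, i) -> dict`; the actual signature expected by `crfsuitetagger.ftex`, the template syntax, and the constructor argument are not shown here, so check the module source before adapting it. The names `word_shape`, `window`, and `extract` are made up for this example.

```python
def word_shape(tokens, i):
    """Hypothetical feature function: coarse shape of the token at position i.

    Uppercase letters map to "X", lowercase to "x", digits to "d";
    consecutive repeats of the same code are collapsed.
    """
    shape = []
    for ch in tokens[i]:
        if ch.isupper():
            code = "X"
        elif ch.islower():
            code = "x"
        elif ch.isdigit():
            code = "d"
        else:
            code = ch
        if not shape or shape[-1] != code:
            shape.append(code)
    return {"shape": "".join(shape)}


def window(tokens, i, size=1):
    """Hypothetical feature function: lowercased tokens around position i."""
    feats = {}
    for offset in range(-size, size + 1):
        j = i + offset
        # Positions outside the sentence get a padding symbol.
        feats[f"w[{offset}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


def extract(tokens, feature_functions):
    """Merge the output of every feature function into one dict per token."""
    rows = []
    for i in range(len(tokens)):
        feats = {}
        for fn in feature_functions:
            feats.update(fn(tokens, i))
        rows.append(feats)
    return rows


if __name__ == "__main__":
    for row in extract(["Rockwell", "said", "2000"], [word_shape, window]):
        print(row)
```

Keeping each feature in its own small function means a feature set can be changed by editing the list of functions rather than the extraction loop.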
### Installation
You should be able to install _CRFSuiteTagger_ like any other Python package:

    python setup.py install
### Dependencies
You will need the following Python packages and one of my other libraries:
* [pycrfsuite](https://github.com/tpeng/python-crfsuite "pycrfsuite") - Python wrapper for CRFSuite
* [numpy](http://www.numpy.org/ "NumPy") - you should already have it
* [bioeval](https://github.com/savkov/bioeval "bioeval") - my library for evaluating BIO-style annotation, which replaces the Perl script from [CoNLL-2000](http://ilk.uvt.nl/team/sabine/chunklink/chunklink_2-2-2000_for_conll.pl)
### TODO
* command line interface
* migrate data structure to [pandas](http://pandas.pydata.org/ "pandas")
* more examples
### See Also
If you are interested in other sequence taggers, you might want to look at:
* [Stanford NLP](http://nlp.stanford.edu/software/lex-parser.shtml) -- POS tagger
* [ARK](http://www.ark.cs.cmu.edu/TweetNLP/) -- POS tagger for tweets
* [YamCha](http://chasen.org/~taku/software/yamcha/) -- BIO tagger/chunker
* [CRF++](http://taku910.github.io/crfpp/) -- BIO tagger/chunker
* [Wapiti](https://wapiti.limsi.fr/) -- POS & BIO tagger/chunker