Skip to main content

Sequence learning toolkit

Project description

seqlearn is a sequence classification toolkit for Python. It is designed to extend scikit-learn and offer as similar as possible an API.

Compiling and installing

Get NumPy >=1.6, SciPy >=0.11, Cython >=0.20.2 and a recent version of scikit-learn. Then issue:

python setup.py install

to install seqlearn.

If you want to use seqlearn from its source directory without installing, you have to compile first:

python setup.py build_ext --inplace

Getting started

The easiest way to start using seqlearn is to fetch a dataset in CoNLL 2000 format. Define a task-specific feature extraction function, e.g.:

>>> def features(sequence, i):
...     yield "word=" + sequence[i].lower()
...     if sequence[i].isupper():
...         yield "Uppercase"
...

Load the training file, say train.txt:

>>> from seqlearn.datasets import load_conll
>>> X_train, y_train, lengths_train = load_conll("train.txt", features)

Train a model:

>>> from seqlearn.perceptron import StructuredPerceptron
>>> clf = StructuredPerceptron()
>>> clf.fit(X_train, y_train, lengths_train)

Check how well you did on a validation set, say validation.txt:

>>> X_test, y_test, lengths_test = load_conll("validation.txt", features)
>>> from seqlearn.evaluation import bio_f_score
>>> y_pred = clf.predict(X_test, lengths_test)
>>> print(bio_f_score(y_test, y_pred))

For more information, see the documentation.

Travis

Project details


Release history Release notifications | RSS feed

This version

0.2

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

seqlearn-0.2.tar.gz (557.4 kB view details)

Uploaded Source

File details

Details for the file seqlearn-0.2.tar.gz.

File metadata

  • Download URL: seqlearn-0.2.tar.gz
  • Upload date:
  • Size: 557.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for seqlearn-0.2.tar.gz
Algorithm Hash digest
SHA256 1743087499ad25394a2f539edf0105271d3bf5cca8a4ca7b73f9fbf1518f4126
MD5 48a5bdc466349f2289f039893bc88b54
BLAKE2b-256 252c95da36839f647a6b15da1fd10f68d755c7fca549c92aabb3ff734f5c682c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page