
spaCy pipeline component for CRF entity extraction


spacy_crfsuite: CRF entity tagger for spaCy.

✨ Features

  • spaCy NER component for Conditional Random Field (CRF) entity extraction (via sklearn-crfsuite).
  • Command-line tools for training and evaluation, plus an example notebook.
  • Supports JSON, CoNLL and Markdown annotations.

Installation

Python

pip install spacy_crfsuite

🚀 Quickstart

Usage as a spaCy pipeline component

spaCy pipeline

import spacy

from spacy_crfsuite import CRFEntityExtractor

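# Start from a blank English pipeline (tokenizer only).
nlp = spacy.blank('en')
# Load a trained CRF model from disk and add it as a pipeline component.
# Passing the component object to add_pipe is the spaCy v2 calling style.
pipe = CRFEntityExtractor(nlp).from_disk("model.pkl")
nlp.add_pipe(pipe)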

doc = nlp("show mexican restaurents up north")
for ent in doc.ents:
    print(ent.text, "--", ent.label_)

# Output:
# mexican -- cuisine
# north -- location

Follow this example notebook to train the CRF entity tagger from a few restaurant search examples.

Train & evaluate CRF entity tagger

Set up a configuration file

$ cat << EOF > config.json
{"c1": 0.03, "c2": 0.06}
EOF
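In sklearn-crfsuite, `c1` and `c2` are the coefficients for L1 and L2 regularization of the CRF. The same file can also be written and sanity-checked from Python. The sketch below is only an illustration: the helper `load_crf_config` is hypothetical and not part of spacy_crfsuite, and how the training command consumes the file internally is not shown here.

```python
import json
import tempfile
from pathlib import Path


def load_crf_config(path):
    """Read a JSON config and check that c1/c2, if present, are non-negative numbers.

    Hypothetical helper for illustration; not part of spacy_crfsuite.
    """
    params = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in ("c1", "c2"):
        value = params.get(key)
        if value is None:
            continue
        if not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"{key} must be a non-negative number, got {value!r}")
    return params


# Write the same content as the shell heredoc above, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps({"c1": 0.03, "c2": 0.06}), encoding="utf-8")
    config = load_crf_config(config_path)

print(config)
```

Larger `c1` values tend to push more feature weights to exactly zero, while larger `c2` values shrink weights more smoothly, so the two are usually tuned together.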

Run training

$ python -m spacy_crfsuite.train examples/example.md -o model/ -c config.json
ℹ Loading config: config.json
ℹ Training CRF entity tagger with 15 examples.
ℹ Saving model to disk
✔ Successfully saved model to file.
/Users/talmago/git/spacy_crfsuite/model/model.pkl
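The training command above reads annotated examples from a Markdown file. As a rough illustration of what such annotations can carry, the sketch below assumes the common inline form `[entity text](label)`. That format is an assumption here, not something this description confirms, and `parse_annotated` is a hypothetical helper rather than the package's own reader.

```python
import re

# Matches inline annotations of the assumed form "[entity text](label)".
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(line):
    """Return (plain_text, entities) with character offsets into plain_text."""
    plain_parts = []
    entities = []
    cursor = 0  # current position in the annotated line
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        before = line[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        value, label = match.group(1), match.group(2)
        entities.append(
            {"start": length, "end": length + len(value), "value": value, "entity": label}
        )
        plain_parts.append(value)
        length += len(value)
        cursor = match.end()
    plain_parts.append(line[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated(
    "show [mexican](cuisine) restaurants up [north](location)"
)
print(text)
for ent in entities:
    print(ent["start"], ent["end"], ent["value"], "--", ent["entity"])
```

Keeping character offsets alongside the label makes it straightforward to align each annotation with tokenizer output later.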

Evaluate on a dataset

$ python -m spacy_crfsuite.eval examples/example.md -m model/model.pkl
ℹ Loading model from file
model/model.pkl
✔ Successfully loaded CRF tagger
<spacy_crfsuite.crf_extractor.CRFExtractor object at 0x126e5f438>
ℹ Loading dev dataset from file
examples/example.md
✔ Successfully loaded 15 dev examples.
⚠ f1 score: 1.0
              precision    recall  f1-score   support

           -      1.000     1.000     1.000         2
   B-cuisine      1.000     1.000     1.000         1
   L-cuisine      1.000     1.000     1.000         1
   U-cuisine      1.000     1.000     1.000         5
  U-location      1.000     1.000     1.000         2

   micro avg      1.000     1.000     1.000        11
   macro avg      1.000     1.000     1.000        11
weighted avg      1.000     1.000     1.000        11
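The `B-`, `L-` and `U-` prefixes in the report appear to follow the BILOU tagging scheme (Begin, Inside, Last, Outside, Unit), so scores are reported per token tag rather than per entity. The sketch below shows one way to turn a BILOU tag sequence back into entity spans. The function `bilou_to_spans` is hypothetical, and treating every tag without a `B-`/`I-`/`L-`/`U-` prefix as "outside" is an assumption made for this example.

```python
def bilou_to_spans(tags):
    """Convert BILOU tags to (start_token, end_token_exclusive, label) spans."""
    spans = []
    start = None    # token index where the open multi-token span began
    current = None  # label of the open multi-token span, if any
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U" and label:
            # Single-token entity.
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B" and label:
            # First token of a multi-token entity.
            start, current = i, label
        elif prefix == "I" and label and label == current:
            # Middle token: the open span simply continues.
            continue
        elif prefix == "L" and label and label == current:
            # Last token closes the open span.
            spans.append((start, i + 1, label))
            start, current = None, None
        else:
            # Outside tag or inconsistent sequence: drop any open span.
            start, current = None, None
    return spans


# Hypothetical tokens and tags, for illustration only.
tokens = ["find", "a", "north", "indian", "place", "up", "north"]
tags = ["O", "O", "B-cuisine", "L-cuisine", "O", "O", "U-location"]
for start, end, label in bilou_to_spans(tags):
    print(" ".join(tokens[start:end]), "--", label)
```

Because the report counts tags, a multi-token entity contributes one `B-` and one `L-` row entry (plus any `I-` tokens), while a single-token entity contributes one `U-` entry.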
