
spaCy pipeline component for CRF entity extraction


spacy_crfsuite: CRF entity tagger for spaCy.

✨ Features

  • spaCy NER component for Conditional Random Field entity extraction (via sklearn-crfsuite).
  • Command-line tools for training and evaluation, plus an example notebook.
  • Supports JSON, CoNLL and Markdown annotations.
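To make the Markdown annotation format more concrete, here is a minimal sketch of how an annotated line could be turned into plain text plus character spans. It assumes Rasa-style inline annotations of the form `[entity text](label)`, which this page does not spell out. The regular expression and the function `parse_markdown_example` are hypothetical helpers for illustration, not part of spacy_crfsuite.

```python
import re
from typing import List, Tuple

# Assumed annotation syntax (not confirmed by this page): [entity text](label)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_markdown_example(line: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Return the plain text and a list of (start, end, label) character spans."""
    text_parts = []
    entities = []
    cursor = 0  # position in the annotated line
    offset = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        # Copy the unannotated text that precedes this entity.
        before = line[cursor:match.start()]
        text_parts.append(before)
        offset += len(before)
        # Record the entity span relative to the plain text.
        value, label = match.group(1), match.group(2)
        entities.append((offset, offset + len(value), label))
        text_parts.append(value)
        offset += len(value)
        cursor = match.end()
    # Copy whatever follows the last annotation.
    text_parts.append(line[cursor:])
    return "".join(text_parts), entities


text, entities = parse_markdown_example(
    "show [mexican](cuisine) restaurants up [north](location)"
)
print(text)      # show mexican restaurants up north
print(entities)  # [(5, 12, 'cuisine'), (28, 33, 'location')]
```

The spans are character offsets into the plain text, which is the form most span-based NER tooling expects.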

Installation

Python

pip install spacy_crfsuite

🚀 Quickstart

Usage as a spaCy pipeline component

spaCy pipeline

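import spacy

from spacy_crfsuite import CRFEntityExtractor

# Start from a blank English pipeline and add the trained CRF tagger to it.
nlp = spacy.blank('en')
pipe = CRFEntityExtractor(nlp).from_disk("model.pkl")
nlp.add_pipe(pipe)

# Run the pipeline and print each predicted entity with its label.
doc = nlp("show mexican restaurents up north")
for ent in doc.ents:
    print(ent.text, "--", ent.label_)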

# Output:
# mexican -- cuisine
# north -- location

Follow this example notebook to train the CRF entity tagger from a few restaurant search examples.

Train & evaluate CRF entity tagger

Set up configuration file

$ cat << EOF > config.json
{"c1": 0.03, "c2": 0.06}
EOF
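In sklearn-crfsuite's `CRF` estimator, `c1` and `c2` are the L1 and L2 regularization coefficients. Assuming the keys in the configuration file are passed through to that estimator (this page does not say so explicitly), the same file can be written and checked from Python. The helpers `write_config` and `load_config` below are hypothetical and use only the standard library.

```python
import json
import tempfile
from pathlib import Path


def write_config(path: Path, c1: float, c2: float) -> dict:
    """Write a config with L1 (c1) and L2 (c2) regularization coefficients."""
    if c1 < 0 or c2 < 0:
        raise ValueError("regularization coefficients must be non-negative")
    config = {"c1": c1, "c2": c2}
    path.write_text(json.dumps(config))
    return config


def load_config(path: Path) -> dict:
    """Read the config back as a plain dictionary."""
    return json.loads(path.read_text())


# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    write_config(config_path, c1=0.03, c2=0.06)
    loaded = load_config(config_path)

print(loaded)  # {'c1': 0.03, 'c2': 0.06}
```

Larger `c1` values tend to push more feature weights to zero, while larger `c2` values shrink weights more smoothly.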

Run training

$ python -m spacy_crfsuite.train examples/example.md -o model/ -c config.json
ℹ Loading config: config.json
ℹ Training CRF entity tagger with 15 examples.
ℹ Saving model to disk
✔ Successfully saved model to file.
/Users/talmago/git/spacy_crfsuite/model/model.pkl

Evaluate on a dataset

$ python -m spacy_crfsuite.eval examples/example.md -m model/model.pkl
ℹ Loading model from file
model/model.pkl
✔ Successfully loaded CRF tagger
<spacy_crfsuite.crf_extractor.CRFExtractor object at 0x126e5f438>
ℹ Loading dev dataset from file
examples/example.md
✔ Successfully loaded 15 dev examples.
⚠ f1 score: 1.0
              precision    recall  f1-score   support

           -      1.000     1.000     1.000         2
   B-cuisine      1.000     1.000     1.000         1
   L-cuisine      1.000     1.000     1.000         1
   U-cuisine      1.000     1.000     1.000         5
  U-location      1.000     1.000     1.000         2

   micro avg      1.000     1.000     1.000        11
   macro avg      1.000     1.000     1.000        11
weighted avg      1.000     1.000     1.000        11
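The label prefixes in the report (`B-`, `L-`, `U-`) suggest a BILOU-style tagging scheme, where `B`/`I`/`L` mark the beginning, inside and last token of a multi-token entity and `U` marks a single-token entity. The following is a minimal sketch of decoding such tags into token spans. It assumes that any tag without a label, such as `O` or `-`, means "no entity", and the function `bilou_to_spans` is a hypothetical helper, not the library's own decoder.

```python
from typing import List, Tuple


def bilou_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BILOU tags into (start_token, end_token_exclusive, label) spans."""
    spans = []
    start = None    # token index where the open entity began
    current = None  # label of the open entity
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U" and label:
            # Single-token entity.
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B" and label:
            # Open a new multi-token entity.
            start, current = i, label
        elif prefix == "I" and start is not None and label == current:
            # Still inside the open entity.
            continue
        elif prefix == "L" and start is not None and label == current:
            # Close the open entity.
            spans.append((start, i + 1, label))
            start, current = None, None
        else:
            # Outside tag (assumed: "O" or "-") or an inconsistent sequence.
            start, current = None, None
    return spans


tags = ["O", "U-cuisine", "O", "O", "U-location"]
print(bilou_to_spans(tags))  # [(1, 2, 'cuisine'), (4, 5, 'location')]
print(bilou_to_spans(["B-cuisine", "L-cuisine"]))  # [(0, 2, 'cuisine')]
```

Because the report scores individual tags rather than whole entities, a decoder like this is useful when entity-level spans are needed for downstream checks.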
