spaCy Data Debug

spaCy Data Debug has utilities to help you debug your custom NER data. It checks for inconsistencies in labels for the same text.
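A label-consistency check of this kind can be sketched in plain Python. This is a minimal illustration, not spacy-data-debug's implementation. It assumes Prodigy-style examples with a `text` field and a `spans` list of character offsets (`start`, `end`) plus a `label`. The name `find_label_conflicts` is hypothetical.

```python
from collections import defaultdict


def find_label_conflicts(examples):
    """Return {entity text: sorted labels} for texts annotated with more than one label.

    Hypothetical helper: assumes Prodigy-style dicts with "text" and "spans",
    where each span has character offsets "start"/"end" and a "label".
    """
    labels_by_text = defaultdict(set)
    for eg in examples:
        for span in eg.get("spans", []):
            surface = eg["text"][span["start"]:span["end"]]
            labels_by_text[surface].add(span["label"])
    # Only texts that received more than one distinct label are reported.
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


data = [
    {"text": "I flew to Paris.", "spans": [{"start": 10, "end": 15, "label": "GPE"}]},
    {"text": "Paris wrote a song.", "spans": [{"start": 0, "end": 5, "label": "PERSON"}]},
    {"text": "Berlin is big.", "spans": [{"start": 0, "end": 6, "label": "GPE"}]},
]

conflicts = find_label_conflicts(data)
print(conflicts)  # {'Paris': ['GPE', 'PERSON']}
```

Matching here is exact and case-sensitive. A flagged text is a candidate for review, not necessarily an error, because some strings (like "Paris") are genuinely ambiguous.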

Install

pip install spacy-data-debug

How to use

from pathlib import Path
import srsly
from spacy_data_debug.core import *
from spacy_data_debug.pipeline import *

0. Load your data in the Prodigy annotation format

base_dir = Path("data")  # placeholder: the directory that holds your JSONL files
train = list(srsly.read_jsonl(base_dir / "train.jsonl"))
dev = list(srsly.read_jsonl(base_dir / "dev.jsonl"))
test = list(srsly.read_jsonl(base_dir / "test.jsonl"))
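For reference, Prodigy NER annotations are typically stored as JSONL: one JSON object per line with the raw `text` and a `spans` list of character offsets and labels. Real records often carry extra keys such as `tokens`, `answer` or `_input_hash`. The sketch below writes and reads one such record with the standard library's `json` module as a stand-in for `srsly`. The example sentence, labels, helper names and temporary directory are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

# One annotated example in the Prodigy NER shape assumed here:
# character offsets (start inclusive, end exclusive) plus a label.
example = {
    "text": "Apple is opening an office in Berlin.",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 30, "end": 36, "label": "GPE"},
    ],
}


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical stdlib helper)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read one JSON object per non-empty line (stdlib stand-in for srsly.read_jsonl)."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    write_jsonl(base_dir / "train.jsonl", [example])
    train = read_jsonl(base_dir / "train.jsonl")

# Slice the text with each span's offsets to see what was annotated.
for span in train[0]["spans"]:
    print(span["label"], train[0]["text"][span["start"]:span["end"]])
# ORG Apple
# GPE Berlin
```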

Clean, format and filter overlapping entities

When several people annotate a large project over many sessions, the format of the data can drift from one session to the next. fix_annotations_format brings your examples into a consistent format so that the other spacy-data-debug functions can work with them.

train = fix_annotations_format(train)
dev = fix_annotations_format(dev)
test = fix_annotations_format(test)
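The heading mentions filtering overlapping entities. The sketch below shows one common way to do that and is not necessarily what fix_annotations_format does. It applies a greedy, longest-span-first filter to Prodigy-style character-offset spans, similar in spirit to spaCy's `spacy.util.filter_spans`. The name `filter_overlapping_spans` is hypothetical.

```python
def filter_overlapping_spans(spans):
    """Greedy longest-first overlap filter over character-offset span dicts.

    Hypothetical helper for illustration; follows the same preference as
    spacy.util.filter_spans, which keeps the (first) longest span.
    """
    kept = []
    # Longest spans first; ties broken by earliest start offset.
    for span in sorted(spans, key=lambda s: (-(s["end"] - s["start"]), s["start"])):
        # Half-open ranges [start, end) are disjoint if one ends before the other starts.
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    # Restore document order.
    return sorted(kept, key=lambda s: s["start"])


text = "The New York Times is based in New York."
spans = [
    {"start": 4, "end": 12, "label": "GPE"},   # "New York" (nested inside the ORG)
    {"start": 4, "end": 18, "label": "ORG"},   # "New York Times"
    {"start": 31, "end": 39, "label": "GPE"},  # "New York"
]

filtered = filter_overlapping_spans(spans)
for span in filtered:
    print(span["label"], text[span["start"]:span["end"]])
# ORG New York Times
# GPE New York
```

Preferring the longest span is a design choice: it keeps "New York Times" as one ORG rather than splitting it. Other projects may prefer to flag overlaps for manual review instead of silently dropping them.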

Or construct a Pipeline

A Pipeline holds your datasets together and runs spacy_data_debug functions across all of them. This helps keep annotations consistent across your dataset splits.

pipeline = Pipeline(train, dev, test)
pipeline.apply(fix_annotations_format)
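The idea behind a Pipeline can be illustrated with a small class of our own. This is a sketch under assumptions, not the library's code. `SimplePipeline`, its `datasets` attribute and the `drop_unannotated` step are hypothetical, and the real `Pipeline.apply` may store or return results differently.

```python
class SimplePipeline:
    """Hypothetical stand-in showing the Pipeline idea; not spacy_data_debug's class.

    Holds the train/dev/test splits and applies one function to each of them.
    """

    def __init__(self, train, dev, test):
        self.datasets = {"train": train, "dev": dev, "test": test}

    def apply(self, func):
        # The same transformation runs on every split, so no split is skipped
        # or processed with different settings.
        self.datasets = {name: func(data) for name, data in self.datasets.items()}
        return self


def drop_unannotated(examples):
    """Hypothetical step: keep only examples that have at least one span."""
    return [eg for eg in examples if eg.get("spans")]


train = [
    {"text": "Apple hired Tim.", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "Nothing here.", "spans": []},
]
dev = [{"text": "Berlin is big.", "spans": [{"start": 0, "end": 6, "label": "GPE"}]}]
test = [{"text": "No entities.", "spans": []}]

pipeline = SimplePipeline(train, dev, test).apply(drop_unannotated)
print({name: len(data) for name, data in pipeline.datasets.items()})
# {'train': 1, 'dev': 1, 'test': 0}
```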

Download files


Source Distribution

spacy_data_debug-0.0.3.tar.gz (5.4 kB)

File details

Details for the file spacy_data_debug-0.0.3.tar.gz.

File metadata

  • Download URL: spacy_data_debug-0.0.3.tar.gz
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8

File hashes

Hashes for spacy_data_debug-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       7c4b75c2363108c04db5901e927189b305f9461b07a3f5e90a369c5fbc31527a
MD5          80cb10b9dbbeb12230ccd094bdf5da6b
BLAKE2b-256  06fa733405d977c9bf6c758cb207308369df52d28dadad02b55cc58dc68214ad