spaCy Data Debug
spaCy Data Debug has utilities to help you debug your custom NER data. It checks for inconsistencies in labels for the same text.
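To make "inconsistencies in labels for the same text" concrete, here is a minimal sketch of that kind of check in plain Python. The helper name `find_label_conflicts` is hypothetical and is not part of spacy-data-debug; the sketch assumes Prodigy-style examples with a "text" key and a "spans" list of character offsets, and it compares span texts case-insensitively, which is a design choice of this sketch.

```python
from collections import defaultdict


def find_label_conflicts(examples):
    """Return span texts that were annotated with more than one label.

    Hypothetical helper for illustration; not the library's own check.
    """
    labels_by_text = defaultdict(set)
    for eg in examples:
        for span in eg.get("spans", []):
            # Recover the annotated text from the character offsets.
            span_text = eg["text"][span["start"]:span["end"]]
            labels_by_text[span_text.lower()].add(span["label"])
    # Keep only the texts that carry two or more distinct labels.
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


examples = [
    {"text": "Apple hired a new CEO.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "I bought an Apple laptop.",
     "spans": [{"start": 12, "end": 17, "label": "PRODUCT"}]},
    {"text": "Paris is lovely.",
     "spans": [{"start": 0, "end": 5, "label": "GPE"}]},
]

conflicts = find_label_conflicts(examples)
print(conflicts)
```

A conflict reported this way is not always an error, since the same string can legitimately carry different labels in different contexts, but it is a useful list to review by hand.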
Install
pip install spacy-data-debug
How to use
from pathlib import Path
import srsly
from spacy_data_debug.core import *
from spacy_data_debug.pipeline import *
Load your data in the Prodigy annotation format
base_dir = Path(".")  # set this to the directory that holds your JSONL files
train = list(srsly.read_jsonl(base_dir / "train.jsonl"))
dev = list(srsly.read_jsonl(base_dir / "dev.jsonl"))
test = list(srsly.read_jsonl(base_dir / "test.jsonl"))
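For reference, each line of a Prodigy NER JSONL file is one JSON object with a "text" string and a "spans" list whose entries hold character offsets ("start", "end") and a "label". The sketch below uses only the standard `json` module to show one such record round-tripping through a JSONL line; real exports may carry additional keys, and the sample sentence is made up for illustration.

```python
import json

# One annotated example as it would appear on a single JSONL line.
record = {
    "text": "Apple hired a new CEO.",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
}

line = json.dumps(record)   # what gets written to the .jsonl file
loaded = json.loads(line)   # what a JSONL reader yields for that line

# Offsets are character positions into "text", with "end" exclusive.
span = loaded["spans"][0]
entity_text = loaded["text"][span["start"]:span["end"]]
print(entity_text, span["label"])
```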
Clean, format and filter overlapping entities
When you work on a large annotation project, the format of your data can drift across annotation sessions run by different people.
fix_annotations_format puts your data into a format that the other functions in spacy-data-debug can use.
train = fix_annotations_format(train)
dev = fix_annotations_format(dev)
test = fix_annotations_format(test)
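As an illustration of what filtering overlapping entities can look like, here is a minimal sketch that keeps the longest span whenever two spans overlap. This is an assumption about one reasonable strategy, not a description of how fix_annotations_format resolves overlaps internally; `drop_overlapping_spans` is a hypothetical name.

```python
def drop_overlapping_spans(spans):
    """Keep the longest spans and discard any span that overlaps a kept one.

    Hypothetical helper for illustration; not the library's implementation.
    """
    kept = []
    # Visit longest spans first; ties are broken by earliest start.
    for span in sorted(spans, key=lambda s: (s["start"] - s["end"], s["start"])):
        # Two spans are disjoint when one ends before the other starts.
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s["start"])


spans = [
    {"start": 0, "end": 8, "label": "GPE"},       # shorter span, overlaps the next one
    {"start": 0, "end": 13, "label": "ORG"},      # longer span, wins the overlap
    {"start": 20, "end": 25, "label": "PERSON"},  # disjoint, always kept
]

filtered = drop_overlapping_spans(spans)
print([s["label"] for s in filtered])
```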
Or construct a Pipeline
A Pipeline holds your datasets together and runs spacy_data_debug functions across all of them.
This helps keep annotations consistent across your dataset splits.
pipeline = Pipeline(train, dev, test)
pipeline.apply(fix_annotations_format)
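The idea behind applying one function to every split can be sketched with a small stand-in class. `MiniPipeline` and `uppercase_labels` are hypothetical names used only for illustration, and the behavior shown (replacing each split with the function's return value) is an assumption rather than the documented behavior of Pipeline.apply.

```python
class MiniPipeline:
    """Hypothetical stand-in: holds three splits and maps a function over each."""

    def __init__(self, train, dev, test):
        self.datasets = {"train": train, "dev": dev, "test": test}

    def apply(self, func):
        # Run the same function on every split so all splits stay consistent.
        self.datasets = {name: func(data) for name, data in self.datasets.items()}
        return self.datasets


def uppercase_labels(examples):
    """Toy cleaning step: normalize every span label to upper case."""
    return [
        {**eg, "spans": [{**s, "label": s["label"].upper()} for s in eg.get("spans", [])]}
        for eg in examples
    ]


train = [{"text": "Apple", "spans": [{"start": 0, "end": 5, "label": "org"}]}]
dev = [{"text": "Paris", "spans": [{"start": 0, "end": 5, "label": "Gpe"}]}]
test = [{"text": "No entities here", "spans": []}]

pipe = MiniPipeline(train, dev, test)
result = pipe.apply(uppercase_labels)
print(result["train"][0]["spans"][0]["label"], result["dev"][0]["spans"][0]["label"])
```

Routing every cleaning step through one object means no split can be skipped by accident, which is the main benefit the Pipeline description points to.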