Analyze and characterize your Spans. Integrated with spaCy.
spacy-span-analyzer
A simple tool to analyze the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. This is also a reproduction of Papay et al.'s work on Dissecting Span Identification Tasks with Performance Prediction (EMNLP 2020).
⏳ Install
Using pip:
pip install spacy-span-analyzer
Directly from source (I highly recommend running this within a virtual environment):
git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .
⏯ Usage
You can use the Span Analyzer as a command-line tool:
spacy-span-analyzer ./path/to/dataset.spacy
Or as an imported library:
import spacy
from spacy.tokens import DocBin
from spacy_span_analyzer import SpanAnalyzer
nlp = spacy.blank("en") # or any Language model
# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))
# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness
Inputs are expected to be a list of spaCy Docs or a DocBin (if you're using the command-line tool).
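If your annotated Docs are not yet stored as a DocBin, the sketch below shows one way to write them to a .spacy file that the command-line tool can read. It only uses spaCy's own DocBin API; the example sentence, the ORG and GPE labels, and the temporary output path are made up for illustration.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Hypothetical annotations: two spans stored under the "sc" spans_key
doc.spans["sc"] = [
    Span(doc, 3, 6, label="ORG"),  # "Bank of China"
    Span(doc, 5, 6, label="GPE"),  # "China"
]

# Serialize the annotated Doc into a DocBin and write it to disk
doc_bin = DocBin(docs=[doc])
out_path = Path(tempfile.mkdtemp()) / "dataset.spacy"
doc_bin.to_disk(out_path)

# Read it back to confirm that the spans survived the round trip
reloaded = list(DocBin().from_disk(out_path).get_docs(nlp.vocab))
print([(span.text, span.label_) for span in reloaded[0].spans["sc"]])
```

The resulting file can then be passed to the command-line tool in place of ./path/to/dataset.spacy.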
Working with Spans
In spaCy, you'd want to store your Spans in the doc.spans property, under a particular spans_key (sc by default). Unlike the doc.ents property, doc.spans allows overlapping entities. This is especially useful for downstream tasks like Span Categorization.
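To make the difference concrete, here is a minimal sketch with two nested spans. The sentence and the ORG and GPE labels are assumptions chosen for illustration. doc.spans accepts both spans, while assigning the same spans to doc.ents raises a ValueError because one token would belong to two entities.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

org = Span(doc, 3, 6, label="ORG")  # "Bank of China"
gpe = Span(doc, 5, 6, label="GPE")  # "China", nested inside the ORG span

# doc.spans accepts overlapping spans under a spans_key
doc.spans["sc"] = [org, gpe]

# doc.ents rejects them: a token can belong to at most one entity
try:
    doc.ents = [org, gpe]
    ents_accepted = True
except ValueError:
    ents_accepted = False

print(len(doc.spans["sc"]), ents_accepted)  # prints: 2 False
```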
A common way to store spans is to use char_span to define a slice from your Doc:
doc = nlp(text)
spans = []
for annotation in annotations:
    span = doc.char_span(
        annotation["start"],
        annotation["end"],
        annotation["label"],
    )
    # char_span returns None when the offsets don't align with token boundaries
    if span is not None:
        spans.append(span)
# Put all spans under a spans_key
doc.spans["sc"] = spans
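One thing to watch for with char_span is that character offsets must line up with token boundaries. The sketch below, which uses a made-up sentence and offsets, shows that misaligned offsets yield None by default, and that spaCy's alignment_mode argument can snap them to the surrounding tokens instead.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Characters 15-28 cover exactly the tokens "Bank of China"
aligned = doc.char_span(15, 28, label="ORG")

# Characters 15-27 stop in the middle of "China", so strict mode gives None
misaligned = doc.char_span(15, 27, label="ORG")

# "expand" grows the span to cover every token it partially touches
expanded = doc.char_span(15, 27, label="ORG", alignment_mode="expand")

print(aligned.text)     # prints: Bank of China
print(misaligned)       # prints: None
print(expanded.text)    # prints: Bank of China
```

Whether to skip, expand, or contract misaligned annotations depends on your data, so it's worth counting how many spans come back as None before choosing.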
You can also achieve the same thing by creating a SpanGroup, or use set_ents if your spans don't overlap (it writes to doc.ents instead of doc.spans).
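As a rough illustration of the SpanGroup route, the sketch below builds a group explicitly and assigns it to doc.spans. The sentence and labels are assumptions for the example; only the SpanGroup constructor from spaCy's public API is used.

```python
import spacy
from spacy.tokens import Span, SpanGroup

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

spans = [
    Span(doc, 3, 6, label="ORG"),  # "Bank of China"
    Span(doc, 5, 6, label="GPE"),  # "China"
]

# Create a named group of spans and store it under the same key
group = SpanGroup(doc, name="sc", spans=spans)
doc.spans["sc"] = group

print([(span.text, span.label_) for span in doc.spans["sc"]])
```

Assigning a plain list to doc.spans["sc"] creates a SpanGroup for you, so the explicit form is mainly useful when you want to build or pass the group around before attaching it.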