Analyze and characterize your Spans. Integrated with spaCy.
A simple tool to analyze the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. This is also a reproduction of Papay et al.'s work on *Dissecting Span Identification Tasks with Performance Prediction* (EMNLP 2020).
```
pip install spacy-span-analyzer
```
Directly from source (I highly recommend running this within a virtual environment):
```
git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .
```
You can use the Span Analyzer as a command-line tool:
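For example (a sketch: the console-script name `spacy-span-analyzer` and the single positional argument for the path to your serialized `.spacy` file are assumptions here; run the command with `--help` to confirm the exact interface):

```shell
# Analyze the spans in a serialized DocBin file (assumed CLI name and argument)
spacy-span-analyzer ./path/to/data.spacy
```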
Or as an imported library:
```python
import spacy
from spacy.tokens import DocBin

from spacy_span_analyzer import SpanAnalyzer

nlp = spacy.blank("en")  # or any Language model

# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness
```
Working with Spans
In spaCy, you'd want to store your Spans in the `doc.spans` property, under a particular spans key (`sc` by default). Unlike the `doc.ents` property, `doc.spans` allows overlapping entities. This is especially useful for downstream tasks like Span Categorization. A common way to do this is to use `char_span` to define a slice from your `Doc`:
```python
doc = nlp(text)
spans = []
for annotation in annotations:
    span = doc.char_span(
        annotation["start"],
        annotation["end"],
        annotation["label"],
    )
    spans.append(span)

# Put all spans under a spans_key
doc.spans["sc"] = spans
```
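To see the overlap behavior concretely, here is a minimal, self-contained sketch (the sentence, labels, and character offsets are made up for illustration) showing that `doc.spans` happily stores two spans that share a token, which `doc.ents` would reject:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("The quick brown fox")

# Two spans that overlap on the token "quick".
# char_span returns a Span aligned to token boundaries.
span_a = doc.char_span(0, 9, label="A")    # "The quick"
span_b = doc.char_span(4, 15, label="B")   # "quick brown"

# doc.spans accepts overlapping spans under a spans key
doc.spans["sc"] = [span_a, span_b]
```

Note that `char_span` returns `None` when the character offsets don't align with token boundaries, so it's worth checking for `None` before appending real annotations.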