Analyze and characterize your Spans. Integrated with spaCy.
spacy-span-analyzer
A simple tool for analyzing the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. It is also a reproduction of Papay et al.'s work, Dissecting Span Identification Tasks with Performance Prediction (EMNLP 2020).
⏳ Install
Using pip:
pip install spacy-span-analyzer
Directly from source (I highly recommend running this within a virtual environment):
git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .
⏯ Usage
You can use the Span Analyzer as a command-line tool:
spacy-span-analyzer ./path/to/dataset.spacy
Or as an imported library:
import spacy
from spacy.tokens import DocBin
from spacy_span_analyzer import SpanAnalyzer
nlp = spacy.blank("en") # or any Language model
# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))
# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness
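To give a rough intuition for what these four characteristics measure, here is a minimal pure-Python sketch on a toy corpus. It assumes the definitions from Papay et al. as I understand them: frequency is the number of spans of a given type, length is the geometric mean of span lengths in tokens, and the two distinctiveness scores are KL divergences of the span-token and boundary-token unigram distributions from the corpus unigram distribution. The names `DOCS`, `unigram_dist`, `kl_divergence`, and `characterize` are hypothetical and not part of this package; the library's actual return types and normalization may differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the package): each document is a token
# list plus (start, end, label) spans, where `end` is exclusive.
DOCS = [
    (["Alice", "flew", "to", "New", "York", "."], [(0, 1, "PER"), (3, 5, "LOC")]),
    (["Bob", "lives", "in", "Paris", "."], [(0, 1, "PER"), (3, 4, "LOC")]),
]


def unigram_dist(tokens):
    """Relative frequency of each token in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q); assumes every token in p also appears in q."""
    return sum(pv * math.log(pv / q[tok]) for tok, pv in p.items())


def characterize(docs, label):
    """Assumed definitions of the four characteristics for one span label."""
    corpus, inside, boundary, lengths = [], [], [], []
    for tokens, spans in docs:
        corpus.extend(tokens)
        for start, end, span_label in spans:
            if span_label != label:
                continue
            lengths.append(end - start)
            inside.extend(tokens[start:end])
            # Boundary tokens: the token just before and just after the span.
            if start > 0:
                boundary.append(tokens[start - 1])
            if end < len(tokens):
                boundary.append(tokens[end])
    if not lengths:
        raise ValueError(f"no spans with label {label!r}")
    if not boundary:
        raise ValueError(f"spans with label {label!r} have no boundary tokens")
    corpus_dist = unigram_dist(corpus)
    return {
        "frequency": len(lengths),
        # Geometric mean of span lengths, in tokens.
        "length": math.exp(sum(math.log(n) for n in lengths) / len(lengths)),
        "span_distinctiveness": kl_divergence(unigram_dist(inside), corpus_dist),
        "boundary_distinctiveness": kl_divergence(unigram_dist(boundary), corpus_dist),
    }


loc_stats = characterize(DOCS, "LOC")
```

Under these assumptions, a higher distinctiveness score means that the tokens inside (or next to) a span type look less like the corpus as a whole.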