
Analyze and characterize your Spans. Integrated with spaCy.


spacy-span-analyzer

A simple tool to analyze the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. It is also a reproduction of Papay et al.'s work, Dissecting Span Identification Tasks with Performance Prediction (EMNLP 2020).

⏳ Install

Using pip:

pip install spacy-span-analyzer

Directly from source (I highly recommend running this within a virtual environment):

git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .

⏯ Usage

You can use the Span Analyzer as a command-line tool:

spacy-span-analyzer ./path/to/dataset.spacy
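
The command takes a serialized DocBin (a .spacy file) whose docs carry span annotations. If you don't have one yet, the following is a minimal sketch of building a tiny dataset with spaCy's own API. The example sentence, the labels, the "sc" spans key, and the helper name save_example_dataset are all illustrative assumptions; check which spans key your own pipeline and the analyzer expect.

```python
from pathlib import Path

import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")


def build_doc_bin() -> DocBin:
    """Build a DocBin holding one doc with three labeled spans."""
    doc = nlp("Apple hired Tim Cook in California")
    # Token indices: Apple(0) hired(1) Tim(2) Cook(3) in(4) California(5)
    # "sc" is an assumed spans key, chosen here only for illustration.
    doc.spans["sc"] = [
        Span(doc, 0, 1, label="ORG"),
        Span(doc, 2, 4, label="PERSON"),
        Span(doc, 5, 6, label="GPE"),
    ]
    return DocBin(docs=[doc])


def save_example_dataset(path: Path) -> None:
    """Serialize the example DocBin to disk (hypothetical helper)."""
    build_doc_bin().to_disk(path)
```

Calling save_example_dataset(Path("./dataset.spacy")) writes a file that can then be passed to the command-line tool or loaded with DocBin().from_disk.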

Or as an imported library:

import spacy
from spacy.tokens import DocBin
from spacy_span_analyzer import SpanAnalyzer

nlp = spacy.blank("en")  # or any Language model

# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness
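
To give a feel for what these four characteristics measure, here is a rough, dependency-free sketch based on my reading of Papay et al.: frequency is the number of spans, length is the geometric mean of span lengths in tokens, and the two distinctiveness scores are KL divergences between the unigram distribution of span (or boundary) tokens and that of the whole corpus. This is not the package's implementation. The function names, the (doc_index, start, end) span format, and the window parameter are assumptions made for illustration, and the library's exact definitions may differ.

```python
import math
from collections import Counter


def unigram_dist(tokens):
    """Relative frequency of each token in a list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q); assumes every token in p also appears in q."""
    return sum(p_x * math.log(p_x / q[x]) for x, p_x in p.items())


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def characterize(docs, spans, window=1):
    """docs: list of token lists.
    spans: list of (doc_index, start, end) tuples for a single label.
    window: number of tokens on each side treated as the boundary (assumed).
    """
    corpus = [tok for doc in docs for tok in doc]
    inside, boundary = [], []
    for i, start, end in spans:
        inside += docs[i][start:end]
        boundary += docs[i][max(0, start - window):start]
        boundary += docs[i][end:end + window]
    p_corpus = unigram_dist(corpus)
    return {
        "frequency": len(spans),
        "length": geometric_mean([end - start for _, start, end in spans]),
        "span_distinctiveness": kl_divergence(unigram_dist(inside), p_corpus),
        "boundary_distinctiveness": kl_divergence(unigram_dist(boundary), p_corpus),
    }
```

For example, characterize([["the", "cat", "sat"], ["a", "big", "dog", "ran"]], [(0, 1, 2), (1, 1, 3)]) scores two spans, "cat" and "big dog". Higher distinctiveness values mean the span or boundary tokens look less like the corpus as a whole.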


