Analyze and characterize your Spans. Integrated with spaCy.

spacy-span-analyzer

A simple tool to analyze the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. It is also a reproduction of Papay et al.'s work, Dissecting Span Identification Tasks with Performance Prediction (EMNLP 2020).

⏳ Install

Using pip:

pip install spacy-span-analyzer

Directly from source (I highly recommend running this within a virtual environment):

git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .

⏯ Usage

You can use the Span Analyzer as a command-line tool:

spacy-span-analyzer ./path/to/dataset.spacy

Or as an imported library:

import spacy
from spacy.tokens import DocBin
from spacy_span_analyzer import SpanAnalyzer

nlp = spacy.blank("en")  # or any Language model

# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency  
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness
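
To give a feel for what a characteristic such as span distinctiveness measures, here is a minimal pure-Python sketch. It assumes the definition used by Papay et al., namely the KL divergence between the unigram distribution of tokens inside spans and that of the whole corpus. The helper names (`unigram_distribution`, `kl_divergence`) and the toy data are hypothetical, and this is not the library's actual implementation.

```python
import math
from collections import Counter


def unigram_distribution(tokens):
    """Return a dict mapping each token to its relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes every token in p also appears in q."""
    return sum(p_x * math.log(p_x / q[tok]) for tok, p_x in p.items())


# Hypothetical toy corpus, with spans given as (start, end) token offsets.
corpus = ["the", "patient", "took", "aspirin", "and", "the", "nurse", "left"]
spans = [(3, 4)]  # covers the single token "aspirin"

# Collect the tokens that fall inside any span.
span_tokens = [tok for start, end in spans for tok in corpus[start:end]]

p_span = unigram_distribution(span_tokens)
p_corpus = unigram_distribution(corpus)

# Higher values mean span tokens look less like the rest of the corpus.
distinctiveness = kl_divergence(p_span, p_corpus)
```

Boundary distinctiveness could be sketched the same way by collecting the tokens adjacent to each span instead of the tokens inside it.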

Inputs are expected to be a list of spaCy Docs or a DocBin (if you're using the command-line tool).
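
If your annotations are not yet in the DocBin format, the sketch below shows one way to build such a file with spaCy's own API. The text, labels, and file name are made up for illustration, and it assumes a spaCy v3 installation where DocBin round-trips `doc.spans`.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")
doc = nlp("Aspirin relieves mild headache pain")

# Hypothetical annotations, given as token offsets into the Doc.
doc.spans["sc"] = [
    Span(doc, 0, 1, label="DRUG"),
    Span(doc, 2, 5, label="SYMPTOM"),
]

# Serialize the Doc into a DocBin and write it to a .spacy file.
doc_bin = DocBin(docs=[doc])
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "dataset.spacy"
    doc_bin.to_disk(path)

    # Read it back the same way the library example above does.
    loaded_docs = list(DocBin().from_disk(path).get_docs(nlp.vocab))
```

A file written this way to a permanent location is the kind of input the command-line tool expects.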

Working with Spans

In spaCy, you'd want to store your Spans in the doc.spans property, under a particular spans_key (sc by default). Unlike doc.ents, doc.spans allows overlapping spans. This is especially useful for downstream tasks such as Span Categorization.

A common way to do this is to use char_span to define a slice from your Doc:

doc = nlp(text)
spans = []
for annotation in annotations:
    span = doc.char_span(
        annotation["start"],
        annotation["end"],
        annotation["label"],
    )
    # char_span returns None when the offsets don't align with token boundaries
    if span is not None:
        spans.append(span)

# Put all spans under a spans_key
doc.spans["sc"] = spans

You can also achieve the same thing by creating a SpanGroup. If your annotations don't overlap and you want them in doc.ents instead, you can use set_ents.
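
As a rough sketch of the SpanGroup route, the example below stores two overlapping spans under the `sc` key. The sentence and the labels are invented for illustration, and the spans are built from token offsets rather than character offsets.

```python
import spacy
from spacy.tokens import Span, SpanGroup

nlp = spacy.blank("en")
doc = nlp("New York City mayor")

# Two overlapping spans: both include the token "City".
spans = [
    Span(doc, 0, 3, label="GPE"),   # "New York City"
    Span(doc, 2, 4, label="ROLE"),  # "City mayor"
]

# Wrap the spans in a SpanGroup and store it under the "sc" key.
doc.spans["sc"] = SpanGroup(doc, name="sc", spans=spans)

stored = [(span.text, span.label_) for span in doc.spans["sc"]]
```

Because the spans share a token, they could not both be stored in doc.ents, which is why doc.spans is the better fit here.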
