Analyze and characterize your Spans. Integrated with spaCy.

spacy-span-analyzer

A simple tool to analyze the Spans in your dataset. It's tightly integrated with spaCy, so you can easily incorporate it into existing NLP pipelines. It is also a reproduction of Papay et al.'s work, Dissecting Span Identification Tasks with Performance Prediction (EMNLP 2020).

⏳ Install

Using pip:

pip install spacy-span-analyzer

Directly from source (I highly recommend running this within a virtual environment):

git clone git@github.com:ljvmiranda921/spacy-span-analyzer.git
cd spacy-span-analyzer
pip install .

⏯ Usage

You can use the Span Analyzer as a command-line tool:

spacy-span-analyzer ./path/to/dataset.spacy

Or as an imported library:

import spacy
from spacy.tokens import DocBin
from spacy_span_analyzer import SpanAnalyzer

nlp = spacy.blank("en")  # or any Language model

# Ensure that your dataset is a DocBin
doc_bin = DocBin().from_disk("./path/to/data.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

# Run SpanAnalyzer and get span characteristics
analyze = SpanAnalyzer(docs)
analyze.frequency  
analyze.length
analyze.span_distinctiveness
analyze.boundary_distinctiveness

Inputs are expected to be a list of spaCy Docs or a DocBin (if you're using the command-line tool).
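To give a rough feel for what a characteristic such as span_distinctiveness measures, here is a minimal sketch in plain Python. It assumes the definition used by Papay et al.: the KL divergence between the unigram distribution of tokens inside spans and that of the whole corpus. The toy corpus and the helper names (unigram_distribution, kl_divergence) are hypothetical, and this is not the library's own implementation.

```python
import math
from collections import Counter


def unigram_distribution(tokens):
    """Relative frequency of each token in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q); every token with p > 0 must also appear in q."""
    return sum(px * math.log(px / q[tok]) for tok, px in p.items())


# Hypothetical toy corpus: token lists plus (start, end) token offsets
# of the annotated spans, with end exclusive.
corpus = [
    (["the", "patient", "received", "aspirin", "daily"], [(3, 4)]),
    (["the", "doctor", "prescribed", "aspirin", "and", "ibuprofen"], [(3, 4), (5, 6)]),
]

all_tokens = [tok for tokens, _ in corpus for tok in tokens]
span_tokens = [
    tok
    for tokens, spans in corpus
    for start, end in spans
    for tok in tokens[start:end]
]

# Assumed definition: divergence of the span-token distribution from the corpus.
span_distinctiveness = kl_divergence(
    unigram_distribution(span_tokens),
    unigram_distribution(all_tokens),
)
print(round(span_distinctiveness, 3))
```

A higher value means the words inside spans differ more from the corpus at large. Under the same assumption, boundary distinctiveness would apply the same computation to the tokens at the span boundaries.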

Working with Spans

In spaCy, you'd want to store your Spans in the doc.spans property, under a particular spans_key (sc by default). Unlike the doc.ents property, doc.spans allows overlapping entities. This is especially useful for downstream tasks like Span Categorization.

A common way to do this is to use char_span to define a slice from your Doc:

doc = nlp(text)
spans = []
for annotation in annotations:
    span = doc.char_span(
        annotation["start"],
        annotation["end"],
        label=annotation["label"],
    )
    # char_span returns None if the offsets don't align with token boundaries
    if span is not None:
        spans.append(span)

# Put all spans under a spans_key
doc.spans["sc"] = spans

You can also achieve the same thing by using set_ents or by creating a SpanGroup.
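As an illustration of the SpanGroup route, the sketch below builds two overlapping spans by token index and stores them under the sc key. The example sentence and labels are made up for demonstration, and a blank English pipeline is assumed.

```python
import spacy
from spacy.tokens import Span, SpanGroup

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Token indices: Welcome(0) to(1) the(2) Bank(3) of(4) China(5) .(6)
org = Span(doc, 3, 6, label="ORG")  # "Bank of China"
gpe = Span(doc, 5, 6, label="GPE")  # "China", nested inside the ORG span

# A SpanGroup can hold overlapping spans, which doc.ents does not allow.
doc.spans["sc"] = SpanGroup(doc, name="sc", spans=[org, gpe])

overlapping = [(span.text, span.label_) for span in doc.spans["sc"]]
print(overlapping)
```

Docs prepared this way can then be collected into a list and passed to SpanAnalyzer as shown in the usage section.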

Release history

0.3.0 (this version): Mar 4, 2022
0.2.0: Feb 16, 2022
0.1.0: Feb 11, 2022

Download files

Source Distribution

spacy-span-analyzer-0.3.0.tar.gz (11.5 kB), uploaded Mar 4, 2022

Built Distribution

spacy_span_analyzer-0.3.0-py3-none-any.whl (13.2 kB), uploaded Mar 4, 2022, Python 3

File details

Details for the file spacy-span-analyzer-0.3.0.tar.gz.

File metadata

Download URL: spacy-span-analyzer-0.3.0.tar.gz
Upload date: Mar 4, 2022
Size: 11.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.10

File hashes

Hashes for spacy-span-analyzer-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`089cd3ef0db03d4981d546b64d250adf80c6efbe28b320786c5869ca8597c8e1`
MD5	`3cc6836aa3fff53a1548eb13ce03a002`
BLAKE2b-256	`bff380317c5b1fdd5df5f84c0f3a58a6d6b55d23b9f442364cf3af77cfbaf036`

File details

Details for the file spacy_span_analyzer-0.3.0-py3-none-any.whl.

File metadata

Download URL: spacy_span_analyzer-0.3.0-py3-none-any.whl
Upload date: Mar 4, 2022
Size: 13.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.10

File hashes

Hashes for spacy_span_analyzer-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db14c8c0d7bbc2b4db8ee0e6aef0c320988a0d2c9c470c829f357e99a86128ad`
MD5	`9257eb3f0a8932eab804bd33881fd80f`
BLAKE2b-256	`f57dbd6d44e7e059814eb091050d268a64d46fe709163dd04497063e7f5884db`
