Some basic spaCy utility functions.

Project description

About

This repository houses a series of utility functions for spaCy. I designed them during my work at the Smithsonian Institution and the United States Holocaust Memorial Museum.

Functions

This section serves as basic documentation for the functions found in this repository.

count_spans

spaCy's Doc container comes with a built-in count_by() method. It can count anything that sits in the nlp object's vocab. This works for NER because the position of ENT_TYPE is 78 in nlp.vocab. Span labels, however, are large numbers that cannot be processed via this approach because of how spaCy uses NumPy. The count_spans function is a naive approach to resolving this issue.

This function takes a spaCy Doc object and a string span_key as input. The span_key argument specifies the type of span to count, such as "named_entities" or "noun_chunks".

The function then initializes two empty lists, spans and labels. It loops through all the spans of the specified type in the doc object, and for each span, it appends a tuple of (span.text, span.label_) to the spans list, where span.text is the actual text of the span and span.label_ is the label of the span. The function also appends the span.label_ to the labels list.

The function then creates two Counter objects: span_counts and label_counts. The Counter object span_counts counts the frequency of each unique (span.text, span.label_) tuple in the spans list, while the Counter object label_counts counts the frequency of each unique span.label_ in the labels list.

Finally, the function returns the two Counter objects as a tuple, (span_counts, label_counts).
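The steps described above can be sketched roughly as follows. This is a minimal reconstruction from the description, not the packaged source. It assumes that span_key is a key into doc.spans; the description does not confirm how the spans are looked up. The stand-in document, the "sc" key, and the sample texts are hypothetical and exist only so the sketch runs without spaCy installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_spans(doc, span_key):
    """Count (text, label) pairs and labels for the spans stored under span_key."""
    spans = []
    labels = []
    # Assumption: the spans are retrieved with doc.spans[span_key].
    for span in doc.spans[span_key]:
        spans.append((span.text, span.label_))
        labels.append(span.label_)
    # Frequency of each unique (text, label) pair and of each label.
    span_counts = Counter(spans)
    label_counts = Counter(labels)
    return span_counts, label_counts


# Hypothetical stand-in for a Doc: any object with a .spans mapping whose
# items expose .text and .label_ will work with the function above.
fake_doc = SimpleNamespace(
    spans={
        "sc": [
            SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
            SimpleNamespace(text="London", label_="GPE"),
            SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
        ]
    }
)

span_counts, label_counts = count_spans(fake_doc, "sc")
print(span_counts)
print(label_counts)
```

Because the function only reads .text and .label_ from each span, the same call should work on a real Doc whose doc.spans contains the given key, under the assumption stated above.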

Download files

Source Distribution

spacy-utils-0.0.1.tar.gz (1.9 kB)

Uploaded Source

Built Distribution

spacy_utils-0.0.1-py3-none-any.whl (1.8 kB)

Uploaded Python 3

File details

Details for the file spacy-utils-0.0.1.tar.gz.

File metadata

  • Download URL: spacy-utils-0.0.1.tar.gz
  • Size: 1.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for spacy-utils-0.0.1.tar.gz
Algorithm Hash digest
SHA256 daa4778aab3a04d474f5ab1ca4f801c409178ffeb1e0091b1f2d9c41e405d1d3
MD5 bc754baed3e5fbcdab5b2528256e60c2
BLAKE2b-256 c72054fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2

File details

Details for the file spacy_utils-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: spacy_utils-0.0.1-py3-none-any.whl
  • Size: 1.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for spacy_utils-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f7646de8e61dd31a2b683e30f0afcd7076f36d20ab30b4c21f5b4b4fa40ce2c4
MD5 dd0655bc71e8fab7ad329f6765efd17f
BLAKE2b-256 bb9520622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3
