Some basic spaCy utility functions.
Project description
About
This repository houses a series of utility functions for spaCy. I designed these during the course of my work at the Smithsonian Institution and United States Holocaust Memorial Museum.
Functions
This will serve as the basic docs for the functions found in this repository.
count_spans
spaCy comes built with a count_by() method for the Doc container. It can count anything that sits in the nlp object's vocab. This works for NER because the position of ENT_TYPE is 78 in the nlp.vocab. Span labels, however, are stored as large hash values that cannot be processed via this approach because of how spaCy uses NumPy. The count_spans function is a naive approach to resolving this issue.
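For context, here is a minimal sketch of the built-in count_by() approach applied to entity types. It assumes a blank English pipeline with entities set by hand (the sentence and labels are made up for illustration), so no trained model is needed. Note that count_by() counts tokens, not entities, and returns hash keys that have to be looked up in the string store.

```python
import spacy
from spacy.attrs import ENT_TYPE
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained components required.
nlp = spacy.blank("en")
doc = nlp("Anne Frank lived in Amsterdam")

# Hand-made entities for illustration (an assumption, not model output).
doc.ents = [Span(doc, 0, 2, label="PERSON"), Span(doc, 4, 5, label="GPE")]

# count_by() returns {attribute_hash: token_count}.
counts = doc.count_by(ENT_TYPE)

# Map hashes back to strings; key 0 marks tokens outside any entity.
readable = {doc.vocab.strings[key]: n for key, n in counts.items() if key != 0}
print(readable)
```

Because the counts are per token, a two-token entity such as "Anne Frank" contributes 2 to its label.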
This function takes a spaCy Doc object and a string span_key as input. The span_key argument specifies which spans to count, such as "named_entities" or "noun_chunks".
The function then initializes two empty lists, spans and labels. It loops through all the spans of the specified type in the doc object, and for each span, it appends a tuple of (span.text, span.label_) to the spans list, where span.text is the actual text of the span and span.label_ is the label of the span. The function also appends the span.label_ to the labels list.
The function then creates two Counter objects: span_counts and label_counts. The Counter object span_counts counts the frequency of each unique (span.text, span.label_) tuple in the spans list, while the Counter object label_counts counts the frequency of each unique span.label_ in the labels list.
Finally, the function returns the two Counter objects as a tuple, (span_counts, label_counts).
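The steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the packaged source: it assumes the spans are read from doc.spans[span_key], and the example text, span key, and labels are hypothetical.

```python
from collections import Counter

import spacy
from spacy.tokens import Span


def count_spans(doc, span_key):
    """Count (text, label) pairs and labels for the spans under span_key."""
    spans = []
    labels = []
    # Assumption: the spans live in the Doc's span groups under span_key.
    for span in doc.spans[span_key]:
        spans.append((span.text, span.label_))
        labels.append(span.label_)
    span_counts = Counter(spans)
    label_counts = Counter(labels)
    return span_counts, label_counts


# Usage with a blank pipeline and hand-made spans (illustrative only).
nlp = spacy.blank("en")
doc = nlp("Anne Frank wrote in Amsterdam. Anne Frank hid in Amsterdam.")
doc.spans["named_entities"] = [
    Span(doc, 0, 2, label="PERSON"),
    Span(doc, 4, 5, label="GPE"),
    Span(doc, 6, 8, label="PERSON"),
    Span(doc, 10, 11, label="GPE"),
]

span_counts, label_counts = count_spans(doc, "named_entities")
print(span_counts)
print(label_counts)
```

Working with span.label_ (the string) rather than the underlying hash keeps the counting in plain Python, which sidesteps the large-integer issue entirely.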
File details
Details for the file spacy-utils-0.0.1.tar.gz.
File metadata
- Download URL: spacy-utils-0.0.1.tar.gz
- Upload date:
- Size: 1.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | daa4778aab3a04d474f5ab1ca4f801c409178ffeb1e0091b1f2d9c41e405d1d3
MD5 | bc754baed3e5fbcdab5b2528256e60c2
BLAKE2b-256 | c72054fa1cf650b6de850ba002a2035182f14d2cc8b870645417540f43301ea2
File details
Details for the file spacy_utils-0.0.1-py3-none-any.whl.
File metadata
- Download URL: spacy_utils-0.0.1-py3-none-any.whl
- Upload date:
- Size: 1.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | f7646de8e61dd31a2b683e30f0afcd7076f36d20ab30b4c21f5b4b4fa40ce2c4
MD5 | dd0655bc71e8fab7ad329f6765efd17f
BLAKE2b-256 | bb9520622623b9186d49dffd72f15eedf88f92b9f255bcd833407fb25af05be3