A collection of spaCy components for rules-based detection.
Bagpipes spaCy
Bagpipes spaCy is a collection of custom spaCy pipeline components for rules-based text processing. The components are:
- Quote Detector: Identifies and extracts quotes from the text.
- Phrases Extractor: Extracts various types of phrases such as prepositional, noun, verb, adjective, and adverbial phrases.
- Normalizer: Normalizes the text by expanding contractions, removing special characters, and more.
- Triple Detector: Extracts triples (subject, predicate, object) from the text.
- Entity Similarity: Computes similarity between entities in the text and maps similar entities.
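To make "rules-based" concrete for the first component in the list, here is a minimal sketch of quote detection using only Python's standard library. It is not Bagpipes spaCy's implementation, which is not shown in this description; the find_quotes helper and its pattern are hypothetical and only handle straight and curly double quotes.

```python
import re

# Hypothetical rule, not taken from bagpipes_spacy: a quote is any run of
# text enclosed in straight ("...") or curly (“...”) double quotation marks.
QUOTE_PATTERN = re.compile(r'"[^"]*"|“[^”]*”')


def find_quotes(text):
    """Return every double-quoted span in text, quotation marks included."""
    return QUOTE_PATTERN.findall(text)


text = "She said, \"I'm going to the store.\" The store is located near the river."
for quote in find_quotes(text):
    print(quote)
```

A single regular expression like this breaks on nested or unbalanced quotation marks, which is one reason to run such rules on tokenized text inside a pipeline component instead.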
Installation
To install Bagpipes spaCy, execute:
pip install bagpipes-spacy
Usage
Integrating the Components into your spaCy Pipeline
Begin by importing the components and then integrating them into your spaCy pipeline:
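import spacy
# Importing the components makes their factories available to nlp.add_pipe
from bagpipes_spacy import QuoteDetector, PhrasesExtractor, Normalizer, TripleDetector, EntitySimilarity
# Initialize your preferred spaCy model.
# Note: a blank pipeline has no tagger, parser, or NER. If the phrase, triple,
# or entity components return nothing, try a trained pipeline instead,
# for example spacy.load('en_core_web_sm').
nlp = spacy.blank('en')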
# Integrate the components into the pipeline
nlp.add_pipe('quote_detector')
nlp.add_pipe('phrases_extractor')
nlp.add_pipe('normalizer')
nlp.add_pipe('triple_detector')
nlp.add_pipe('entity_similarity')
Text Processing with the Pipeline
After adding the components, you can process text as you typically would:
text = "She said, \"I'm going to the store.\" The store is located near the river."
doc = nlp(text)
Retrieving the Extracted Quotes
You can access the extracted quotes from the doc._.quotes attribute:
for quote in doc._.quotes:
    print(quote)
Output:
"I'm going to the store."
Retrieving the Extracted Phrases
You can access the extracted phrases from the phrase attributes on doc._, for example doc._.prep_phrases:
for prep_phrase in doc._.prep_phrases:
    print(prep_phrase)
Output:
near the river
Repeat for doc._.noun_phrases, doc._.verb_phrases, doc._.adj_phrases, and doc._.adv_phrases.
Retrieving the Normalized Text
The normalizer modifies the doc object itself, so you can access the normalized text as you usually would:
print(doc.text)
Output:
She said, "I am going to the store." The store is located near the river.
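The contraction expansion seen in this output can be approximated with a lookup table and a regular expression. The sketch below is an illustration under stated assumptions, not the Normalizer's actual code: the CONTRACTIONS table and the expand_contractions helper are hypothetical, and only straight apostrophes are handled.

```python
import re

# Hypothetical, deliberately small lookup table. A real normalizer would need
# a much larger one and a policy for ambiguous forms such as "it's".
CONTRACTIONS = {
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
}

# One alternation of all known contractions, matched as whole words.
_CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction in text with its expanded form."""

    def _expand(match):
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        # Keep an initial capital letter ("Don't" becomes "Do not").
        if original[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _CONTRACTION_RE.sub(_expand, text)


print(expand_contractions('She said, "I\'m going to the store."'))
```

Contractions missing from the table pass through unchanged, so the table can grow without touching the function.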
Retrieving the Extracted Triples
You can access the extracted triples from the doc._.triples attribute:
for triple in doc._.triples:
    print(triple)
Output:
('store', 'located near', 'river')
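A triple like this one can be derived from a dependency parse with a few rules. The sketch below shows the general idea only; it does not reproduce the Triple Detector's rules, which are not described here. The Token type, the extract_triples helper, and the hand-written parse of the example sentence are all assumptions made for illustration, so the code runs without spaCy.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for a parsed token."""
    text: str
    dep: str   # dependency label
    head: int  # index of the head token in the sentence


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Build (subject, predicate, object) triples of the form
    subject -> root verb + preposition -> object of the preposition."""
    triples = []
    for i, root in enumerate(tokens):
        if root.dep != "ROOT":
            continue
        # Subjects attached directly to the root verb.
        subjects = [t.text for t in tokens
                    if t.head == i and t.dep in ("nsubj", "nsubjpass")]
        # Each preposition on the root contributes one predicate.
        for j, prep in enumerate(tokens):
            if prep.dep != "prep" or prep.head != i:
                continue
            objects = [t.text for t in tokens
                       if t.head == j and t.dep == "pobj"]
            for subj in subjects:
                for obj in objects:
                    triples.append((subj, f"{root.text} {prep.text}", obj))
    return triples


# Hand-annotated parse of "The store is located near the river" (assumed).
sentence = [
    Token("The", "det", 1),
    Token("store", "nsubjpass", 3),
    Token("is", "auxpass", 3),
    Token("located", "ROOT", 3),
    Token("near", "prep", 3),
    Token("the", "det", 6),
    Token("river", "pobj", 4),
]
print(extract_triples(sentence))
```

Because the rules read dependency labels, an approach like this depends on a parser having run earlier in the pipeline.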
Retrieving the Entity Similarities and Mappings
You can access the entity similarities and mappings from the doc._.ent_similarity and doc._.ent_mappings attributes:
for ent1, ent2, similarity in doc._.ent_similarity:
    print(ent1, ent2, similarity)
for ent, mappings in doc._.ent_mappings.items():
    print(ent, mappings)
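The loops above imply that the similarities are (entity, entity, score) tuples and that the mappings are a dictionary keyed by entity. The sketch below builds data of that shape from plain strings. This description does not say which similarity measure Entity Similarity uses, so the sketch substitutes character-level similarity from Python's difflib as a stand-in; the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def entity_similarities(entities):
    """Score every unordered pair of entity strings between 0.0 and 1.0."""
    return [
        (a, b, SequenceMatcher(None, a.lower(), b.lower()).ratio())
        for a, b in combinations(entities, 2)
    ]


def entity_mappings(similarities, threshold=0.8):
    """Map each entity to the entities that score at or above threshold."""
    mappings = {}
    for a, b, score in similarities:
        if score >= threshold:
            mappings.setdefault(a, []).append(b)
            mappings.setdefault(b, []).append(a)
    return mappings


entities = ["Barack Obama", "Barak Obama", "Paris"]
similarities = entity_similarities(entities)
for ent1, ent2, similarity in similarities:
    print(ent1, ent2, round(similarity, 2))
for ent, mapped in entity_mappings(similarities).items():
    print(ent, mapped)
```

String similarity catches spelling variants but not aliases with different surface forms, which would require vector-based or knowledge-based similarity.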