A collection of spaCy components for rule-based detection.

Bagpipes spaCy

Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. These components include:

  1. Quote Detector: Identifies and extracts quotes from the text.
  2. Phrases Extractor: Extracts various types of phrases such as prepositional, noun, verb, adjective, and adverbial phrases.
  3. Normalizer: Normalizes the text by expanding contractions, removing special characters, and more.
  4. Triple Detector: Extracts triples (subject, predicate, object) from the text.
  5. Entity Similarity: Computes similarity between entities in the text and maps similar entities.

Installation

To install Bagpipes spaCy, execute:

pip install bagpipes-spacy

Usage

Integrating the Components into your spaCy Pipeline

Begin by importing the components and then integrating them into your spaCy pipeline:

import spacy
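# The import is kept even though the names are unused below: it presumably
# registers the component factories that nlp.add_pipe looks up by name.
from bagpipes_spacy import QuoteDetector, PhrasesExtractor, Normalizer, TripleDetector, EntitySimilarity

# Initialize your preferred spaCy model. A blank pipeline has no tagger, parser,
# or NER, so components that rely on those annotations will likely need a trained
# pipeline instead, e.g. spacy.load('en_core_web_sm').
nlp = spacy.blank('en')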

# Integrate the components into the pipeline
nlp.add_pipe('quote_detector')
nlp.add_pipe('phrases_extractor')
nlp.add_pipe('normalizer')
nlp.add_pipe('triple_detector')
nlp.add_pipe('entity_similarity')

Text Processing with the Pipeline

After adding the components, you can process text as you typically would:

text = "She said, \"I'm going to the store.\" The store is located near the river."
doc = nlp(text)

Retrieving the Extracted Quotes

You can access the extracted quotes from the doc._.quotes attribute:

for quote in doc._.quotes:
    print(quote)

Output:

"I'm going to the store."
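
To make the idea of rule-based quote detection concrete, here is a minimal standalone sketch that uses only Python's re module. It is not the implementation used by bagpipes_spacy: the extract_quotes helper and the QUOTE_PATTERN regular expression are hypothetical names chosen for illustration, and the sketch only handles straight and curly double quotes.

```python
import re

# Hypothetical pattern, not taken from bagpipes_spacy: match text enclosed in
# straight or curly double quotes, keeping the quote marks in the result.
QUOTE_PATTERN = re.compile(r'"[^"]*"|“[^”]*”')


def extract_quotes(text: str) -> list[str]:
    """Return every double-quoted span in `text`, in order of appearance."""
    return QUOTE_PATTERN.findall(text)


text = "She said, \"I'm going to the store.\" The store is located near the river."
quotes = extract_quotes(text)
for quote in quotes:
    print(quote)
```

On the example sentence this prints the same single quote as shown above. A real detector would also need to deal with nested quotes, single quotes, and apostrophes, which this sketch ignores.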

Retrieving the Extracted Phrases

Each phrase type is exposed as its own doc._ attribute. For example, prepositional phrases are available from doc._.prep_phrases:

for prep_phrase in doc._.prep_phrases:
    print(prep_phrase)

Output:

near the river

The other phrase types are available in the same way from doc._.noun_phrases, doc._.verb_phrases, doc._.adj_phrases, and doc._.adv_phrases.

Retrieving the Normalized Text

The normalizer modifies the doc object itself, so you can access the normalized text as you usually would:

print(doc.text)

Output:

She said, "I am going to the store." The store is located near the river.
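
The contraction expansion seen in this output can be illustrated with a small standalone sketch. This is an assumption about the general technique, not the normalizer's actual code: the CONTRACTIONS table and the expand_contractions function are hypothetical, and the table is deliberately tiny.

```python
import re

# Hypothetical, deliberately small mapping; a real normalizer would cover
# many more contractions than these.
CONTRACTIONS = {
    "i'm": "I am",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
}

# One alternation over all known contractions, matched case-insensitively
# and only at word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each known contraction in `text` with its expanded form."""

    def replace(match: re.Match) -> str:
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        # Preserve a leading capital letter, e.g. "Can't" -> "Cannot".
        if original[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


text = "She said, \"I'm going to the store.\" The store is located near the river."
print(expand_contractions(text))
```

Applied to the example sentence, the sketch turns "I'm" into "I am" and leaves the rest of the text unchanged.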

Retrieving the Extracted Triples

You can access the extracted triples from the doc._.triples attribute:

for triple in doc._.triples:
    print(triple)

Output:

('store', 'located near', 'river')

Retrieving the Entity Similarities and Mappings

You can access the entity similarities and mappings from the doc._.ent_similarity and doc._.ent_mappings attributes:

for ent1, ent2, similarity in doc._.ent_similarity:
    print(ent1, ent2, similarity)

for ent, mappings in doc._.ent_mappings.items():
    print(ent, mappings)
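
The source does not state how similarity is computed or which threshold decides a mapping, so the following is only a sketch of the data shapes iterated over above: a list of (ent1, ent2, similarity) tuples and a dictionary from each entity to its similar entities. The toy vectors, the cosine metric, and the THRESHOLD value are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 2-dimensional vectors standing in for entity embeddings (assumed values).
vectors = {
    "store": [1.0, 0.0],
    "shop": [0.9, 0.1],
    "river": [0.0, 1.0],
}

# Assumed cut-off above which two entities count as similar.
THRESHOLD = 0.8

# Pairwise scores, shaped like the tuples yielded by doc._.ent_similarity.
ent_similarity = [
    (a, b, cosine(vectors[a], vectors[b])) for a, b in combinations(vectors, 2)
]

# Symmetric mapping of each entity to the entities it is similar to,
# shaped like doc._.ent_mappings.
ent_mappings: dict[str, list[str]] = {}
for a, b, score in ent_similarity:
    if score >= THRESHOLD:
        ent_mappings.setdefault(a, []).append(b)
        ent_mappings.setdefault(b, []).append(a)

for ent, mappings in ent_mappings.items():
    print(ent, mappings)
```

With these toy vectors only "store" and "shop" exceed the threshold, so each maps to the other and "river" has no mapping.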

Source Distribution

bagpipes-spacy-0.1.2.tar.gz (5.3 kB)
