💫 SpaCy wrapper for ConceptNet 💫

concepCy

concepCy is a spaCy wrapper for ConceptNet, a freely available semantic network designed to help computers understand the meaning of words.

concepCy allows you to query ConceptNet.io to extract word meanings directly from the resource itself.

Install

You can install concepCy via pip:

pip install concepcy

Alternatively, you can clone the repository and install it with Poetry:

git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install

Getting Started

To get started, you need to install one of the pre-trained spaCy models, such as en_core_web_sm, which the examples below load.

In ConceptNet, words are represented as Node objects and relations between words as Edge objects.
A Node has the following attributes:

  • id: the URI where you can look up all the information about that word
  • label: the word's label, which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI
  • language: the code of the language the label is in
  • term: a link to the most general version of this term. In many cases this is just the same URI.
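
As a rough illustration of how a Node id relates to its language, the sketch below splits a concept URI of the form /c/<language>/<term> into its parts. The helper name split_concept_uri is hypothetical and is not part of concepCy; it only assumes the URI layout seen in ids such as /c/en/company.

```python
def split_concept_uri(uri: str) -> dict:
    """Split a concept URI such as '/c/en/company' into its parts.

    Hypothetical helper for illustration; not part of concepCy.
    """
    parts = uri.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a concept URI: {uri!r}")
    return {
        "language": parts[1],
        # Multi-word terms use underscores in the URI.
        "term_text": parts[2].replace("_", " "),
        # Anything after the term (for example a part-of-speech tag) is kept as-is.
        "extra": parts[3:],
    }


print(split_concept_uri("/c/en/company"))
```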

The Edge object features the following attributes:

  • start: starting Node
  • end: ending Node
  • relation: name of the relation for those two nodes
  • text: the source text the relation was extracted from, for the part of ConceptNet's data that comes from text
  • weight: how believable the information is
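
To make the two attribute lists concrete, here is a minimal sketch that mirrors them with plain dataclasses. NodeSketch, EdgeSketch and edge_from_dict are hypothetical names chosen for this example, not concepCy's own classes, and treating text as optional is an assumption. The sample edge is copied from the example output in this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeSketch:
    """Illustrative stand-in for a ConceptNet node (not concepCy's Node)."""
    id: str
    label: str
    language: str
    term: str


@dataclass(frozen=True)
class EdgeSketch:
    """Illustrative stand-in for a ConceptNet edge (not concepCy's Edge)."""
    start: NodeSketch
    end: NodeSketch
    relation: str
    text: Optional[str]  # assumed optional: not every edge comes from text
    weight: float


def edge_from_dict(raw: dict) -> EdgeSketch:
    """Build an EdgeSketch from a dict that uses the attribute names above."""
    def node(d: dict) -> NodeSketch:
        return NodeSketch(d["id"], d["label"], d["language"], d["term"])

    return EdgeSketch(
        start=node(raw["start"]),
        end=node(raw["end"]),
        relation=raw["relation"],
        text=raw.get("text"),
        weight=float(raw["weight"]),
    )


raw_edge = {
    "start": {"id": "/c/en/company", "type": "Node", "label": "company",
              "language": "en", "term": "/c/en/company"},
    "end": {"id": "/c/en/business", "type": "Node", "label": "business",
            "language": "en", "term": "/c/en/business"},
    "relation": "RelatedTo",
    "text": "[[company]] is related to [[business]]",
    "weight": 6.424017434596516,
}
edge = edge_from_dict(raw_edge)
print(f"{edge.start.label} --{edge.relation}--> {edge.end.label} ({edge.weight:.2f})")
```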

Simple start

This example looks only at the RelatedTo relations between words.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")

doc = nlp("WHO is a lovely company")

# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")

# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")

Output:

--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]

Word: 'is'
[]

Word: 'a'
[]

Word: 'lovely'
[]

Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
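
The relation lists above are plain Python data, so they can be post-processed directly. The following sketch, which assumes relations come back as dictionaries shaped like the output above, ranks related words by weight. related_labels and _edge are hypothetical helpers written for this example, and the sample edges are abbreviated to the keys the code reads, with weights copied from the output.

```python
def related_labels(edges, min_weight=0.0):
    """Return (end label, weight) pairs for edges at or above min_weight, strongest first."""
    kept = [edge for edge in edges if edge["weight"] >= min_weight]
    kept.sort(key=lambda edge: edge["weight"], reverse=True)
    return [(edge["end"]["label"], edge["weight"]) for edge in kept]


def _edge(end_label, weight):
    # Abbreviated edge: only the keys this example reads.
    return {
        "end": {"id": f"/c/en/{end_label}", "label": end_label},
        "relation": "RelatedTo",
        "weight": weight,
    }


company_edges = [
    _edge("organization", 4.259107887809371),
    _edge("business", 6.424017434596516),
    _edge("corporation", 4.432155231938521),
]

for label, weight in related_labels(company_edges, min_weight=4.3):
    print(f"{label}: {weight:.2f}")
```

With a threshold of 4.3, the loop prints business and corporation and drops organization, whose weight falls just below the cutoff.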

Custom configuration

You can customize the concepCy component by overriding the default values in its config. The two parameters of interest are:

  • relations_of_interest: List[str]: ConceptNet currently supports 34 word relations, and some of them might not be needed for your use case. To keep only the ones you need, pass a list of the relations you want (see all relations available here). Each relation then becomes a custom extension, accessed like doc._.relatedto or token._.relatedto in the example above.
  • filter_edge_fct: Callable[Edge]: ConceptNet is a crowd-sourced resource, so some information is more reliable than other information. To keep only reliable relations, you can pass a function that takes an Edge as input and returns a boolean indicating whether that edge should be filtered.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)
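
How the filtering options are applied internally is not shown in this README. As one plausible reading, the sketch below builds a standalone predicate that keeps an edge only if its weight reaches a threshold and, optionally, only if it has a non-empty text. make_edge_filter is a hypothetical helper and not part of concepCy, and the sample edges are invented purely for illustration.

```python
from typing import Callable


def make_edge_filter(min_weight: float = 0.0, require_text: bool = False) -> Callable[[dict], bool]:
    """Return a predicate that is True for edges worth keeping.

    Hypothetical helper: it works on plain dicts and is independent of concepCy.
    """
    def keep(edge: dict) -> bool:
        if edge.get("weight", 0.0) < min_weight:
            return False
        if require_text and not edge.get("text"):
            return False
        return True

    return keep


# Invented sample edges, reduced to the fields the predicate reads.
sample_edges = [
    {"name": "strong, with text", "weight": 4.0, "text": "[[a]] can [[b]]"},
    {"name": "weak, with text", "weight": 1.0, "text": "[[a]] can [[c]]"},
    {"name": "strong, no text", "weight": 5.0, "text": None},
]

keep = make_edge_filter(min_weight=3.0, require_text=True)
kept_names = [edge["name"] for edge in sample_edges if keep(edge)]
print(kept_names)
```

Here the predicate returns True for edges to keep; if your filter function is expected to flag edges for removal instead, invert the result.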

Documentation 📚

The whole documentation along with design decisions and examples can be found here.
