
💫 spaCy wrapper for ConceptNet 💫


concepCy


concepCy is a spaCy wrapper for ConceptNet, a freely-available semantic network designed to help computers understand the meaning of words.

concepCy allows you to query ConceptNet.io to extract word meanings directly from the resource itself.

Install

You can install concepCy via pip:

pip install concepcy

Alternatively, you can clone the repository and install it with Poetry:

git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install

Getting Started

To get started, you need to install one of spaCy's pre-trained models (the examples below use en_core_web_sm).

In ConceptNet, words are represented as Node objects and the relations between words as Edge objects.
A Node object has the following attributes:

  • id: the URI under which you can look up all the information about that word (e.g. /c/en/company)
  • label: the human-readable text, which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI
  • language: the code of the language the label is in (e.g. en)
  • term: a link to the most general version of this term; in many cases this is the same URI as id
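To make the URI fields more concrete, here is a minimal sketch that splits a concept URI into its parts. It assumes the /c/<language>/<text>[/<pos>/...] layout used by ConceptNet 5 concept URIs, with underscores standing in for spaces; the parse_concept_uri helper and the ConceptURI type are hypothetical and not part of concepCy.

```python
from typing import NamedTuple, Optional


class ConceptURI(NamedTuple):
    """Hypothetical container for the parts of a ConceptNet concept URI."""
    language: str
    text: str
    pos: Optional[str]


def parse_concept_uri(uri: str) -> ConceptURI:
    """Split a concept URI such as '/c/en/company' into its parts."""
    parts = uri.strip("/").split("/")
    # Concept URIs start with 'c' and carry at least a language and a text.
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a concept URI: {uri!r}")
    # Multi-word terms use underscores in place of spaces, e.g. '/c/en/ice_cream'.
    text = parts[2].replace("_", " ")
    # An optional fourth segment holds a part-of-speech tag such as 'n'.
    pos = parts[3] if len(parts) > 3 else None
    return ConceptURI(language=parts[1], text=text, pos=pos)


print(parse_concept_uri("/c/en/company"))
# ConceptURI(language='en', text='company', pos=None)
```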

An Edge object has the following attributes:

  • start: the starting Node
  • end: the ending Node
  • relation: the name of the relation between the two nodes (e.g. RelatedTo)
  • text: some of ConceptNet's data is extracted from natural-language text; this field shows that source text
  • weight: how believable the information is (higher values are more credible)
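As a rough illustration of the two structures, the sketch below models them as Python dataclasses and builds them from the dictionary layout printed in the Simple start example. The Node and Edge classes and the node_from_dict / edge_from_dict helpers are hypothetical, written for illustration; they are not concepCy's own classes.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a ConceptNet node (not concepCy's own class)."""
    id: str        # URI such as "/c/en/company"
    label: str     # human-readable text, possibly a longer phrase
    language: str  # language code of the label, e.g. "en"
    term: str      # URI of the most general version of the term


@dataclass(frozen=True)
class Edge:
    """Hypothetical model of a ConceptNet edge between two nodes."""
    start: Node
    end: Node
    relation: str  # e.g. "RelatedTo"
    text: str      # source text the edge was extracted from, "" if none
    weight: float  # higher values are more credible


def node_from_dict(data: Dict[str, Any]) -> Node:
    # Extra keys such as "type" are ignored.
    return Node(
        id=data["id"],
        label=data["label"],
        language=data["language"],
        term=data["term"],
    )


def edge_from_dict(data: Dict[str, Any]) -> Edge:
    # Assumption: "text" may be absent or None; fall back to an empty string.
    return Edge(
        start=node_from_dict(data["start"]),
        end=node_from_dict(data["end"]),
        relation=data["relation"],
        text=data.get("text") or "",
        weight=float(data["weight"]),
    )


# One edge copied from the "company" output in the Simple start example.
raw = {
    "start": {"id": "/c/en/company", "type": "Node", "label": "company",
              "language": "en", "term": "/c/en/company"},
    "end": {"id": "/c/en/business", "type": "Node", "label": "business",
            "language": "en", "term": "/c/en/business"},
    "relation": "RelatedTo",
    "text": "[[company]] is related to [[business]]",
    "weight": 6.424017434596516,
}

edge = edge_from_dict(raw)
print(edge.start.label, edge.relation, edge.end.label)  # company RelatedTo business
```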

Simple start

In this first example, we are only interested in the RelatedTo relations between words.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")

doc = nlp("WHO is a lovely company")

# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")

# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")

Output:

--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]

Word: 'is'
[]

Word: 'a'
[]

Word: 'lovely'
[]

Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
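Each entry in these lists is a plain dictionary, so standard Python is enough to condense it. The sketch below, assuming the output layout shown above, keeps only the label of each end node, ordered by decreasing weight. The related_labels helper is hypothetical and not part of concepCy.

```python
from typing import Any, Dict, List


def related_labels(edges: List[Dict[str, Any]]) -> List[str]:
    """Return the end-node labels of the given edges, highest weight first."""
    ranked = sorted(edges, key=lambda edge: edge["weight"], reverse=True)
    return [edge["end"]["label"] for edge in ranked]


# The three "company" edges from the output above, trimmed to the keys used
# here and deliberately listed out of order.
company_edges = [
    {"end": {"label": "corporation"}, "weight": 4.432155231938521},
    {"end": {"label": "business"}, "weight": 6.424017434596516},
    {"end": {"label": "organization"}, "weight": 4.259107887809371},
]

print(related_labels(company_edges))  # ['business', 'corporation', 'organization']
```

With the pipeline from the example, the same helper could be applied to token._.relatedto for any token, assuming the extension returns dictionaries in the layout printed above.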

Custom configuration

You can customize the concepcy component by overriding the default values of its config. The two main parameters are:

  • relations_of_interest: List[str]: ConceptNet currently supports 34 relations, some of which you may not need for your use case. To keep only the ones you need, pass a list of the relations to keep (see the full list of relations in the ConceptNet documentation). Each relation then becomes a spaCy extension attribute, such as doc._.relatedto and token._.relatedto for RelatedTo.
  • filter_edge_fct: Callable[[Edge], bool]: ConceptNet is a crowd-sourced resource, so some of its information is more reliable than the rest. To keep only reliable relations, you can pass a function that takes an Edge as input and returns a boolean indicating whether that edge should be filtered.

The example below keeps only the MotivatedByGoal and CapableOf relations:

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)
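To illustrate the filter_edge_fct idea, here is a minimal sketch of such a predicate. It assumes edges arrive as dictionaries in the layout printed in the Simple start output (with as_dict set to False they may instead be objects, in which case the key lookups would become attribute access), and it assumes that returning True means the edge is kept; check the concepCy documentation for the exact convention. The make_reliability_filter helper is hypothetical, and its 3.0 default simply mirrors the filter_edge_weight value above.

```python
from typing import Any, Callable, Dict

EdgeDict = Dict[str, Any]


def make_reliability_filter(min_weight: float = 3.0,
                            require_text: bool = True) -> Callable[[EdgeDict], bool]:
    """Build a hypothetical edge predicate where True means 'keep this edge'."""

    def is_reliable(edge: EdgeDict) -> bool:
        # Drop edges whose weight falls below the threshold.
        if edge.get("weight", 0.0) < min_weight:
            return False
        # Optionally drop edges that carry no source text.
        if require_text and not edge.get("text"):
            return False
        return True

    return is_reliable


is_reliable = make_reliability_filter(min_weight=3.0, require_text=True)

print(is_reliable({"weight": 6.42, "text": "[[company]] is related to [[business]]"}))  # True
print(is_reliable({"weight": 1.0, "text": "some text"}))  # False: weight too low
print(is_reliable({"weight": 4.0, "text": None}))         # False: no source text
```

Building the predicate from a factory keeps the threshold configurable without global state, which makes it easy to try several cut-offs.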

Documentation 📚

The full documentation, along with design decisions and examples, can be found here.




Download files


Source Distribution

concepCy-0.1.0.tar.gz (53.7 kB)

Built Distribution

concepCy-0.1.0-py3-none-any.whl (6.5 kB)

File details

Details for the file concepCy-0.1.0.tar.gz.

File metadata

  • Download URL: concepCy-0.1.0.tar.gz
  • Upload date:
  • Size: 53.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.10 Darwin/21.4.0

File hashes

Hashes for concepCy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8db41a2cd29ab6b95cd0ecc759beea6da4b08113f6ea985706a1df751cbca414
MD5 5428b12d219c8f80bf53edaae8716b0c
BLAKE2b-256 dd423943d5a2956d4d4e51395cd537c8d552b698c0ce2bf7ff70f6a168cd395e


File details

Details for the file concepCy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: concepCy-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.10 Darwin/21.4.0

File hashes

Hashes for concepCy-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 426bd6cb0e0cbf4c00e86c94cc5418229a1c6a288ade754c4bceb821c3c6f96a
MD5 23f08c44f07cafff9996087da52cf13f
BLAKE2b-256 37b3fc44ce88297903e345f54e4550ba2d8ab35bd3beec576ebea298f4b8538a

