Knowledge Base loading and annotation facilities

Simple Statement Knowledge Bases (SSKB)

The sskb library provides easy access to Natural Language Knowledge Bases (KBs), and tools to facilitate annotation.

It exposes available KBs as sequences of simple statements. For example (from ProofWiki):

"A '''set''' is intuitively defined as any aggregation of 
objects, called elements, which can be precisely defined in 
some way or other."

Each statement is accompanied by relevant metadata: the premises necessary for the statement to be true, and the named entities associated with the respective KB.

SSKB is built upon the Simple Annotation Framework (SAF) library, which provides its data model and API. This means it is compatible with saf-datasets annotators.

Installation

To install, you can use pip:

pip install sskb

Usage

Loading KBs and accessing data

from sskb import ProofWikiKB

kb = ProofWikiKB()
print(len(kb))  # Number of statements in the KB
# 146723

print(kb[0].surface)  # First statement in the KB
# A '''set''' is intuitively defined as any aggregation of objects, called elements, which can be precisely defined in some way or other.

print([token.surface for token in kb[0].tokens])  # Tokens (spaCy) of the first statement
# ['A', "''", "'", 'set', "''", "'", 'is', 'intuitively', 'defined', 'as', 'any', 'aggregation', 'of', 'objects', ',', 'called', 'elements', ',', 'which', 'can', 'be', 'precisely', 'defined', 'in', 'some', 'way', 'or', 'other', '.']


print(kb[0].annotations)  # Annotations for the first statement
# {'split': 'KB', 'type': 'fact', 'id': 337113631216859490898241823584484375642}


# There are no token annotations in this dataset
print([(tok.surface, tok.annotations) for tok in kb[0].tokens])
# [('A', {}), ("''", {}), ("'", {}), ('set', {}), ("''", {}), ("'", {}), ('is', {}), ('intuitively', {}), ('defined', {}), ('as', {}), ('any', {}), ('aggregation', {}), ('of', {}), ('objects', {}), (',', {}), ('called', {}), ('elements', {}), (',', {}), ('which', {}), ('can', {}), ('be', {}), ('precisely', {}), ('defined', {}), ('in', {}), ('some', {}), ('way', {}), ('or', {}), ('other', {}), ('.', {})]

# Entities cited in a statement
print([entity.surface for entity in kb[0].entities])
# ['Set', 'Or', 'Aggregation']

# Accessing statements by KB identifier
set_related = kb[337113631216859490898241823584484375642] # All statements connected to this identifier

print(len(set_related))
# 40

print(set_related[10].surface)
# If there are many elements in a set, then it becomes tedious and impractical to list them all in one big long explicit definition. Fortunately, however, there are other techniques for listing sets.

# Filtering ProofWiki propositions
train_propositions = [stt for stt in kb 
                      if (stt.annotations["type"] == "proposition" and stt.annotations["split"] == "train")]

print(train_propositions[0].surface)
# Let $A$ be a preadditive category.

print("\n".join([prem.surface for prem in train_propositions[0].premises]))
# Let $\mathbf C$ be a metacategory.
# Let $A$ and $B$ be objects of $\mathbf C$.
# A '''(binary) product diagram''' for $A$ and $B$ comprises an object $P$ and morphisms $p_1: P \to A$, $p_2: P \to B$:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
#  A
# &
#  P
#   \ar[l]_*+{p_1}
#   \ar[r]^*+{p_2}
# &
#  B
# }\end{xy}$
# subjected to the following universal mapping property:
# :For any object $X$ and morphisms $x_1, x_2$ like so:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
#  A
# &
#  X
#   \ar[l]_*+{x_1}
#   \ar[r]^*+{x_2}
# &
#  B
# }\end{xy}$
# :there is a unique morphism $u: X \to P$ such that:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
# &
#  X
#   \ar[ld]_*+{x_1}
#   \ar@{-->}[d]^*+{u}
#   \ar[rd]^*+{x_2}
# \\
#  A
# &
#  P
#   \ar[l]^*+{p_1}
#   \ar[r]_*+{p_2}
# &
#  B
# }\end{xy}$
# :is a commutative diagram, i.e., $x_1 = p_1 \circ u$ and $x_2 = p_2 \circ u$.
# In this situation, $P$ is called a '''(binary) product of $A$ and $B$''' and may be denoted $A \times B$.
# Generally, one writes $\left\langle{x_1, x_2}\right\rangle$ for the unique morphism $u$ determined by above diagram.
# The morphisms $p_1$ and $p_2$ are often taken to be implicit.
# They are called '''projections'''; if necessary, $p_1$ can be called the '''first projection''' and $p_2$ the '''second projection'''.
# {{expand|the projection definition may merit its own, separate page}}
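
The filtering pattern shown above can be wrapped in small reusable helpers. The following is a minimal sketch, not part of sskb: `filter_statements`, `count_by`, and the `FakeStatement` stand-in are hypothetical names introduced here for illustration. It assumes only what the examples above show, namely that each statement has a `surface` string and an `annotations` dict. The stand-in lets the sketch run without downloading a KB, and the "test" split value in the demo data is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeStatement:
    """Hypothetical stand-in exposing only the attributes used in the examples above."""
    surface: str
    annotations: dict = field(default_factory=dict)


def filter_statements(statements, **wanted):
    """Keep statements whose annotations match every keyword given."""
    return [stt for stt in statements
            if all(stt.annotations.get(key) == value for key, value in wanted.items())]


def count_by(statements, key):
    """Tally statements by the value of one annotation key."""
    return Counter(stt.annotations.get(key) for stt in statements)


# Demo data; the "test" split is assumed, not taken from the KB.
demo = [
    FakeStatement("A set is ...", {"type": "fact", "split": "KB"}),
    FakeStatement("Let $A$ be ...", {"type": "proposition", "split": "train"}),
    FakeStatement("Let $B$ be ...", {"type": "proposition", "split": "test"}),
]

train_props = filter_statements(demo, type="proposition", split="train")
type_counts = count_by(demo, "type")
print([stt.surface for stt in train_props])
print(dict(type_counts))
```

Because the helpers only read `annotations`, they should also work if `demo` is replaced by a loaded `kb` object, provided it iterates over statements as in the examples above.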

Available datasets: e-SNLI (ESNLIKB), ProofWiki (ProofWikiKB), WorldTree (WorldTreeKB).
