Convenience functions to work with pandas triple dataframes 🐼🐼🐼
Convenience functions for pandas dataframes containing triples. Fun fact: a group of pandas (e.g. three) is commonly referred to as an embarrassment.
This library's main goal is to make commonly used functions readily available when exploring triples stored in pandas dataframes. It is not meant to be an efficient graph analysis library.
Usage
You can use a variety of convenience functions. Let's create some simple example triples:
>>> import pandas as pd
>>> rel = pd.DataFrame([("e1","rel1","e2"), ("e3", "rel2", "e1")], columns=["head","relation","tail"])
>>> attr = pd.DataFrame([("e1","attr1","lorem ipsum"), ("e2","attr2","dolor")], columns=["head","relation","tail"])
Search in attribute triples:
>>> from embarrassment import search
>>> search(attr, "lorem ipsum")
  head relation         tail
0   e1    attr1  lorem ipsum
>>> search(attr, "lorem", method="substring")
  head relation         tail
0   e1    attr1  lorem ipsum
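To see roughly what such a search amounts to in plain pandas, here is a minimal sketch. It is not the library's implementation: it assumes that the default search compares the tail column to the query for equality and that the substring method checks for containment.

```python
import pandas as pd

attr = pd.DataFrame(
    [("e1", "attr1", "lorem ipsum"), ("e2", "attr2", "dolor")],
    columns=["head", "relation", "tail"],
)

# Assumed exact match: keep rows whose tail equals the query string.
exact = attr[attr["tail"] == "lorem ipsum"]

# Assumed substring match: keep rows whose tail contains the query.
# regex=False treats the query as a literal string, not a pattern.
substring = attr[attr["tail"].str.contains("lorem", regex=False)]

print(exact)
print(substring)
```

Both selections keep the original row index, which is why the results above are labelled with 0.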
Select triples with a specific relation:
>>> from embarrassment import select_rel
>>> select_rel(rel, "rel1")
  head relation tail
0   e1     rel1   e2
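In plain pandas, selecting by relation can be sketched as a boolean mask on the relation column. This is an illustration under that assumption, not the code behind select_rel; the multi-relation variant with isin is an extension added here for illustration and is not something the examples above show select_rel doing.

```python
import pandas as pd

rel = pd.DataFrame(
    [("e1", "rel1", "e2"), ("e3", "rel2", "e1")],
    columns=["head", "relation", "tail"],
)

# One relation: boolean mask on the relation column.
only_rel1 = rel[rel["relation"] == "rel1"]

# Several relations at once (illustrative extension, see lead-in).
wanted = ["rel1", "rel2"]
several = rel[rel["relation"].isin(wanted)]

print(only_rel1)
print(several)
```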
Perform operations on the immediate neighbor(s) of an entity, e.g. get the attribute triples:
>>> from embarrassment import neighbor_attr_triples
>>> neighbor_attr_triples(rel, attr, "e1")
  head relation   tail
1   e2    attr2  dolor
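The result contains only the row for e2 because e3, the other neighbor of e1, has no attribute triples. A minimal pandas sketch of the idea, assuming that neighbors are collected from both link directions and then used to filter the attribute triples (the variable names are made up for illustration and this is not the library's code):

```python
import pandas as pd

rel = pd.DataFrame(
    [("e1", "rel1", "e2"), ("e3", "rel2", "e1")],
    columns=["head", "relation", "tail"],
)
attr = pd.DataFrame(
    [("e1", "attr1", "lorem ipsum"), ("e2", "attr2", "dolor")],
    columns=["head", "relation", "tail"],
)
entity = "e1"

# Entities that link to the entity (in-links) and that it links to (out-links).
in_neighbors = rel.loc[rel["tail"] == entity, "head"]
out_neighbors = rel.loc[rel["head"] == entity, "tail"]
neighbors = sorted(set(in_neighbors) | set(out_neighbors))

# Attribute triples whose head is one of the neighbors.
neighbor_attrs = attr[attr["head"].isin(neighbors)]

print(neighbors)
print(neighbor_attrs)
```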
Or just get the relation triples:
>>> from embarrassment import neighbor_rel_triples
>>> neighbor_rel_triples(rel, "e1")
  head relation tail
1   e3     rel2   e1
0   e1     rel1   e2
By default you get in- and out-links, but you can specify a direction:
>>> neighbor_rel_triples(rel, "e1", in_out_both="in")
  head relation tail
1   e3     rel2   e1
>>> neighbor_rel_triples(rel, "e1", in_out_both="out")
  head relation tail
0   e1     rel1   e2
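The direction switch can be pictured as choosing which column has to match the entity. The helper below is hypothetical (its name and its direction parameter are not part of embarrassment) and only sketches that idea in plain pandas; the row order it returns for both directions may differ from the library's.

```python
import pandas as pd


def neighbor_triples_sketch(rel, entity, direction="both"):
    """Hypothetical helper: triples that touch `entity` in the given direction."""
    incoming = rel["tail"] == entity  # entity is the target of the link
    outgoing = rel["head"] == entity  # entity is the source of the link
    if direction == "in":
        mask = incoming
    elif direction == "out":
        mask = outgoing
    elif direction == "both":
        mask = incoming | outgoing
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return rel[mask]


rel = pd.DataFrame(
    [("e1", "rel1", "e2"), ("e3", "rel2", "e1")],
    columns=["head", "relation", "tail"],
)

print(neighbor_triples_sketch(rel, "e1", direction="in"))
print(neighbor_triples_sketch(rel, "e1", direction="out"))
print(neighbor_triples_sketch(rel, "e1"))
```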
Using the pandas pipe method you can chain operations. Let's see a more elaborate example that loads a dataset from sylloge:
>>> from sylloge import MovieGraphBenchmark
>>> from embarrassment import clean, neighbor_attr_triples, search, select_rel
>>> ds = MovieGraphBenchmark()
>>> # clean attribute triples
>>> cleaned_attr = clean(ds.attr_triples_left)
>>> # find uri of James Tolkan
>>> jt = search(cleaned_attr, query="James Tolkan")["head"].iloc[0]
>>> # get neighbor triples
>>> # and select triples with title and show values
>>> title_rel = "https://www.scads.de/movieBenchmark/ontology/title"
>>> neighbor_attr_triples(ds.rel_triples_left, cleaned_attr, jt).pipe(
...     select_rel, rel=title_rel
... )["tail"]
12234    A Nero Wolfe Mystery
12282           Door to Death
12440          Die Like a Dog
12461        The Next Witness
Name: tail, dtype: object
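Chaining works because DataFrame.pipe(func, ...) calls func with the dataframe as its first argument and forwards the remaining arguments. The toy sketch below shows this without downloading a dataset; the two helper functions are hypothetical stand-ins written for illustration, not functions from embarrassment.

```python
import pandas as pd


def with_heads(df, heads):
    """Hypothetical helper: keep triples whose head is in `heads`."""
    return df[df["head"].isin(heads)]


def with_relation(df, rel):
    """Hypothetical helper: keep triples with the given relation."""
    return df[df["relation"] == rel]


attr = pd.DataFrame(
    [("e1", "attr1", "lorem ipsum"), ("e2", "attr2", "dolor")],
    columns=["head", "relation", "tail"],
)

# Chained form: reads left to right, one step per pipe call.
chained = attr.pipe(with_heads, heads=["e2"]).pipe(with_relation, rel="attr2")["tail"]

# Equivalent nested form: same calls, read inside out.
nested = with_relation(with_heads(attr, heads=["e2"]), rel="attr2")["tail"]

print(chained)
```

Any function that takes a dataframe as its first argument can be used this way, which is why select_rel fits into the chain above.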
Installation
You can install embarrassment via pip:
pip install embarrassment