wikimine

Installation

pip install wikimine

Motivation

Wikidata contains a vast amount of knowledge structured as a graph.

Its data model enables a wide range of applications, allowing relationships between entities to be stored and queried in a highly flexible way.

However, this flexibility comes with a steep learning curve for most programmers.

To use Wikidata effectively, a developer must understand its unique structure as well as triplestore/graph databases and the SPARQL query language, both of which currently have limited learning resources.

This project translates Wikidata into a more familiar data modeling format and uses only SQLite, eliminating the need to set up a specialized database system.

As a tradeoff, our approach sacrifices some of the flexibility and expressiveness that make Wikidata so powerful for large-scale knowledge representation.

However, it makes the data more accessible to developers while retaining much of its usefulness.

With this, developers can explore Wikidata using familiar tools and workflows.

We hope wikimine serves as a gateway to Wikidata, graph databases, and the semantic web, encouraging more people to contribute to these ecosystems.

Data Modeling

wikimine stores the data in the following tables, defined with the peewee ORM:

class WikidataEntityLabel(BaseModel):
    """
    wikidata entity label
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityDescriptions(BaseModel):
    """
    wikidata entity descriptions.
    example: (Q9191, 'en', 'René Descartes')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityAliases(BaseModel):
    """
    wikidata entity aliases.
    example:
    (Q9191, 'en', 'Descartes')
    (Q9191, 'en', 'Cartesius')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataClaim(BaseModel):
    """
    wikidata claim contains Statements about wikidata items.
    (item, property, value).

    You can read more about this concept here:
    https://www.wikidata.org/wiki/Wikidata:Introduction

    This table is indexed by
        (source_entity, property_id, target_entity).
        (property_id, target_entity).
        (target_entity).
    """
    source_entity = CharField()  # entity id of item.
    property_id = CharField()
    body = JSONField()  # this is the claim body.
    target_entity = CharField(null=True)  # only set when the mainsnak.datavalue type is wikibase-entityid
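
To make the claim layout and its indices more concrete, here is a minimal sketch using only the standard-library `sqlite3` module. It is not wikimine's own code: the table names `label` and `claim`, the index names, and the helper `instances_of` are assumptions made for illustration, and the real table names created by wikimine may differ. The claim bodies are placeholder JSON. P31 ("instance of"), P569 ("date of birth"), and Q5 ("human") are real Wikidata identifiers.

```python
import json
import sqlite3

# Hypothetical table layout mirroring the models above; the real
# table names created by wikimine may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label (entity_id TEXT, language TEXT, value TEXT);
CREATE TABLE claim (
    source_entity TEXT,
    property_id   TEXT,
    body          TEXT,   -- JSON-encoded claim body
    target_entity TEXT    -- NULL unless the value is another entity
);
-- The three indices listed in the WikidataClaim docstring.
CREATE INDEX claim_spt ON claim (source_entity, property_id, target_entity);
CREATE INDEX claim_pt  ON claim (property_id, target_entity);
CREATE INDEX claim_t   ON claim (target_entity);
""")

conn.executemany(
    "INSERT INTO label VALUES (?, ?, ?)",
    [("Q9191", "en", "René Descartes"), ("Q5", "en", "human")],
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?, ?)",
    [
        # Entity-valued claim: Q9191 is an instance of (P31) human (Q5).
        ("Q9191", "P31", json.dumps({"rank": "normal"}), "Q5"),
        # A claim whose value is not an entity leaves target_entity NULL.
        ("Q9191", "P569", json.dumps({"rank": "normal"}), None),
    ],
)


def instances_of(class_id, language="en"):
    """Return labels of entities with an 'instance of' claim pointing at class_id."""
    rows = conn.execute(
        """
        SELECT l.value
        FROM claim AS c
        JOIN label AS l ON l.entity_id = c.source_entity
        WHERE c.property_id = 'P31' AND c.target_entity = ? AND l.language = ?
        """,
        (class_id, language),
    )
    return [value for (value,) in rows]


humans = instances_of("Q5")
print(humans)
```

A lookup of this shape filters on `property_id` and `target_entity`, which is the kind of query the `(property_id, target_entity)` index is suited to.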

Usage

Process the Wikidata JSON dump

After downloading the dump:

# first split the dump into smaller pieces for easier processing.
python -m wikimine.cli split ./path-to-dump ./path-to-workspace-folder
# parse and import to sqlite.
python -m wikimine.cli import /path/to/db ./path-to-workspace-folder
# build indices.
python -m wikimine.cli index /path/to/db
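
For readers curious why the dump can be split and processed piece by piece, the sketch below shows one way to stream entities from it. It assumes the standard Wikidata JSON dump layout (a single JSON array with one entity per line, each line ending in a comma). The function `iter_entities` is a hypothetical helper written for this example and is not part of wikimine's API.

```python
import io
import json


def iter_entities(lines):
    """Yield one decoded entity per line of a Wikidata-style JSON array dump."""
    for line in lines:
        # Drop surrounding whitespace and the trailing comma between entities.
        line = line.strip().rstrip(",")
        # Skip the array brackets and any blank lines.
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


# A tiny in-memory stand-in for a dump file.
sample = io.StringIO(
    '[\n'
    '{"id": "Q5", "type": "item"},\n'
    '{"id": "Q9191", "type": "item"}\n'
    ']\n'
)
ids = [entity["id"] for entity in iter_entities(sample)]
print(ids)
```

Because each line is parsed on its own, the whole array never has to be loaded into memory at once.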

Connect to db

from wikimine import auto_connect, connect

"""
    Search for the database path in the following sources and connect to it automatically:
    1.  The environment variable WIKIMINE_WIKIDATA_DB.
    2.  ~/.wikimine.config.json: {"db_path": "/path/to/db"}
"""
auto_connect()
# or
connect('/path/to/db')
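
The lookup described for `auto_connect()` could work roughly as sketched below. This is an illustration, not wikimine's implementation: `resolve_db_path` is a hypothetical helper, and checking the environment variable before the config file is an assumption based on the numbering in the docstring.

```python
import json
import tempfile
from pathlib import Path


def resolve_db_path(environ, config_file):
    """Hypothetical helper: return the first database path found, or None."""
    # 1. The environment variable takes priority (assumed order).
    path = environ.get("WIKIMINE_WIKIDATA_DB")
    if path:
        return path
    # 2. Fall back to the JSON config file, if it exists.
    config_file = Path(config_file)
    if config_file.is_file():
        return json.loads(config_file.read_text()).get("db_path")
    return None


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / ".wikimine.config.json"
    cfg.write_text(json.dumps({"db_path": "/from/config"}))

    from_env = resolve_db_path({"WIKIMINE_WIKIDATA_DB": "/from/env"}, cfg)
    from_file = resolve_db_path({}, cfg)
    missing = resolve_db_path({}, Path(tmp) / "absent.json")

print(from_env, from_file, missing)
```

In real use one would pass `os.environ` and `Path.home() / ".wikimine.config.json"`; the explicit arguments here only keep the sketch easy to test.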

Label and Link lookup

from wikimine import lookup_label, lookup_wikilink
import wikimine.relations as rel
import wikimine.entity as ent

print(ent.People.Descartes)
print(lookup_label(ent.People.Descartes))
print(lookup_wikilink(ent.People.Descartes))

print(lookup_label(rel.People.lang_written))

Other commonly used entities and relations

from wikimine.utils import list_static_class_members
import wikimine.relations as rel
import wikimine.entity as ent

print('class People:')
for k, v in list_static_class_members(ent.People):
    print(f'  {k}: {v}')
print()

print('class Location:')
for k, v in list_static_class_members(ent.Location):
    print(f'  {k}: {v}')
print()

print('class Company:')
for k, v in list_static_class_members(ent.Company):
    print(f'  {k}: {v}')
print()

print('class WrittenWork:')
for k, v in list_static_class_members(ent.WrittenWork):
    print(f'  {k}: {v}')
print()

Query the knowledge graph

from wikimine.query import (
    list_instances_of,
    get_common_classes,
    get_common_edges,
    get_classes_of_instance,
    get_profile,
    get_tree,
)

import wikimine.relations as rel
import wikimine.entity as ent
import pprint

# list first 50 people
print("List first 50 people")
people = list_instances_of(ent.CommonTypes.Human, limit=50)
pprint.pp(people)
print('\n ----- \n')

# get common type of instances
print("Get common type of instances")
common_classes = get_common_classes([
    ent.Company.VW,
    ent.Company.Xerox,
    ent.Company.Apple,
])
common_classes.print_summary()
print('\n ----- \n')

print('List the common outgoing relations of a group of entities')
common_edges = get_common_edges(people)
common_edges.print_summary()
print('\n ----- \n')

print('List all classes of a given entity')
classes = get_classes_of_instance(ent.Company.Apple)
pprint.pp(classes)
print('\n ----- \n')

print('Get all outgoing edges and their values for a given entity')
profile = get_profile(ent.WrittenWork.A_Mathematical_Theory_of_Communication)
pprint.pp(profile)
print('\n ----- \n')

print('Show all types that have human as a subtype, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of)
tree.show()
print('\n ----- \n')

print('Show all types that are subtypes of human, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of, direction='backward')
tree.show()
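
The forward and backward traversals above can be pictured as a recursive walk over "subclass of" claims (Wikidata property P279). The sketch below does this with a recursive SQLite query over a hypothetical `claim` table and placeholder entity ids. It is an assumption about how such a traversal could be written, not wikimine's `get_tree` implementation, and it returns a flat list rather than a tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim (source_entity TEXT, property_id TEXT, target_entity TEXT)"
)
# Placeholder ids: each row reads "source is a subclass of (P279) target".
conn.executemany(
    "INSERT INTO claim VALUES (?, 'P279', ?)",
    [("child", "parent"), ("parent", "grandparent"), ("sibling", "parent")],
)


def reachable(start, direction="forward"):
    """Return every entity reachable from start along P279 edges."""
    # Forward follows source -> target; backward follows target -> source.
    if direction == "forward":
        src, dst = "source_entity", "target_entity"
    else:
        src, dst = "target_entity", "source_entity"
    # Column names come from the fixed pair above, never from user input.
    sql = f"""
        WITH RECURSIVE walk(entity) AS (
            SELECT {dst} FROM claim
            WHERE property_id = 'P279' AND {src} = ?
            UNION
            SELECT c.{dst} FROM claim AS c
            JOIN walk AS w ON c.{src} = w.entity
            WHERE c.property_id = 'P279'
        )
        SELECT entity FROM walk
    """
    return sorted(row[0] for row in conn.execute(sql, (start,)))


ancestors = reachable("child")
descendants = reachable("parent", direction="backward")
print(ancestors)
print(descendants)
```

Using `UNION` rather than `UNION ALL` drops entities that were already visited, so the walk terminates even if the data contains a cycle.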

License

wikimine is distributed under the terms of the MIT license.
