wikimine

Installation

pip install wikimine

Motivation

Wikidata contains a large body of knowledge modeled as a powerful graph structure.

That data structure enables many applications, but it also has a steep learning curve for most programmers.

To use Wikidata, a programmer needs to understand its

It also relies on a triplestore/graph database and a new query language, SPARQL, both of which currently have limited learning resources.

This project translates Wikidata into a data modeling format that is more familiar to most developers and uses only SQLite, removing the need to set up any new database system.

As a result, our approach loses some of the functionality of Wikidata's original design, but it is more familiar to most developers while still preserving much of Wikidata's usefulness.

Developers can then explore Wikidata with a familiar mindset and familiar tools. We hope wikimine can serve as a gateway to Wikidata, graph databases, and the semantic web, and allow more people to contribute to those related projects.

Data Modeling

wikimine contains the following tables (using peewee ORM):

class WikidataEntityLabel(BaseModel):
    """
    wikidata entity label
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityDescriptions(BaseModel):
    """
    wikidata entity descriptions.
    example: (Q9191, 'en', 'René Descartes')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityAliases(BaseModel):
    """
    wikidata entity aliases.
    example:
    (Q9191, 'en', 'Descartes')
    (Q9191, 'en', 'Cartesius')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataClaim(BaseModel):
    """
    wikidata claim contains Statements about wikidata items.
    (item, property, value).

    You can read more about this concept here:
    https://www.wikidata.org/wiki/Wikidata:Introduction

    This table is indexed by
        (source_entity, property_id, target_entity).
        (property_id, target_entity).
        (target_entity).
    """
    source_entity = CharField()  # entity id of item.
    property_id = CharField()
    body = JSONField()  # this is the claim body.
    target_entity = CharField(null=True)  # set only when mainsnak.datavalue is of type wikibase-entityid; NULL otherwise
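
To make the claim table more concrete, the sketch below shows how one row might be derived from a single statement in Wikidata's JSON format. The helper `claim_to_row` is hypothetical and is not part of wikimine; it only assumes the standard Wikidata JSON layout, where an entity-valued statement has `mainsnak.datavalue.type` equal to `wikibase-entityid` and the target id is stored under `mainsnak.datavalue.value.id`. The `P18` value is an illustrative placeholder.

```python
from typing import Optional, Tuple


def claim_to_row(
    source_entity: str, claim: dict
) -> Tuple[str, Optional[str], dict, Optional[str]]:
    """Hypothetical helper (not wikimine's API): map one statement to
    (source_entity, property_id, body, target_entity)."""
    mainsnak = claim.get("mainsnak", {})
    property_id = mainsnak.get("property")
    # "novalue" / "somevalue" snaks carry no datavalue at all.
    datavalue = mainsnak.get("datavalue") or {}
    target_entity = None
    if datavalue.get("type") == "wikibase-entityid":
        # Only entity-valued statements point at another entity.
        target_entity = datavalue.get("value", {}).get("id")
    return source_entity, property_id, claim, target_entity


# Q9191 is an instance of (P31) human (Q5): an entity-valued statement.
entity_claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q5"},
        },
    }
}

# A string-valued statement (placeholder value): no target entity.
string_claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P18",
        "datavalue": {"type": "string", "value": "Example.jpg"},
    }
}

print(claim_to_row("Q9191", entity_claim)[3])  # Q5
print(claim_to_row("Q9191", string_claim)[3])  # None
```

Rows whose `target_entity` is NULL can still be read through `body`, but only entity-valued rows take part in the `(property_id, target_entity)` and `(target_entity)` indices as graph edges.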

Usage

Process the wikidata json dump

After downloading the dump:

# first split the dump into smaller pieces for easier processing.
python -m wikimine.cli split ./path-to-dump ./path-to-workspace-folder
# parse and import to sqlite.
python -m wikimine.cli import /path/to/db ./path-to-workspace-folder
# build indices.
python -m wikimine.cli index /path/to/db
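
As a rough illustration of what a split step can involve, the sketch below cuts a dump into smaller JSON Lines files. It is not wikimine's actual implementation: the names `iter_entities`, `split_dump`, and the `chunk-00000.jsonl` file naming are hypothetical, and it assumes the usual uncompressed Wikidata JSON dump layout (a single JSON array with one entity per line, each line ending in a comma).

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per dump line, skipping the array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def _write_chunk(chunk: List[dict], out_dir: Path, index: int) -> Path:
    """Write one chunk as JSON Lines and return its path."""
    path = out_dir / f"chunk-{index:05d}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for entity in chunk:
            f.write(json.dumps(entity, ensure_ascii=False) + "\n")
    return path


def split_dump(lines: Iterable[str], out_dir: Path, chunk_size: int = 10_000) -> List[Path]:
    """Hypothetical splitter: write at most chunk_size entities per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    chunk: List[dict] = []
    for entity in iter_entities(lines):
        chunk.append(entity)
        if len(chunk) == chunk_size:
            paths.append(_write_chunk(chunk, out_dir, len(paths)))
            chunk = []
    if chunk:  # flush the final, possibly shorter, chunk
        paths.append(_write_chunk(chunk, out_dir, len(paths)))
    return paths
```

Processing the dump line by line keeps memory use bounded by the chunk size rather than by the size of the whole dump.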

Connect to db

from wikimine import auto_connect, connect

# auto_connect() searches for the database path from the following sources
# and connects to it automatically:
#   1. the environment variable WIKIMINE_WIKIDATA_DB
#   2. ~/.wikimine.config.json: {"db_path": "/path/to/db"}
auto_connect()
# or connect to an explicit path:
connect('/path/to/db')
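
The lookup order described for `auto_connect` can be pictured with the sketch below. It is only an approximation of the documented behavior, not wikimine's code: the function `resolve_db_path` and its `config_file` parameter are hypothetical, and the assumption that the environment variable takes precedence over the config file is inferred from the numbering above.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_db_path(config_file: Optional[Path] = None) -> Optional[str]:
    """Hypothetical helper: return the database path, or None if not configured."""
    # 1. The environment variable is checked first (assumed precedence).
    env_path = os.environ.get("WIKIMINE_WIKIDATA_DB")
    if env_path:
        return env_path
    # 2. Fall back to the JSON config file in the home directory.
    if config_file is None:
        config_file = Path.home() / ".wikimine.config.json"
    if config_file.is_file():
        with config_file.open(encoding="utf-8") as f:
            return json.load(f).get("db_path")
    return None
```

A caller could pass the result to `connect(...)` and raise a clear error when it is `None`.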

Label and Link lookup

from wikimine import lookup_label, lookup_wikilink
import wikimine.relations as rel
import wikimine.entity as ent

print(ent.People.Descartes)
print(lookup_label(ent.People.Descartes))
print(lookup_wikilink(ent.People.Descartes))

print(lookup_label(rel.People.lang_written))

Other commonly used entities and relations

from wikimine.utils import list_static_class_members
import wikimine.relations as rel
import wikimine.entity as ent

print('class People:')
for k, v in list_static_class_members(ent.People):
    print(f'  {k}: {v}')
print()

print('class Location:')
for k, v in list_static_class_members(ent.Location):
    print(f'  {k}: {v}')
print()

print('class Company:')
for k, v in list_static_class_members(ent.Company):
    print(f'  {k}: {v}')
print()

print('class WrittenWork:')
for k, v in list_static_class_members(ent.WrittenWork):
    print(f'  {k}: {v}')
print()

Query the knowledge graph

from wikimine.query import (
    list_instances_of,
    get_common_classes,
    get_common_edges,
    get_classes_of_instance,
    get_profile,
    get_tree,
)

import wikimine.relations as rel
import wikimine.entity as ent
import pprint

# list first 50 people
print("List first 50 people")
people = list_instances_of(ent.CommonTypes.Human, limit=50)
pprint.pp(people)
print('\n ----- \n')

# get common type of instances
print("Get common type of instances")
common_classes = get_common_classes([
    ent.Company.VW,
    ent.Company.Xerox,
    ent.Company.Apple,
])
common_classes.print_summary()
print('\n ----- \n')

print('List outgoing relations that are common to a group of entities')
common_edges = get_common_edges(people)
common_edges.print_summary()
print('\n ----- \n')

print('List all classes of a given entity')
classes = get_classes_of_instance(ent.Company.Apple)
pprint.pp(classes)
print('\n ----- \n')

print('Get all outgoing edges and their values for a given entity')
profile = get_profile(ent.WrittenWork.A_Mathematical_Theory_of_Communication)
pprint.pp(profile)
print('\n ----- \n')


print('Show all types that have human as a subtype, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of)
tree.show()
print('\n ----- \n')

print('Show all types that are subtypes of human, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of, direction='backward')
tree.show()
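
The forward and backward directions can be illustrated with plain `sqlite3` on a toy table. This is a sketch under stated assumptions, not how `get_tree` is implemented: the table name `claim`, the index names, the function `reachable`, and the toy entity ids are all hypothetical. Only the column names and the indexed column combinations follow the `WikidataClaim` model described earlier, and `P279` is Wikidata's "subclass of" property.

```python
import sqlite3
from collections import deque
from typing import List

SUBCLASS_OF = "P279"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim (source_entity TEXT, property_id TEXT, target_entity TEXT)"
)
# Mirrors the documented indices: one serves forward lookups, one backward.
conn.execute("CREATE INDEX idx_fwd ON claim (source_entity, property_id, target_entity)")
conn.execute("CREATE INDEX idx_bwd ON claim (property_id, target_entity)")
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [  # toy ids, not real Wikidata entities
        ("cat", SUBCLASS_OF, "mammal"),
        ("dog", SUBCLASS_OF, "mammal"),
        ("mammal", SUBCLASS_OF, "animal"),
    ],
)


def reachable(conn: sqlite3.Connection, start: str, property_id: str,
              direction: str = "forward") -> List[str]:
    """Breadth-first walk along one property, in either direction."""
    if direction == "forward":
        sql = ("SELECT target_entity FROM claim "
               "WHERE source_entity = ? AND property_id = ? ORDER BY target_entity")
    else:
        sql = ("SELECT source_entity FROM claim "
               "WHERE target_entity = ? AND property_id = ? ORDER BY source_entity")
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for (nxt,) in conn.execute(sql, (node, property_id)):
            if nxt is not None and nxt not in seen:  # guard against cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(reachable(conn, "cat", SUBCLASS_OF))                 # ['mammal', 'animal']
print(reachable(conn, "animal", SUBCLASS_OF, "backward"))  # ['mammal', 'cat', 'dog']
```

Walking forward from an entity follows `subclass_of` toward its supertypes, while walking backward collects everything that is, directly or indirectly, a subtype of it.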

License

wikimine is distributed under the terms of the MIT license.
