wikimine
Installation
pip install wikimine
Motivation
Wikidata contains a large body of knowledge modeled as a powerful graph structure.
That structure enables many applications, but it also has a steep learning curve for most programmers.
To use Wikidata, a programmer needs to understand its data model.
It also relies on a triplestore/graph database and a new query language, SPARQL,
both of which currently have limited learning resources.
This project translates Wikidata into a data modeling format that is more familiar to most developers
and uses only SQLite, removing the need to set up any new database system.
As a result,
our approach loses some of the functionality of Wikidata's original design,
but it keeps enough of Wikidata's usefulness
while letting developers explore Wikidata with a familiar mindset and familiar tools.
We hope wikimine can serve as a gateway to Wikidata, graph databases, and the semantic web,
and allow more people to contribute to those related projects.
Data Modeling
wikimine contains the following tables (using peewee ORM):
class WikidataEntityLabel(BaseModel):
    """
    wikidata entity label
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityDescriptions(BaseModel):
    """
    wikidata entity descriptions.
    example: (Q9191, 'en', 'René Descartes')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityAliases(BaseModel):
    """
    wikidata entity aliases.
    example:
    (Q9191, 'en', 'Descartes')
    (Q9191, 'en', 'Cartesius')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataClaim(BaseModel):
    """
    A wikidata claim holds a statement about a wikidata item:
    (item, property, value).
    You can read more about this concept here:
    https://www.wikidata.org/wiki/Wikidata:Introduction
    This table is indexed by
    (source_entity, property_id, target_entity),
    (property_id, target_entity),
    (target_entity).
    """
    source_entity = CharField()  # entity id of the item the claim is about.
    property_id = CharField()
    body = JSONField()  # the full claim body.
    target_entity = CharField(null=True)  # set only when mainsnak.datavalue has type wikibase-entityid; null otherwise.
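To make the `target_entity` column and the three indexes more concrete, here is a minimal sketch that uses only the standard-library `sqlite3` module. It is not wikimine's code: the table name `claim`, the index names, and the helper `extract_target_entity` are assumptions made for illustration. The claim layout follows the Wikidata JSON format, trimmed to the fields the sketch needs.

```python
import json
import sqlite3


def extract_target_entity(claim):
    """Return the target entity id, or None when the value is not an entity."""
    datavalue = claim.get("mainsnak", {}).get("datavalue")
    if datavalue and datavalue.get("type") == "wikibase-entityid":
        return datavalue["value"].get("id")
    return None


# Hypothetical schema mirroring the WikidataClaim columns (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claim (
        source_entity TEXT NOT NULL,
        property_id   TEXT NOT NULL,
        body          TEXT NOT NULL,
        target_entity TEXT
    );
    CREATE INDEX claim_spo ON claim (source_entity, property_id, target_entity);
    CREATE INDEX claim_po  ON claim (property_id, target_entity);
    CREATE INDEX claim_o   ON claim (target_entity);
""")

# Two claims about Q9191 (René Descartes): one entity-valued, one time-valued.
instance_of_human = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",  # instance of
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q5"},  # human
        },
    }
}
date_of_birth = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P569",  # date of birth
        "datavalue": {
            "type": "time",
            "value": {"time": "+1596-03-31T00:00:00Z", "precision": 11},
        },
    }
}

for claim in (instance_of_human, date_of_birth):
    conn.execute(
        "INSERT INTO claim VALUES (?, ?, ?, ?)",
        (
            "Q9191",
            claim["mainsnak"]["property"],
            json.dumps(claim),
            extract_target_entity(claim),  # None for the time-valued claim
        ),
    )

# "Which entities are instances of human?" can use the (property_id, target_entity) index.
humans = [
    row[0]
    for row in conn.execute(
        "SELECT source_entity FROM claim WHERE property_id = ? AND target_entity = ?",
        ("P31", "Q5"),
    )
]
print(humans)
```

Claims whose value is a date, string, or quantity keep their data in `body` and leave `target_entity` null, so only entity-to-entity edges take part in the graph-style lookups.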
Usage
Process the wikidata json dump
After downloading the dump:
# first split the dump into smaller pieces for easier processing.
python -m wikimine.cli split ./path-to-dump ./path-to-workspace-folder
# parse and import to sqlite.
python -m wikimine.cli import /path/to/db ./path-to-workspace-folder
# build indices.
python -m wikimine.cli index /path/to/db
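For orientation, the Wikidata JSON dump is one large JSON array with one entity per line, which is what makes splitting it into smaller pieces practical. The sketch below is not wikimine's implementation; `iter_entities` and `split_into_chunks` are hypothetical helpers that show the general idea on a tiny in-memory sample.

```python
import io
import json
from itertools import islice


def iter_entities(lines):
    """Yield one entity dict per line of a Wikidata-style JSON array dump."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def split_into_chunks(entities, chunk_size):
    """Group an entity stream into lists of at most chunk_size items."""
    iterator = iter(entities)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


# A stand-in for the real dump file, trimmed to the fields used here.
sample_dump = io.StringIO(
    "[\n"
    '{"id": "Q5", "type": "item"},\n'
    '{"id": "Q9191", "type": "item"},\n'
    '{"id": "P31", "type": "property"}\n'
    "]\n"
)

chunks = list(split_into_chunks(iter_entities(sample_dump), chunk_size=2))
print([[entity["id"] for entity in chunk] for chunk in chunks])
```

Because the input is read line by line, the same loop works on a compressed dump opened in text mode, for example with `bz2.open(path, "rt")`, without loading the whole file into memory.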
Connect to db
from wikimine import auto_connect, connect
"""
Search for the database path in the following sources and connect to it automatically.
1. from environment variable [WIKIMINE_WIKIDATA_DB].
2. ~/.wikimine.config.json: {"db_path": "/path/to/db"}
"""
auto_connect()
# or
connect('/path/to/db')
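The lookup order described above can be sketched as follows. This is an illustration only, assuming the numbered order is also the priority order; `resolve_db_path` is a hypothetical helper, not a wikimine function. The environment variable name and the config file layout are taken from the description above.

```python
import json
import os
from pathlib import Path


def resolve_db_path(environ=None, config_path=None):
    """Hypothetical helper: return the configured database path, or None."""
    environ = os.environ if environ is None else environ
    if config_path is None:
        config_path = Path.home() / ".wikimine.config.json"
    config_path = Path(config_path)

    # 1. The environment variable is checked first.
    env_value = environ.get("WIKIMINE_WIKIDATA_DB")
    if env_value:
        return env_value

    # 2. Otherwise fall back to the JSON config file, if it exists.
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as handle:
            return json.load(handle).get("db_path")

    return None


print(resolve_db_path(environ={"WIKIMINE_WIKIDATA_DB": "/path/to/db"}))
```

Passing `environ` and `config_path` explicitly keeps the function easy to test without touching the real environment or home directory.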
Label and Link lookup
from wikimine import lookup_label, lookup_wikilink
import wikimine.relations as rel
import wikimine.entity as ent
print(ent.People.Descartes)
print(lookup_label(ent.People.Descartes))
print(lookup_wikilink(ent.People.Descartes))
print(lookup_label(rel.People.lang_written))
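Conceptually, a label lookup is a single query against the label table. The sketch below shows that idea with the standard-library `sqlite3` module; the table name `label`, the helper `lookup_label_sketch`, and its `language` parameter are assumptions for illustration and may not match how `lookup_label` is actually implemented.

```python
import sqlite3

# Hypothetical table mirroring the WikidataEntityLabel columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE label (entity_id TEXT, language TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO label VALUES (?, ?, ?)",
    [
        ("Q9191", "en", "René Descartes"),
        ("Q5", "en", "human"),
    ],
)


def lookup_label_sketch(conn, entity_id, language="en"):
    """Return the label for entity_id in the given language, or None."""
    row = conn.execute(
        "SELECT value FROM label WHERE entity_id = ? AND language = ? LIMIT 1",
        (entity_id, language),
    ).fetchone()
    return row[0] if row else None


print(lookup_label_sketch(conn, "Q9191"))
```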
Other commonly used entities and relations
from wikimine.utils import list_static_class_members
import wikimine.relations as rel
import wikimine.entity as ent
print('class People:')
for k, v in list_static_class_members(ent.People):
    print(f' {k}: {v}')
print()
print('class Location:')
for k, v in list_static_class_members(ent.Location):
    print(f' {k}: {v}')
print()
print('class Company:')
for k, v in list_static_class_members(ent.Company):
    print(f' {k}: {v}')
print()
print('class WrittenWork:')
for k, v in list_static_class_members(ent.WrittenWork):
    print(f' {k}: {v}')
print()
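The entity and relation catalogues above read like plain classes whose attributes hold ids. As a rough sketch of how such a listing helper could work, the code below walks a class's own attributes and skips dunder names and callables. `list_static_members_sketch` and the two example classes are hypothetical, and storing the ids as plain strings is an assumption.

```python
def list_static_members_sketch(cls):
    """Yield (name, value) for plain class attributes, skipping dunders and callables."""
    for name, value in vars(cls).items():
        if name.startswith("__") or callable(value):
            continue
        yield name, value


class People:
    # Q9191 is the id used for Descartes in the examples above.
    Descartes = "Q9191"


class TypingRelations:
    # P31 is "instance of" and P279 is "subclass of" in Wikidata.
    instance_of = "P31"
    subclass_of = "P279"


for cls in (People, TypingRelations):
    print(f"class {cls.__name__}:")
    for name, value in list_static_members_sketch(cls):
        print(f"  {name}: {value}")
```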
Query the knowledge graph
from wikimine.query import (
    list_instances_of,
    get_common_classes,
    get_common_edges,
    get_classes_of_instance,
    get_profile,
    get_tree,
)
import wikimine.relations as rel
import wikimine.entity as ent
import pprint
# list first 50 people
print("List first 50 people")
people = list_instances_of(ent.CommonTypes.Human, limit=50)
pprint.pp(people)
print('\n ----- \n')
# get the common types of a group of instances
print("Get the common types of a group of instances")
common_classes = get_common_classes([
    ent.Company.VW,
    ent.Company.Xerox,
    ent.Company.Apple,
])
common_classes.print_summary()
print('\n ----- \n')
print('List the outgoing relations common to a group of entities')
common_edges = get_common_edges(people)
common_edges.print_summary()
print('\n ----- \n')
print('List all classes of a given entity')
classes = get_classes_of_instance(ent.Company.Apple)
pprint.pp(classes)
print('\n ----- \n')
print('Get all outgoing edges and their values for a given entity')
profile = get_profile(ent.WrittenWork.A_Mathematical_Theory_of_Communication)
pprint.pp(profile)
print('\n ----- \n')
print('Show all types that have human as a subtype, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of)
tree.show()
print('\n ----- \n')
print('Show all types that are subtypes of human, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of, direction='backward')
tree.show()
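The two `get_tree` calls above differ only in which way the relation is followed. The sketch below illustrates that idea on a handful of in-memory triples; it is not wikimine's implementation. `build_tree` is a hypothetical helper, the default `direction="forward"` is an assumption, and the entity names are placeholder strings rather than real Wikidata ids.

```python
# (source_entity, property_id, target_entity) triples; P279 is "subclass of".
CLAIMS = [
    ("dog", "P279", "mammal"),
    ("cat", "P279", "mammal"),
    ("mammal", "P279", "animal"),
]


def build_tree(claims, root, property_id, direction="forward"):
    """Return a nested dict of the entities reachable from root via property_id."""
    neighbours = {}
    for source, prop, target in claims:
        if prop != property_id:
            continue
        if direction == "forward":
            neighbours.setdefault(source, []).append(target)  # follow the edge
        else:
            neighbours.setdefault(target, []).append(source)  # follow it in reverse

    def expand(node, seen):
        # `seen` holds the nodes on the current path, which guards against cycles.
        return {
            child: expand(child, seen | {child})
            for child in neighbours.get(node, [])
            if child not in seen
        }

    return {root: expand(root, {root})}


# Forward: walk up from "dog" to its superclasses.
print(build_tree(CLAIMS, "dog", "P279"))
# Backward: walk down from "mammal" to its subclasses.
print(build_tree(CLAIMS, "mammal", "P279", direction="backward"))
```

With real data the triples would come from the claim table, where the backward direction maps onto a lookup by `(property_id, target_entity)`.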
License
wikimine is distributed under the terms of the MIT license.