# WikiBot

Welcome to WikiBot! This is a small program that gets a random page from a Wikipedia category and its subcategories (up to a specified depth).

# Installation

All you need to do is clone this repo and install the dependencies. Make sure you have pip installed!

git clone https://github.com/ddxtanx/wikiBot
cd wikiBot
pip install -r


OR

pip install wikiBot


to use it as an API.

# Usage

python wikiBot.py -h shows the usage of the program.

usage: wikiBot.py [-h] [--tree_depth [TREE_DEPTH]] [--similarity [SIMILARITY]]
                  [-s] [-r] [-v] [-c]
                  category

Get a random page from a wikipedia category

positional arguments:
  category              The category you wish to get a page from.

optional arguments:
  -h, --help            show this help message and exit
  --tree_depth [TREE_DEPTH]
                        How far down to traverse the subcategory tree
  --similarity [SIMILARITY]
                        What percent of page categories need to be in
                        subcategory array. Must be used with -c/--check
  -s, --save            Save subcategories to a file for quick re-runs
  -r, --regen           Regenerate the subcategory file
  -v, --verbose         Print debug lines
  -c, --check           After finding page check to see that it truly fits in
                        category


Pro Tips:

• Use a tree_depth of 3 or 4; more than 4 will bring loosely related categories into the subcategories.
• Use a similarity of .25 or .33. A higher similarity value may reject otherwise valid pages in search of the PERFECT page.

If you're using it in your own Python code, the best way to set it up is:

from wikiBot import WikiBot

"""
...
...
"""

randomPage = wb.randomPage(category,...)


You can also change the tree depth and similarity value by setting wb.td = {{ New Tree Depth }} and wb.sv = {{ New Similarity Val }}.

More info is available by running help(wikiBot).

# How It Works

The most important part of this program is the Wikipedia API: it lets the program gather all of the subcategories of a given category in a fast(ish) and usable manner, and get the pages belonging to a given category. The bulk of my code focuses on iteratively getting the subcategories at a given depth in the tree, adding them to an array holding all subcategories of a given 'parent' category, and continuing in that fashion until there are no more subcategories or the program has reached the maximum tree depth allowed. For example, if a subcategory chain went

Category A -> Category B -> Category C -> Category D -> ...

(-> denotes 'is a supercategory of')

and the maximum tree depth was 3, then the code would stop gathering subcategories for Category C, D, E, ...
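The depth-limited traversal described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `gather_subcategories`, `get_subcats`, and `lookup` are hypothetical names, and an in-memory dict stands in for the Wikipedia API (a real lookup would query the MediaWiki `categorymembers` list). It assumes the parent category sits at depth 1 and is itself included in the result, which is one reading of the example above.

```python
from collections import deque
from typing import Callable, Dict, List


def gather_subcategories(
    parent: str,
    max_depth: int,
    get_subcats: Callable[[str], List[str]],
) -> List[str]:
    """Collect `parent` and its subcategories, level by level, down to `max_depth`."""
    seen = {parent}
    found = [parent]
    queue = deque([(parent, 1)])  # assumption: the parent category is depth 1
    while queue:
        category, depth = queue.popleft()
        if depth >= max_depth:
            continue  # deepest allowed level: keep the category, do not expand it
        for sub in get_subcats(category):
            if sub not in seen:  # guard against cycles in the category graph
                seen.add(sub)
                found.append(sub)
                queue.append((sub, depth + 1))
    return found


# Toy data standing in for the Wikipedia API (hypothetical).
CHAIN: Dict[str, List[str]] = {
    "Category A": ["Category B"],
    "Category B": ["Category C"],
    "Category C": ["Category D"],
    "Category D": ["Category E"],
}


def lookup(category: str) -> List[str]:
    """Return the direct subcategories of `category` from the toy data."""
    return CHAIN.get(category, [])


print(gather_subcategories("Category A", 3, lookup))
# ['Category A', 'Category B', 'Category C']
```

Passing the lookup in as a function keeps the traversal independent of how subcategories are fetched, so the same loop works with a network call or a cached file.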

After all subcategories of a given parent category have been amassed in some list L, the program randomly chooses a category C from L, finds the pages belonging to C, chooses a random page P from C, and returns the URL pointing to P. For speed's sake, after gathering all subcategories of a given parent category, the program optionally saves them to a text file so that later runs can load them instead of fetching them again.
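As a rough sketch of the selection and caching steps just described: the function names, the one-name-per-line file format, the English Wikipedia URL prefix, and the retry on empty categories are all assumptions made for illustration, not details taken from the project. The page lookup is a toy dict rather than an API call.

```python
import random
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional
from urllib.parse import quote


def save_subcategories(path: Path, subcategories: List[str]) -> None:
    """Write one category name per line (assumed file format)."""
    path.write_text("\n".join(subcategories), encoding="utf-8")


def load_subcategories(path: Path) -> Optional[List[str]]:
    """Return the cached list, or None if no cache file exists yet."""
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8").splitlines()


def random_page_url(
    subcategories: List[str],
    get_pages: Callable[[str], List[str]],
    rng: random.Random,
) -> str:
    """Pick a random category C from L, then a random page P from C."""
    candidates = list(subcategories)
    while candidates:
        category = rng.choice(candidates)
        pages = get_pages(category)
        if pages:
            page = rng.choice(pages)
            # Assumption: English Wikipedia, with spaces written as underscores.
            return "https://en.wikipedia.org/wiki/" + quote(page.replace(" ", "_"))
        candidates.remove(category)  # empty category: try a different one
    raise ValueError("no pages found in any subcategory")


# Toy data standing in for the Wikipedia API (hypothetical).
PAGES: Dict[str, List[str]] = {
    "Category B": ["Page One", "Page Two"],
    "Category C": [],
}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "subcategories.txt"
    save_subcategories(cache, ["Category B", "Category C"])
    cached = load_subcategories(cache)  # a later run would start from this list

url = random_page_url(cached, lambda c: PAGES.get(c, []), random.Random(0))
print(url)
```

Taking the random number generator as a parameter makes the choice reproducible when a seed is needed, for example in tests.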

To determine how similar a page is to a category, the program first enumerates the categories the selected page belongs to. It then loops through those categories (call each one A), checks whether A is among the subcategories generated from the 'parent' category, and computes a 'score' for the page: the fraction of its categories that are. If the score is >= a prespecified value (the default is .5: half of all A's should be subcategories of the parent category), the page is a valid subpage. If not, the program removes that page from the category's page list and loops on.
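The score can be illustrated with a short sketch. The function names and sample categories below are hypothetical, and treating a page with no categories as scoring 0 is an assumption; only the idea (the fraction of a page's categories found among the gathered subcategories, compared with a threshold) comes from the description above.

```python
from typing import Iterable, Set


def similarity_score(page_categories: Iterable[str], subcategories: Set[str]) -> float:
    """Fraction of the page's categories that appear in the subcategory set."""
    categories = list(page_categories)
    if not categories:
        return 0.0  # assumption: a page with no categories never qualifies
    hits = sum(1 for a in categories if a in subcategories)
    return hits / len(categories)


def is_valid_page(
    page_categories: Iterable[str],
    subcategories: Set[str],
    threshold: float = 0.5,
) -> bool:
    """A page is kept when its score is at least the threshold."""
    return similarity_score(page_categories, subcategories) >= threshold


# Hypothetical sample data.
subcats = {"Physics", "Optics", "Mechanics"}
page_a = ["Physics", "Optics", "1905 works", "Living people"]  # 2 of 4 match
page_b = ["Physics", "1905 works", "Living people", "Stubs"]   # 1 of 4 match

print(similarity_score(page_a, subcats))     # 0.5
print(is_valid_page(page_a, subcats))        # True
print(is_valid_page(page_b, subcats))        # False
print(is_valid_page(page_b, subcats, 0.25))  # True
```

The last two lines show why a lower similarity such as .25 accepts pages that the default of .5 rejects.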

# Note on types

This project uses type annotations and mypy type checking, so you can be sure you are passing the right types to functions. If you're using Atom to edit your code, I recommend using atom-linter-mypy to do type linting. Have fun!

# Contributions

I'm open to anyone contributing, especially if they know of a way to make this faster or take up less drive space for locally stored subcategories. Email me at gcc@ameritech.net and we can talk stuff out.
