
A random Wikipedia page generator.

Project description

WikiBot

Welcome to WikiBot! This is a small program that gets a random page from a Wikipedia category AND its subcategories (up to a specified depth).

Installation

All you need to do is clone this repo and install the dependencies. Make sure you have pip installed!

git clone https://github.com/ddxtanx/wikiBot
cd wikiBot
pip install -r requirements.txt

OR

pip install wikiBot

to use it as an API in your own code.

Usage

python wikiBot.py -h shows the usage of the program.

usage: wikiBot.py [-h] [--tree_depth [TREE_DEPTH]] [--similarity [SIMILARITY]]
                  [-s] [-r] [-v] [-c]
                  category

Get a random page from a wikipedia category

positional arguments:
  category              The category you wish to get a page from.


optional arguments:
  -h, --help            show this help message and exit
  --tree_depth [TREE_DEPTH]
                        How far down to traverse the subcategory tree
  --similarity [SIMILARITY]
                        What percent of page categories need to be in
                        subcategory array. Must be used with -c/--check
  -s, --save            Save subcategories to a file for quick re-runs
  -r, --regen           Regenerate the subcategory file
  -v, --verbose         Print debug lines
  -c, --check           After finding page check to see that it truly fits in
                        category

Pro Tips:

  • Use a tree_depth of 3 or 4; more than 4 will bring loosely related categories into the subcategories.
  • Use a similarity of .25 or .33. A higher similarity value may sacrifice otherwise valid pages in search of the PERFECT page. (An example invocation is shown below.)
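
For example, assuming the category name Physics (an arbitrary example) and the defaults suggested above, an invocation could look like this:

python wikiBot.py --tree_depth 4 --similarity .25 -c -s Physics

Here -c turns on the similarity check (required for --similarity to take effect) and -s saves the gathered subcategories for quicker re-runs.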

If you're using it in your own Python code, the best way to set it up is:

from wikiBot import WikiBot

wb = WikiBot({{Your preferred tree_depth}}, {{Your preferred similarity_val}})

# ...
# Your Awesome Code
# ...

randomPage = wb.randomPage(category, ...)

You can also change the tree depth and similarity value later by setting wb.td = {{New Tree Depth}} and wb.sv = {{New Similarity Val}}.

More info is available via help(wikiBot).

How It Works

The most important part of this program is the Wikipedia API; it allows the program to gather all of the subcategories of a given category in a fast(ish), usable manner, and to get the pages belonging to a given category. The bulk of my code focuses on iteratively getting the subcategories at a given depth in the tree, adding them to an array of all subcategories of a given 'parent' category, and continuing in that fashion until there are no more subcategories or the program has fetched down to the maximum tree depth allowed. For example, if a subcategory chain went

Category A -> Category B -> Category C -> Category D -> ...

(-> denotes 'is a supercategory of')

and the maximum tree depth was 3, then the code would stop gathering subcategories at Category C, so Category D and everything below it would never be fetched.
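
To make the traversal concrete, here is a minimal sketch of that kind of depth-limited walk using the public MediaWiki API via the requests library; the function and variable names are illustrative and not WikiBot's actual internals:

import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def get_subcategories(category, max_depth):
    """Breadth-first, depth-limited walk of a category's subcategory tree."""
    found = [category]
    frontier = [category]
    for _ in range(max_depth):
        next_frontier = []
        for cat in frontier:
            params = {
                "action": "query",
                "list": "categorymembers",
                "cmtitle": "Category:" + cat,
                "cmtype": "subcat",   # subcategories only, no pages
                "cmlimit": "500",
                "format": "json",
            }
            resp = requests.get(API_URL, params=params).json()
            for member in resp["query"]["categorymembers"]:
                name = member["title"][len("Category:"):]
                if name not in found:          # avoid revisiting categories
                    found.append(name)
                    next_frontier.append(name)
        frontier = next_frontier               # descend one level deeper
    return found

(Continuation of long result lists via cmcontinue is omitted here for brevity.)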

After all subcategories of a given parent category have been amassed in some list L, the program randomly chooses a category C from L, finds the pages belonging to C, chooses a random page P from C, and returns the URL pointing to P. For speed's sake, after gathering all subcategories from a given parent category the program can optionally save them to a text file, so later runs can find the subcategories faster.
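
The selection step itself is simple; a hedged sketch, again talking to the MediaWiki API directly rather than reproducing WikiBot's own code, might look like this:

import random
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def random_page(subcategories):
    """Pick a random gathered category, then a random page from it, and return its URL."""
    category = random.choice(subcategories)
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "page",   # pages only this time, no subcategories
        "cmlimit": "500",
        "format": "json",
    }
    pages = requests.get(API_URL, params=params).json()["query"]["categorymembers"]
    page = random.choice(pages)   # note: a category may contain no pages at all
    # Wikipedia article URLs use underscores in place of spaces
    return "https://en.wikipedia.org/wiki/" + page["title"].replace(" ", "_")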

To determine how similar a page is to a category, the program first enumerates the categories that the selected page belongs to. It then loops through those categories; call the current one A. For each A it checks whether A belongs to the subcategories generated from the 'parent' category, and from these checks it computes a 'score' for the page. If the score is >= a prespecified value (the default is .5: at least half of the A's should be subcategories of the parent category), the page is accepted as a valid subpage. If not, the program removes that page from the category's page list and loops on.
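
The score is essentially the fraction of the page's categories that show up among the gathered subcategories; a small sketch of that idea (names here are illustrative):

def similarity_score(page_categories, subcategories):
    """Fraction of the page's own categories that also appear in the gathered subcategories."""
    if not page_categories:
        return 0.0
    matches = sum(1 for cat in page_categories if cat in subcategories)
    return matches / len(page_categories)

# A page is kept only if its score meets the threshold (default .5, i.e. half):
# similarity_score(page_categories, subcategories) >= similarity_val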

Note on types

This project uses type annotations and mypy type checking, so you can be sure you are passing the right types to functions. If you're using Atom to edit your code, I recommend using atom-linter-mypy to do type linting. Have fun!
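
If you want to run the type checks yourself, something along these lines should work from the repository root (using the file name from the usage section above):

pip install mypy
mypy wikiBot.py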

Contributions

I'm open to anyone contributing, especially if you know of a way to make this faster or to make the locally stored subcategories take up less drive space. Email me at gcc@ameritech.net and we can talk things out.

