
A multiprocessing web-scraping application that scrapes wiki pages and finds the minimum number of links between two given wiki pages.

Project description


wikilink is a multiprocessing web-scraping application that scrapes wiki pages, extracts their URLs, and finds the minimum number of links between two given wiki pages.
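The scrape-and-extract step can be sketched with Python's standard library alone. This is a minimal illustration, not wikilink's actual code. ArticleLinkParser and extract_article_links are hypothetical names. The rule that article links are "/wiki/" paths with no namespace prefix (such as "File:" or "Category:") is an assumption about how pages are filtered.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class ArticleLinkParser(HTMLParser):
    """Hypothetical helper (not part of wikilink): collects absolute URLs of
    internal article links found in a wiki page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: articles live under /wiki/ and namespaced pages
        # (File:, Category:, Help:, ...) contain a colon after that prefix.
        if href and href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
            # Resolve the relative path against the page URL and drop any #fragment.
            url, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.add(url)


def extract_article_links(base_url, html):
    """Return the set of article URLs linked from one page."""
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Fetching the HTML itself (for example with urllib.request) is left out so the sketch stays focused on link extraction.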

I briefly discuss the motivation for the project and give an overview of it in my blog.

The project is currently at version v0.3.0.post1; see the change log for more details on the release history.

If you like this project, feel free to leave a few words of appreciation here: Say Thanks!


Table of contents

  1. Usage
  2. Contribution
  3. License

Usage

Install with pip

$ pip install wikilink

Database support

wikilink currently supports MySQL and PostgreSQL.

API

setup_db(db, username, password, ip="127.0.0.1", port=3306): set up the database

Args:
	db(str): database engine, currently supports "mysql" and "postgresql"
	username(str): database username
	password(str): database password
	ip(str): IP address of the database (default="127.0.0.1")
	port(str): port that the database is running on (default=3306)

Returns:
	None
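The documented default port, 3306, is MySQL's standard port, while a PostgreSQL server listens on 5432 by default, so PostgreSQL users will usually need to pass the port explicitly. The helper below is a hypothetical convenience, not part of wikilink's API, that picks the conventional port for each supported engine.

```python
# Hypothetical helper, not part of wikilink: conventional server ports for the
# two engines that setup_db() supports.
DEFAULT_PORTS = {
    "mysql": 3306,       # MySQL default port (also setup_db's documented default)
    "postgresql": 5432,  # PostgreSQL default port
}


def default_port(db):
    """Return the conventional port for a supported engine name."""
    try:
        return DEFAULT_PORTS[db]
    except KeyError:
        raise ValueError(f"unsupported database engine: {db!r}") from None
```

With a WikiLink instance app, a PostgreSQL setup could then look like app.setup_db("postgresql", "postgres", "secret", port=str(default_port("postgresql"))). The username and password here are placeholders, and the port is passed as a string to match the documented port(str) type.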
min_link(source, destination, limit=6, multiprocessing=False): find the minimum number of links from the source URL to the destination URL within the limit

Args:
	source(str): source wiki URL, e.g. "https://en.wikipedia.org/wiki/Cristiano_Ronaldo"
	destination(str): destination wiki URL, e.g. "https://en.wikipedia.org/wiki/Lionel_Messi"
	limit(int): maximum number of links from the source that will be considered (default=6)
	multiprocessing(bool): enable/disable multiprocessing mode (default=False)

Returns:
	(int) minimum number of separations between the source and destination URLs
	returns None and prints a message if the limit is exceeded or no path is found

Raises:
	DisconnectionError: error connecting to DB

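One way to read limit and the None return value is as a depth-limited breadth-first search. The sketch below assumes the link graph is already in memory as a dict that maps each URL to the URLs it links to. In wikilink the links come from scraping and are stored in the database, and the library's actual search may differ. min_separation is a hypothetical name. Returning 0 when source equals destination is also an assumption.

```python
from collections import deque


def min_separation(graph, source, destination, limit=6):
    """Hypothetical sketch: breadth-first search over an in-memory link graph.

    Returns the number of links on a shortest path from source to destination,
    or None if no path of at most `limit` links exists.
    """
    if source == destination:
        return 0
    visited = {source}
    frontier = deque([(source, 0)])  # (page, links followed so far)
    while frontier:
        page, depth = frontier.popleft()
        if depth == limit:
            # Following one more link would exceed the limit.
            continue
        for nxt in graph.get(page, ()):
            if nxt == destination:
                # BFS pops pages in order of depth, so the first hit is shortest.
                return depth + 1
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return None
```

Breadth-first order is what makes the first match a minimum: every page reachable in fewer links has already been expanded before any deeper page is examined.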
Examples

>>> from wikilink import WikiLink
>>> app = WikiLink()
>>> app.setup_db("mysql", "root", "12345", "127.0.0.1", "3306")
>>> source = "https://en.wikipedia.org/wiki/Cristiano_Ronaldo"
>>> destination = "https://en.wikipedia.org/wiki/Lionel_Messi"
>>> app.min_link(source, destination, 6)
1

Contribution

How to contribute

Please follow the conventions in our contribution instructions and code of conduct.

To set up development environment, simply run:

$ pip install -r requirements.txt

Please check the issue file for a list of issues that need help.

Appreciation

Feel free to add your name to the list of contributors. You will automatically be inducted into the Hall of Fame as a way to show my appreciation for your contributions.

Hall of Fame


License

See the LICENSE file for license rights and limitations (Apache License 2.0).



Download files


Files for wikilink, version 0.3.0.post1:
- wikilink-0.3.0.post1-py3-none-any.whl (16.9 kB): Wheel, Python py3
- wikilink-0.3.0.post1.tar.gz (18.4 kB): Source
