
A simple asynchronous and synchronous Wikipedia scraper.


Scrapewiki - Wikipedia Scraper

It can scrape Wikipedia both synchronously and asynchronously. The scrapewiki.Scrapewiki class has two methods: search and wiki.
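Both methods return an object that can be used in either mode: as a context manager (with or async with) or by calling sync_method() or async_method() directly, as the examples below show. Here is a minimal sketch of how such a dual-mode object can be built. It is not scrapewiki's actual implementation. The class name DualModeScraper and the _fetch placeholder are hypothetical, and _fetch simply echoes the query so the sketch runs offline.

```python
import asyncio
from typing import Any


class DualModeScraper:
    """Hypothetical stand-in for the object returned by search()/wiki().

    Not scrapewiki's real code: _fetch just echoes the query so the
    pattern can be run without network access.
    """

    def __init__(self, query: str) -> None:
        self.query = query

    def _fetch(self) -> Any:
        # Placeholder for the real HTTP request and HTML parsing.
        return f"result for {self.query!r}"

    def sync_method(self) -> Any:
        return self._fetch()

    async def async_method(self) -> Any:
        # A real implementation would await a non-blocking HTTP client here.
        return self._fetch()

    # `with obj as x:` binds x to the value sync_method() returns.
    def __enter__(self) -> Any:
        return self.sync_method()

    def __exit__(self, exc_type, exc, tb) -> None:
        return None  # do not suppress exceptions

    # `async with obj as x:` binds x to the awaited async_method() result.
    async def __aenter__(self) -> Any:
        return await self.async_method()

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None  # do not suppress exceptions


if __name__ == "__main__":
    with DualModeScraper("python") as result:
        print(result)  # same value as DualModeScraper("python").sync_method()

    async def main() -> None:
        async with DualModeScraper("python") as result:
            print(result)  # same value as await DualModeScraper("python").async_method()

    asyncio.run(main())
```

Binding the context manager's target to the scraped result, rather than to the scraper object itself, is what makes the with form and the explicit method call interchangeable.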

wiki

It is used to scrape a Wikipedia page.
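Scraping a page starts from its URL. English Wikipedia article URLs have the form https://en.wikipedia.org/wiki/Title, with spaces in the title replaced by underscores. The sketch below builds that URL from a title. The helper name page_url is hypothetical, and whether scrapewiki resolves titles this way is an assumption.

```python
from urllib.parse import quote

# Base URL for English Wikipedia articles (the library is English-only).
BASE_URL = "https://en.wikipedia.org/wiki/"


def page_url(title: str) -> str:
    """Hypothetical helper: map an article title to its English Wikipedia URL."""
    # MediaWiki uses underscores in place of spaces in article paths.
    slug = title.strip().replace(" ", "_")
    # Percent-encode characters outside quote()'s safe set (e.g. parentheses,
    # non-ASCII letters); Wikipedia accepts the encoded form. "/" is kept as-is.
    return BASE_URL + quote(slug)


if __name__ == "__main__":
    print(page_url("Python (programming language)"))
```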

search

It is used to search Wikipedia for a query. The optional limit parameter sets the maximum number of results returned.
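The sketch below illustrates only the semantics of an optional result cap, not scrapewiki's code. It applies the cap lazily to a stream of results with itertools.islice, and limit=None means no cap. The name apply_limit is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def apply_limit(results: Iterable[T], limit: Optional[int] = None) -> Iterator[T]:
    """Yield at most `limit` items from `results`; yield everything if limit is None."""
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative or None")
    # islice(iterable, None) passes every item through; with an integer stop it
    # ends early, so a lazy source is never consumed past the cap.
    return islice(results, limit)


if __name__ == "__main__":
    print(list(apply_limit(["Python", "Monty Python", "Pythonidae"], limit=2)))
```

Because islice is lazy, the same approach works for an unbounded generator of results: only the first limit items are ever requested.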

Examples

Asynchronous:

import scrapewiki
import trio


wiki = scrapewiki.Scrapewiki()


async def main():
    async with wiki.search("python") as results:
        async for search_result in results:
            ...

    # is equivalent to:

    searcher = wiki.search("python")
    results = await searcher.async_method()

trio.run(main)


import scrapewiki
import trio


wiki = scrapewiki.Scrapewiki()


async def main():
    async with wiki.wiki("python") as page:
        ...

    # is equivalent to:

    page_scraper = wiki.wiki("python")
    page = await page_scraper.async_method()

trio.run(main)

Synchronous:

import scrapewiki


wiki = scrapewiki.Scrapewiki()


with wiki.search("python", limit=45) as results:
    for search_result in results:
        ...

# is equivalent to:

searcher = wiki.search("python", limit=45)
results = searcher.sync_method()


import scrapewiki


wiki = scrapewiki.Scrapewiki()


with wiki.wiki("python") as page:
    ...

# is equivalent to:

page_scraper = wiki.wiki("python")
page = page_scraper.sync_method()

Extras

The module also provides some utility functions for ease of use (currently just one):

Plans

Many parts of the pages still need to be parsed, and many bugs still need to be fixed. There are probably also some typos in the docstrings and some wrong type annotations. For now, my plan is to fix these problems.

Note

This library only supports English Wikipedia because of how some elements are parsed. There are surely better ways to parse them that would let it support all languages; this is on my TODO list.

Documentation

I don't have any plans for online documentation at the moment, so please read the source code. All the dataclasses can be found there as well.

Download files


Source Distribution

scrapewiki-0.1.5b0.tar.gz (10.0 kB)

Built Distribution

scrapewiki-0.1.5b0-py3-none-any.whl (12.1 kB)
