A simple asynchronous and synchronous Wikipedia scraper.

Scrapewiki - Wikipedia Scraper

Scrapewiki can scrape Wikipedia both synchronously and asynchronously. The scrapewiki.Scrapewiki class provides two methods: search and wiki.

wiki

Scrapes a Wikipedia page.

search

Searches Wikipedia for a query. An optional limit parameter caps the number of results returned.

Examples

Asynchronous:

import scrapewiki
import trio


wiki = scrapewiki.Scrapewiki()


async def main():
    async with wiki.search("python") as results:
        async for search_result in results:
            ...

    # equivalent of

    searcher = wiki.search("python")
    results = await searcher.async_method()

trio.run(main)


import scrapewiki
import trio


wiki = scrapewiki.Scrapewiki()


async def main():
    async with wiki.wiki("python") as page:
        ...

    # equivalent of

    page_scraper = wiki.wiki("python")
    page = await page_scraper.async_method()

trio.run(main)

Synchronous:

import scrapewiki


wiki = scrapewiki.Scrapewiki()


with wiki.search("python", limit=45) as results:
    for search_result in results:
        ...

# equivalent of

searcher = wiki.search("python")
results = searcher.sync_method()


import scrapewiki


wiki = scrapewiki.Scrapewiki()


with wiki.wiki("python") as page:
    ...

# equivalent of

page_scraper = wiki.wiki("python")
page = page_scraper.sync_method()

Extras

The module also provides some utility functions for ease of use (currently just one).

Plans

There are a lot of things that still need to be parsed and a lot of bugs that need to be fixed. I'm pretty sure there are some typos in docstrings and incorrect annotations as well. My plan for now is to fix these problems.

Note

This library is English-only because of how some things are parsed. I'm sure there are better ways to handle them that would support all languages; this is on my TODO list.

Documentation

I don't have any plans for online documentation as of now; please read the source code. All the dataclasses can be found here.
