Library for Wikipedia dataset collection

Project description

Wiker

A library for collecting Wikipedia text datasets.

Installation

pip install wiker

Quickstart

from wiker import Wiker

wk = Wiker(lang='uz', first_article_link="Turkiston")

wk.run(scrape_limit=50)
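The page does not say what `run()` does internally. Judging from the methods listed under it, a run plausibly repeats a cycle of reading queued links, scraping and cleaning articles, and queueing newly found links until `scrape_limit` articles have been saved. The sketch below illustrates that assumption as a plain breadth-first crawl over an in-memory link graph instead of live Wikipedia requests; `crawl` and the sample graph are hypothetical and are not part of wiker.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], scrape_limit: int) -> list[str]:
    """Breadth-first crawl from `start`, stopping after `scrape_limit` articles.

    Hypothetical illustration, not wiker's implementation: `links` maps an
    article title to the titles it links to and stands in for fetching and
    parsing real Wikipedia pages.
    """
    pre_urls = deque([start])   # titles waiting to be scraped (the role of pre_urls.txt)
    post_urls: list[str] = []   # titles already scraped, in order
    seen = {start}              # never queue the same title twice

    while pre_urls and len(post_urls) < scrape_limit:
        title = pre_urls.popleft()
        post_urls.append(title)             # "scrape" the article and record it
        for nxt in links.get(title, []):    # collect links for further scraping
            if nxt not in seen:
                seen.add(nxt)
                pre_urls.append(nxt)
    return post_urls


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D", "E"]}
    print(crawl("A", graph, scrape_limit=4))  # ['A', 'B', 'C', 'D']
```

Because the queue is first in, first out, articles closer to the starting page are scraped before more distant ones, and the `seen` set keeps a page that is linked from several places from being scraped twice.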

Other methods

from wiker import Wiker

wk = Wiker(lang='uz', first_article_link="Turkiston")

wk.reader()  # read pre_urls.txt and return its links as a list
wk.read_url_count()  # return the number of links read from pre_urls.txt
wk.extra_file_writer()  # if pre_urls.txt is empty, write first_article_link to it
wk.scraper()  # fetch the articles for all links in pre_urls.txt
wk.text_cleaner()  # strip HTML and other tags from the fetched articles
wk.next_urls()  # collect links for the next round of scraping
wk.dir_scanner()  # scan the "data" folder and count its files
wk.cleaned_text_writer(text_dict=wk.text_cleaner())  # write the cleaned article texts to files
wk.post_url_writer(url_list=wk.scraper().keys())  # write the names of the saved articles to a file
wk.pre_url_writer(url_list=wk.next_urls())  # write the names returned by next_urls() to a file for the next round
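`text_cleaner()` and `next_urls()` are described only by one-line comments. As a rough illustration of those two steps, not wiker's own code, the sketch below uses Python's standard `html.parser` to strip tags from an article's HTML and to collect `/wiki/` links to other articles. The names `ArticleParser` and `clean_article`, and the rule of skipping titles that contain a colon (which on Wikipedia usually marks non-article pages such as categories or files), are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ArticleParser(HTMLParser):
    """Collect visible text and internal article links from Wikipedia HTML (hypothetical)."""

    SKIP_TAGS = {"script", "style"}                                # contents are not article text
    BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "tr"}  # imply a word break

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.text_parts.append(" ")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/wiki/"):
                # Drop the "/wiki/" prefix and any "#section" fragment, then decode %XX escapes
                title = unquote(href[len("/wiki/"):].split("#")[0])
                # A colon usually marks a non-article namespace (categories, files, ...)
                if title and ":" not in title and title not in self.links:
                    self.links.append(title)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)

    @property
    def text(self) -> str:
        # Concatenate the fragments and collapse runs of whitespace to single spaces
        return " ".join("".join(self.text_parts).split())


def clean_article(html: str) -> tuple[str, list[str]]:
    """Return (plain text, linked article titles) for one article's HTML."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links
```

Calling `clean_article(html)` on one page returns its text with tags removed together with the article titles it links to, which is roughly the information `text_cleaner()` and `next_urls()` are described as producing.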

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wiker-0.0.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

wiker-0.0.1-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file wiker-0.0.1.tar.gz.

File metadata

  • Download URL: wiker-0.0.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for wiker-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       97d9ae5df053d2bf8e3062a19c898535ce4908d12a4588f5bc8c54b96ac168aa
MD5          45ee4f384f345977e2af411979c1593c
BLAKE2b-256  787426c285aceac675e39aa5fbf0537b8848d926b91e774a6f6650f143107eb8
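The published digests let you check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_published` are illustrative, and the default expected value is the SHA256 digest listed above for wiker-0.0.1.tar.gz.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published on this page for the source distribution
EXPECTED_SDIST_SHA256 = "97d9ae5df053d2bf8e3062a19c898535ce4908d12a4588f5bc8c54b96ac168aa"


def sha256_of(path: str | Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str | Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 digest equals `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()
```

For a download of the source distribution, `matches_published("wiker-0.0.1.tar.gz")` should return `True`; pip's hash-checking mode (a `--hash=sha256:...` option on a requirements line) performs the same comparison at install time.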

File details

Details for the file wiker-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: wiker-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for wiker-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       fd81dd51da506cc5b23cd14f42b2eab1c096bb679dc79402e6fa705cd9963110
MD5          62ac933453871c46883af7a30539236f
BLAKE2b-256  241c94cd5bb252b81a9e8a048fd850ab237cb757252014120624fea75e405e62
