Library for Wikipedia dataset collection
Wiker
A library for collecting text datasets from Wikipedia.
Installation
pip install wiker
Quickstart
from wiker import Wiker
wk = Wiker(lang='uz', first_article_link="Turkiston")
wk.run(scrape_limit=50)
Other methods
from wiker import Wiker
wk = Wiker(lang='uz', first_article_link="Turkiston")
wk.reader()  # read the pre_urls.txt file and return its contents as a list
wk.read_url_count()  # return the number of links read from the pre_urls.txt file
wk.extra_file_writer()  # if the pre_urls.txt file is empty, write first_article_link to it
wk.scraper()  # fetch the articles for all links in the pre_urls.txt file
wk.text_cleaner()  # strip HTML and other tags from the retrieved articles
wk.next_urls()  # get the links to scrape next
wk.dir_scanner()  # scan the "data" folder and count its files
wk.cleaned_text_writer(text_dict=wk.text_cleaner())  # write the cleaned article texts to files
wk.post_url_writer(url_list=wk.scraper().keys())  # write the names of the saved articles to a file
wk.pre_url_writer(url_list=wk.next_urls())  # write the names returned by next_urls() to file for the next pass
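The methods above suggest a crawl loop: a list of pending links (pre_urls.txt), a record of articles already saved, and a step that collects further links. The sketch below illustrates that general pattern only. It is not Wiker's actual implementation; the `crawl` function, its `get_links` callback, and the `FAKE_LINKS` graph are hypothetical stand-ins so that the example runs without network access.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for Wikipedia (illustration only).
FAKE_LINKS = {
    "Turkiston": ["Toshkent", "Samarqand"],
    "Toshkent": ["Samarqand", "Turkiston"],
    "Samarqand": ["Buxoro"],
    "Buxoro": [],
}


def crawl(first_article, scrape_limit, get_links):
    """Breadth-first crawl that saves at most scrape_limit articles.

    Returns (saved, pending): the articles visited in order, and the
    links still waiting to be scraped on a later pass.
    """
    pending = deque([first_article])  # plays the role of pre_urls.txt
    saved = []                        # plays the role of the saved-articles record
    seen = {first_article}            # avoids queueing the same article twice

    while pending and len(saved) < scrape_limit:
        title = pending.popleft()
        saved.append(title)
        for link in get_links(title):
            if link not in seen:
                seen.add(link)
                pending.append(link)

    return saved, list(pending)


saved, pending = crawl(
    "Turkiston",
    scrape_limit=3,
    get_links=lambda title: FAKE_LINKS.get(title, []),
)
print(saved)    # articles scraped in this pass
print(pending)  # links left over for the next pass
```

Keeping the pending links separate from the saved ones is what lets a later run pick up where the previous one stopped.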
Download files
Source Distribution
wiker-0.0.1.tar.gz (4.5 kB)
Built Distribution
wiker-0.0.1-py3-none-any.whl (4.9 kB)
File details
Details for the file wiker-0.0.1.tar.gz.
File metadata
- Download URL: wiker-0.0.1.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97d9ae5df053d2bf8e3062a19c898535ce4908d12a4588f5bc8c54b96ac168aa |
| MD5 | 45ee4f384f345977e2af411979c1593c |
| BLAKE2b-256 | 787426c285aceac675e39aa5fbf0537b8848d926b91e774a6f6650f143107eb8 |
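A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_hash` and the local file path are assumptions made for illustration; the expected digest is the one published for wiker-0.0.1.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for wiker-0.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "97d9ae5df053d2bf8e3062a19c898535ce4908d12a4588f5bc8c54b96ac168aa"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed download location; adjust to wherever the archive was saved.
archive = Path("wiker-0.0.1.tar.gz")
if archive.exists():
    print(matches_published_hash(archive, EXPECTED_SDIST_SHA256))
```

The same check works for the wheel by substituting its file name and its SHA256 digest.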
File details
Details for the file wiker-0.0.1-py3-none-any.whl.
File metadata
- Download URL: wiker-0.0.1-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd81dd51da506cc5b23cd14f42b2eab1c096bb679dc79402e6fa705cd9963110 |
| MD5 | 62ac933453871c46883af7a30539236f |
| BLAKE2b-256 | 241c94cd5bb252b81a9e8a048fd850ab237cb757252014120624fea75e405e62 |