Download Wikipedia data dumps
Wikipedia downloader
wikipedia_downloader is a Python module that makes it easy to download Wikipedia data dumps.
Installation
To install wikipedia_downloader, simply run:
pip install wikipedia_downloader
Documentation
Functions
wikipedia_downloader.download_sql_dump(language, file, dump="latest", target_dir=".")
Downloads and decompresses a Wikipedia SQL dump.
Arguments:
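- language: Language code of the Wikipedia edition, e.g. "en".
- file: Name of the dump file (table) to download, e.g. "pagelinks".
- dump: Dump version, either "latest" (default) or a dump date such as "20190101".
- target_dir: Directory in which the dump is saved (default: current directory).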
Example
import wikipedia_downloader as wpd
wpd.download_sql_dump("en", "pagelinks", dump="20190101", target_dir="./dumps")
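The language, file, and dump arguments correspond to the parts of a dump file's location on dumps.wikimedia.org. The following is a minimal sketch of that mapping, assuming the public naming scheme used for SQL dumps there. The helper build_dump_url is hypothetical and is not part of wikipedia_downloader; the module may build its URLs differently.

```python
def build_dump_url(language, file, dump="latest"):
    """Hypothetical helper (not part of wikipedia_downloader).

    Assumes the dumps.wikimedia.org layout:
    <lang>wiki/<dump>/<lang>wiki-<dump>-<file>.sql.gz
    """
    wiki = f"{language}wiki"
    return f"https://dumps.wikimedia.org/{wiki}/{dump}/{wiki}-{dump}-{file}.sql.gz"


# Same arguments as the download_sql_dump example above.
url = build_dump_url("en", "pagelinks", dump="20190101")
print(url)
```

Printing the URL first can be a cheap way to check that a given dump date and file name exist before starting a large download.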
wikipedia_downloader.get_dataframe(language, file, dump="latest", select=None, where=None)
Builds a pandas.DataFrame from a Wikipedia SQL dump.
Arguments:
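- language: Language code of the Wikipedia edition, e.g. "en".
- file: Name of the dump file (table) to load, e.g. "page".
- dump: Dump version, either "latest" (default) or a dump date such as "20190101".
- select: List of column names to keep.
- where: Dictionary mapping a column name to a function that returns True for the records to keep.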
Returns: pandas.DataFrame
Example
import wikipedia_downloader as wpd
select = ["page_id", "page_namespace", "page_title"]
where = {"page_namespace": lambda x: x == 0 or x == 14}
df = wpd.get_dataframe("en", "page", dump="20190101", select=select, where=where)
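To illustrate what select and where mean in the example above, here is a minimal sketch in plain Python that applies the same kind of filter to a few hand-written records. The helper filter_records and the sample rows are hypothetical. The sketch assumes a record is kept only if every predicate in where returns True, and that columns are projected after filtering; the actual behavior of get_dataframe may differ.

```python
def filter_records(records, select=None, where=None):
    """Hypothetical helper (not part of wikipedia_downloader).

    Keeps the rows for which every predicate in `where` returns True,
    then keeps only the columns listed in `select` (all columns if None).
    """
    where = where or {}
    kept = []
    for row in records:
        if all(predicate(row[column]) for column, predicate in where.items()):
            if select is None:
                kept.append(dict(row))
            else:
                kept.append({column: row[column] for column in select})
    return kept


# Made-up sample rows shaped like entries of the "page" table.
rows = [
    {"page_id": 1, "page_namespace": 0, "page_title": "Article", "page_len": 120},
    {"page_id": 2, "page_namespace": 1, "page_title": "Talk_page", "page_len": 40},
    {"page_id": 3, "page_namespace": 14, "page_title": "Category_page", "page_len": 15},
]

select = ["page_id", "page_namespace", "page_title"]
where = {"page_namespace": lambda x: x == 0 or x == 14}

result = filter_records(rows, select=select, where=where)
for row in result:
    print(row)
```

In MediaWiki, namespace 0 holds articles and namespace 14 holds categories, so this filter keeps only those two kinds of pages.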
License
This project is licensed under the MIT License - see the LICENSE.md file for details.