Download Wikipedia data dumps

Wikipedia downloader

wikipedia_downloader is a Python module that makes it easy to download Wikipedia data dumps.

Installation

To install wikipedia_downloader, simply run:

pip install wikipedia_downloader

Documentation

Functions

  • wikipedia_downloader.download_sql_dump(language, file, dump="latest", target_dir=".")

    Downloads and decompresses a Wikipedia SQL dump.

    Arguments:

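    • language: Language code of the Wikipedia edition (e.g. "en").
    • file: Name of the dump file to download (e.g. "pagelinks").
    • dump: Dump version, given as a date string such as "20190101". Defaults to "latest".
    • target_dir: Directory in which the dump is saved. Defaults to the current directory.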

    Example

    import wikipedia_downloader as wpd
    # Download the "pagelinks" dump of the English Wikipedia (version 20190101) into ./dumps
    wpd.download_sql_dump("en", "pagelinks", dump="20190101", target_dir="./dumps")
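
For orientation, Wikimedia publishes its SQL dumps under a regular URL layout on dumps.wikimedia.org, and the arguments above line up with the parts of such a URL. The sketch below only illustrates that correspondence. The helper `build_dump_url` is hypothetical and not part of wikipedia_downloader, and whether the module composes its URLs exactly this way is an assumption.

```python
def build_dump_url(language, file, dump="latest"):
    """Hypothetical helper: compose the public URL of a Wikipedia SQL dump.

    Assumes the layout used on dumps.wikimedia.org:
    <base>/<language>wiki/<dump>/<language>wiki-<dump>-<file>.sql.gz
    """
    wiki = f"{language}wiki"
    return f"https://dumps.wikimedia.org/{wiki}/{dump}/{wiki}-{dump}-{file}.sql.gz"


# Same arguments as in the download_sql_dump example above.
print(build_dump_url("en", "pagelinks", dump="20190101"))
```

Older dump versions are removed from the server over time, so a dated URL such as this one may no longer resolve.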
    
  • wikipedia_downloader.get_dataframe(language, file, dump="latest", select=None, where=None)

    Builds a pandas.DataFrame from a Wikipedia SQL dump.

    Arguments:

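    • language: Language code of the Wikipedia edition (e.g. "en").
    • file: Name of the dump file to load (e.g. "page").
    • dump: Dump version, given as a date string such as "20190101". Defaults to "latest".
    • select: List of column names to keep. Defaults to None.
    • where: Dict mapping a column name to a function that filters records on that column's value. Defaults to None.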

    Returns: pandas.DataFrame

    Example

    import wikipedia_downloader as wpd
    # Columns of the "page" table to keep
    select = ["page_id", "page_namespace", "page_title"]
    # Keep only pages in namespace 0 (articles) or 14 (categories)
    where = {"page_namespace": lambda x: x in (0, 14)}
    df = wpd.get_dataframe("en", "page", dump="20190101", select=select, where=where)
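
To make the roles of select and where concrete, here is a minimal sketch of the same filtering applied to plain dicts instead of a real dump. The helper `filter_records` and the sample rows are made up for illustration and are not part of wikipedia_downloader. It assumes that a record is kept only when every function in where returns True, which the module may or may not do in the same way.

```python
def filter_records(records, select=None, where=None):
    """Hypothetical helper mirroring the select/where arguments on plain dicts."""
    where = where or {}
    result = []
    for record in records:
        # Assumption: a record is kept only if every predicate returns True.
        if all(predicate(record[column]) for column, predicate in where.items()):
            if select is None:
                result.append(dict(record))
            else:
                # Keep only the requested columns, in the requested order.
                result.append({column: record[column] for column in select})
    return result


# Made-up sample rows shaped like records of the "page" table.
rows = [
    {"page_id": 1, "page_namespace": 0, "page_title": "Example", "page_len": 120},
    {"page_id": 2, "page_namespace": 1, "page_title": "Example", "page_len": 45},
    {"page_id": 3, "page_namespace": 14, "page_title": "Examples", "page_len": 30},
]

kept = filter_records(
    rows,
    select=["page_id", "page_namespace", "page_title"],
    where={"page_namespace": lambda x: x in (0, 14)},
)
print(kept)
```

With these rows, the record in namespace 1 is dropped and the page_len column is removed from the two remaining records.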
    

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
