A simple Wikipedia parser

Project description

SimpleWikiParser

A simplified Wiki data parser

Installation

pip install git+https://github.com/Biswajit2902/SimpleWikiParser.git

Usage

from wikiparser.core import WikiMediaDumpParser

# initialise Parser for a language (say Hindi)
wiki_dump_parser = WikiMediaDumpParser(language="Hindi")

# parse
wiki_dump_parser.parse()

# export
wiki_dump_parser.export_hf_dataset("/path/to/data.jsonl", "dataset_name")
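
The usage above refers to a .jsonl path. As a minimal sketch for inspecting such a file, the helper below reads JSON Lines with the standard library only. It assumes the file holds one JSON object per line; the README does not state the record fields or whether the path is written or read by export_hf_dataset, so iter_jsonl is a hypothetical helper and not part of SimpleWikiParser.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def iter_jsonl(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Hypothetical helper: assumes each line is a standalone JSON object.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Calling list(iter_jsonl("/path/to/data.jsonl")) would load every record into memory; iterating lazily is usually the safer choice for large dumps.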

Download files

Source Distribution

simple-wikiparser-0.0.0.tar.gz (7.5 kB, source)

File details

Details for the file simple-wikiparser-0.0.0.tar.gz.

File metadata

  • Download URL: simple-wikiparser-0.0.0.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for simple-wikiparser-0.0.0.tar.gz

Algorithm    Hash digest
SHA256       df7183e510757cd1a8e1259b2522f01e6638c818274e3249879dade077993f9d
MD5          bc5276345d5826cc6aaa5fab5df5d023
BLAKE2b-256  1e3378e7b5dbc0793899ab1903d7377c210690fe2cdf1c6fe98bbd3e91c2c142
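
The digests above can be used to check a downloaded copy of the source distribution. The following is a minimal sketch using Python's hashlib; the function names and the local file path are illustrative assumptions, and only the SHA256 value is taken from the listing above.

```python
import hashlib

# SHA256 digest listed above for simple-wikiparser-0.0.0.tar.gz
EXPECTED_SHA256 = "df7183e510757cd1a8e1259b2522f01e6638c818274e3249879dade077993f9d"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("simple-wikiparser-0.0.0.tar.gz") should return True for an intact download saved under that name in the current directory.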
