Package made for converting Wikipedia articles to Markdown format

Wikipyedia_md

A Python package for converting Wikipedia articles to Markdown format.

Usage

Install the package

pip install wikipyedia-md

Fetch, parse, and save a list of articles from URLs as .md files in the output directory:

from wikipyedia_md import articles_to_markdown

urls = [
    "https://en.wikipedia.org/wiki/Computer_science",
    # ...
]
articles_to_markdown(urls, output_dir="./articles")

Or run each step manually:

import requests
from wikipyedia_md.html_filtering import filter_html
from wikipyedia_md.wiki_parser import parse_article

urls = [
    "https://en.wikipedia.org/wiki/Computer_science",
    # ...
]

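output_dir = "./articles"  # same directory as in the example above

for url in urls:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    content = response.text
    modified_html = filter_html(content)
    article = parse_article(modified_html)
    file_name = url.split("/")[-1] + ".md"
    article.save_md(f"{output_dir}/{file_name}")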
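
Taking the last URL segment as the file name works for simple article URLs, but it can misbehave when the URL ends with a slash, carries a `#fragment` or `?query`, or contains percent-encoded characters. The following is a minimal sketch of a more defensive alternative that uses only the standard library. The helper name `markdown_path` is hypothetical and is not part of wikipyedia_md.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def markdown_path(url: str, output_dir: str) -> Path:
    """Build an output path such as articles/Computer_science.md from an article URL.

    Hypothetical helper for illustration; not part of wikipyedia_md.
    """
    # urlsplit separates the path from any ?query or #fragment part.
    path = urlsplit(url).path
    # Drop a trailing slash, take the last path segment, and decode %XX escapes.
    title = unquote(path.rstrip("/").rsplit("/", 1)[-1])
    if not title:
        raise ValueError(f"no article title found in URL: {url!r}")
    # Path separators decoded from %2F must not end up in the file name.
    safe_title = title.replace("/", "_").replace("\\", "_")
    return Path(output_dir) / f"{safe_title}.md"
```

In the loop above, the result could be passed along as `article.save_md(str(markdown_path(url, output_dir)))`, assuming `save_md` accepts a string path as it does in the original call.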

By default, filter_html removes common HTML elements that would interfere with the Markdown output. To change which elements are removed, pass your own list through the filter_elements argument:

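from wikipyedia_md import IGNORE_ELEMENTS
from wikipyedia_md.html_filtering import filter_html

# Copy the defaults so the shared IGNORE_ELEMENTS list is not modified in place
custom_elements = list(IGNORE_ELEMENTS)
custom_elements.extend([
    "img",
    # ...
])
# content is the raw HTML fetched in the previous example
modified_html = filter_html(content, filter_elements=custom_elements)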
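
When extending a default list like this, keep in mind that Python lists are mutable: assigning a list to a new name does not copy it, so extending the new name also changes the original. The stand-alone sketch below illustrates the difference with a hypothetical `DEFAULT_IGNORE` list. Its values are placeholders, not the package's actual defaults.

```python
# Hypothetical stand-in for a module-level default list (placeholder values).
DEFAULT_IGNORE = ["script", "style"]

# Plain assignment only creates a second name for the same list object.
aliased = DEFAULT_IGNORE
aliased.append("img")
default_after_alias = list(DEFAULT_IGNORE)  # snapshot: the default now contains "img" too

# Reset the default, then work on a copy instead.
DEFAULT_IGNORE = ["script", "style"]
copied = list(DEFAULT_IGNORE)
copied.append("img")
default_after_copy = list(DEFAULT_IGNORE)  # snapshot: the default is unchanged

print(default_after_alias)
print(default_after_copy)
```

Working on a copy keeps later calls that rely on the defaults from silently picking up the extra elements.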

Contributing

Contributions to this package are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
