Package made for converting Wikipedia articles to Markdown format
Wikipyedia_md
A Python package for converting Wikipedia articles to Markdown format.
Usage
Install the package
pip install wikipyedia-md
Fetch, parse, and save a list of articles from URLs as .md files in the output directory:
from wikipyedia_md import articles_to_markdown
urls = [
"https://en.wikipedia.org/wiki/Computer_science",
# ...
]
articles_to_markdown(urls, output_dir="./articles")
Alternatively, fetch, filter, and parse each article manually:
import requests
from wikipyedia_md.html_filtering import filter_html
from wikipyedia_md.wiki_parser import parse_article
urls = [
"https://en.wikipedia.org/wiki/Computer_science",
# ...
]
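output_dir = "./articles"

for url in urls:
    # Download the raw article HTML
    response = requests.get(url, timeout=10)
    content = response.text
    # Remove elements that would clutter the Markdown output, then parse
    modified_html = filter_html(content)
    article = parse_article(modified_html)
    # Name the file after the last URL path segment, e.g. Computer_science.md
    file_name = url.split("/")[-1] + ".md"
    article.save_md(f"{output_dir}/{file_name}")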
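Taking the last URL segment works for simple article URLs, but percent-encoded titles, fragments, and query strings end up in the file name. The following is a minimal sketch of a sturdier mapping using only the standard library. `url_to_md_path` is a hypothetical helper, not part of wikipyedia_md, and replacing slashes with underscores is an assumption made here to keep the result a single file name.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def url_to_md_path(url: str, output_dir: str = "./articles") -> Path:
    """Hypothetical helper: map an article URL to a .md path inside output_dir."""
    # Use only the path component so fragments and query strings are ignored
    path = urlparse(url).path
    # The article title is the last path segment; decode %-escapes such as %2B
    title = unquote(path.rstrip("/").split("/")[-1])
    # Assumption: replace any decoded slash so the title stays one file name
    title = title.replace("/", "_")
    return Path(output_dir) / f"{title}.md"


print(url_to_md_path("https://en.wikipedia.org/wiki/Computer_science"))
```

The returned path could then be passed to `article.save_md(...)` in place of the f-string, assuming `save_md` accepts a path-like value or its string form.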
By default, filter_html removes common HTML elements that would clutter the Markdown output. To change which elements are removed, pass your own list via the filter_elements parameter:
from wikipyedia_md import IGNORE_ELEMENTS
custom_elements = list(IGNORE_ELEMENTS)  # copy, so the package default is not mutated
custom_elements.extend([
"img",
# ...
])
modified_html = filter_html(content, filter_elements=custom_elements)
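One detail worth keeping in mind: in Python, assigning a list to a new name does not copy it, so extending a name bound directly to the package's default list would also change that default for every later call. The sketch below shows one way to build a separate list instead. `DEFAULT_ELEMENTS` and its contents are hypothetical stand-ins for `IGNORE_ELEMENTS` (whose real contents are not shown here), and `extended_copy` is an illustrative helper, not part of the package.

```python
# Hypothetical stand-in for the package's IGNORE_ELEMENTS list
DEFAULT_ELEMENTS = ["script", "style"]


def extended_copy(defaults, extra):
    """Return a new list with the extra elements appended; defaults is untouched."""
    return [*defaults, *extra]


custom = extended_copy(DEFAULT_ELEMENTS, ["img"])

# The custom list has the extra element, while the default list is unchanged
print(custom)
print(DEFAULT_ELEMENTS)
```

The resulting list could then be passed as `filter_elements=custom`, as in the snippet above.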
Contributing
Contributions to this package are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.