Extract clean text from Wikipedia article HTML using a Rust core parser

wikipedia-article-transform (Python)

Python bindings for the Rust wikipedia-article-transform library.

Install (from source)

pip install maturin
maturin develop --release

Library usage

from wikipedia_article_transform import fetch_article_html, extract

# Fetch the HTML of the English Wikipedia article.
html = fetch_article_html("en", "Rust_(programming_language)")
# Extract plain text from the fetched HTML.
text = extract(html, format="plain", language="en")
print(text)
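
The two calls above can be wrapped in a small helper. The following is a minimal sketch and not part of the library: `article_text` is a hypothetical name, and its `fetch` and `extract` parameters exist only so the helper can be exercised offline with stand-in functions. When they are omitted, it falls back to `fetch_article_html` and `extract` with the same arguments as in the usage example.

```python
from typing import Callable, Optional


def article_text(
    language: str,
    title: str,
    fetch: Optional[Callable[[str, str], str]] = None,
    extract: Optional[Callable[..., str]] = None,
) -> str:
    """Fetch one article and return its plain text (hypothetical helper)."""
    if fetch is None or extract is None:
        # Imported lazily so the helper can be defined, and tested with
        # stand-ins, on machines where the package is not installed.
        import wikipedia_article_transform as wat

        fetch = fetch or wat.fetch_article_html
        extract = extract or wat.extract
    html = fetch(language, title)
    # Same keyword arguments as the library usage example.
    return extract(html, format="plain", language=language)
```

With the package installed, `article_text("en", "Rust_(programming_language)")` should behave like the three-line example, under the assumption that both functions keep the signatures shown there.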

CLI usage

wikipedia-article-transform fetch --language en --title "Rust_(programming_language)"
wikipedia-article-transform fetch --language ml --title "കേരളം" --format json
wikipedia-article-transform fetch --language en --title "Liquid_oxygen" --format markdown
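
When the CLI is driven from a script, passing the arguments as a list avoids shell quoting problems with titles that contain parentheses or non-ASCII characters. The sketch below uses only the subcommand and flags shown above (`fetch`, `--language`, `--title`, `--format`). The helper names `build_fetch_command` and `run_fetch` are hypothetical, and it is an assumption that the CLI writes its result to standard output.

```python
import subprocess
from typing import List, Optional


def build_fetch_command(language: str, title: str, fmt: Optional[str] = None) -> List[str]:
    """Build the argument list for `wikipedia-article-transform fetch`."""
    cmd = ["wikipedia-article-transform", "fetch", "--language", language, "--title", title]
    if fmt is not None:
        # --format is optional, as in the first CLI example above.
        cmd += ["--format", fmt]
    return cmd


def run_fetch(language: str, title: str, fmt: Optional[str] = None) -> str:
    """Run the CLI and return its stdout (assumes output goes to stdout)."""
    result = subprocess.run(
        build_fetch_command(language, title, fmt),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `run_fetch("ml", "കേരളം", "json")` would issue the same command as the second line of the CLI examples.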

Source Distributions

No source distribution files are available for this release.

Built Distributions

wikipedia_article_transform-0.3.0-cp39-abi3-win_amd64.whl (223.1 kB)
CPython 3.9+, Windows x86-64

wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl (409.8 kB)
CPython 3.9+, manylinux (glibc 2.34+), x86-64

wikipedia_article_transform-0.3.0-cp39-abi3-macosx_11_0_arm64.whl (336.2 kB)
CPython 3.9+, macOS 11.0+, ARM64
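
The wheel names above follow the standard pattern `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with an optional build tag after the version. As an illustration of how the three tags map to the interpreter and platform, here is a minimal sketch of a parser. `WheelTags` and `parse_wheel_filename` are hypothetical names defined here; for real use, the `packaging` library provides a more thorough `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between version and python tag; drop it.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_filename(
    "wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl"
)
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

Here `cp39` together with `abi3` corresponds to the "CPython 3.9+" label, and the platform tag carries the operating system and architecture.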

File details

Details for the file wikipedia_article_transform-0.3.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for wikipedia_article_transform-0.3.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 a7da4c8ae1019c4c72347e01a89d6cea84e17697ee41dcc455e592864eca1db1
MD5 bdcd50a611a0f08303b66d64cbda5ad4
BLAKE2b-256 d9a0052be00f264a86eb1ad15427ebb98ea22bf834ae4abe034a72d598f80d99

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.3.0-cp39-abi3-win_amd64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

File details

Details for the file wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1170208cc9ac6d655e6785bd4d55178c0c0e6a5cbdada3ba8fa347262ce6e981
MD5 5f82b96479c1052ee9d7f0e8833e48c6
BLAKE2b-256 f240e402c3dbd1c79b1db086654b3b98e223a5963356a5219737b89dd88b9b4e

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

File details

Details for the file wikipedia_article_transform-0.3.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for wikipedia_article_transform-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 760fc08449a6596a166b2a131fcb6ee771707c549488662ee1a5dbc76ac11567
MD5 fc30d0767e932a0815554dc5e414d604
BLAKE2b-256 0e80ddb2b0a0180f509d7f5c30e9d5884a14a50c280d9603dbdf89b513f837cc
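
A downloaded wheel can be checked against the SHA256 digests listed for each file. The following is a minimal sketch using only the standard library. The digests are copied from the hash listings on this page, while `sha256_of` and `verify_wheel` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash listings for the 0.3.0 wheels.
EXPECTED_SHA256 = {
    "wikipedia_article_transform-0.3.0-cp39-abi3-win_amd64.whl":
        "a7da4c8ae1019c4c72347e01a89d6cea84e17697ee41dcc455e592864eca1db1",
    "wikipedia_article_transform-0.3.0-cp39-abi3-manylinux_2_34_x86_64.whl":
        "1170208cc9ac6d655e6785bd4d55178c0c0e6a5cbdada3ba8fa347262ce6e981",
    "wikipedia_article_transform-0.3.0-cp39-abi3-macosx_11_0_arm64.whl":
        "760fc08449a6596a166b2a131fcb6ee771707c549488662ee1a5dbc76ac11567",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path) -> bool:
    """Compare a downloaded wheel against the digest listed for its name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no known digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

Calling `verify_wheel` on a file saved under its original name returns `True` only if the contents match the listed digest. pip's hash-checking mode (`--require-hashes`) performs a comparable check at install time.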

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.3.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform
