Extract clean text from Wikipedia article HTML using a Rust core parser

wikipedia-article-transform (Python)

Python bindings for the Rust wikipedia-article-transform library.

Install (from source)

pip install maturin
maturin develop --release

Library usage

from wikipedia_article_transform import fetch_article_html, extract

html = fetch_article_html("en", "Rust_(programming_language)")
text = extract(html, format="plain", language="en")
print(text)
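
To process several articles in one go, the two calls above can be wrapped in a small helper. This is only a sketch: `extract_articles` is a hypothetical name, not part of the package, and it assumes `fetch_article_html(language, title)` and `extract(html, format=..., language=...)` behave exactly as in the snippet above. The two functions are passed in as arguments so the helper can be tried without network access, and because the exception types raised by the bindings are not documented here, it catches `Exception` broadly.

```python
from typing import Callable, Dict, Iterable


def extract_articles(
    titles: Iterable[str],
    language: str,
    fetch: Callable[[str, str], str],
    extract: Callable[..., str],
) -> Dict[str, str]:
    """Return {title: plain text} for each title, skipping titles that fail.

    `fetch` and `extract` are expected to be the package's
    fetch_article_html and extract functions (an assumption of this sketch).
    """
    results: Dict[str, str] = {}
    for title in titles:
        try:
            html = fetch(language, title)
            results[title] = extract(html, format="plain", language=language)
        except Exception as exc:  # error types of the bindings are unknown here
            print(f"skipping {title!r}: {exc}")
    return results
```

With the package installed, a call might look like `extract_articles(["Liquid_oxygen"], "en", fetch_article_html, extract)`.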

CLI usage

wikipedia-article-transform fetch --language en --title "Rust_(programming_language)"
wikipedia-article-transform fetch --language ml --title "കേരളം" --format json
wikipedia-article-transform fetch --language en --title "Liquid_oxygen" --format markdown

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

wikipedia_article_transform-0.4.0-cp39-abi3-win_amd64.whl (223.1 kB)
CPython 3.9+, Windows x86-64

wikipedia_article_transform-0.4.0-cp39-abi3-manylinux_2_34_x86_64.whl (409.8 kB)
CPython 3.9+, manylinux (glibc 2.34+), x86-64

wikipedia_article_transform-0.4.0-cp39-abi3-macosx_11_0_arm64.whl (336.3 kB)
CPython 3.9+, macOS 11.0+, ARM64
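
The wheel names above follow the standard wheel naming scheme, `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough illustration of how to read them, the sketch below splits a file name into those fields. `WheelTags` and `parse_wheel_filename` are names made up for this example; for real tooling, a maintained parser such as the one in the `packaging` library is a better choice.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(dist, version, build, python, abi, platform)


tags = parse_wheel_filename(
    "wikipedia_article_transform-0.4.0-cp39-abi3-manylinux_2_34_x86_64.whl"
)
print(tags.python, tags.abi, tags.platform)
```

Here `cp39` together with `abi3` indicates a wheel built against CPython's stable ABI starting at 3.9, which is why a single file per platform covers CPython 3.9 and later.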

File details

Details for the file wikipedia_article_transform-0.4.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for wikipedia_article_transform-0.4.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c8f7cf5e4c9afa028db58bae622c059d29ce69541457863ee6037f4d11ea485d
MD5 41a270cf8545545cf351cdb0db4651f5
BLAKE2b-256 77948423b0faf722d2b45da3426263303ade778f76fd1cb146e845683640de38
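
A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the standard library; `sha256_of` and `verify_wheel` are hypothetical helper names, and the file path is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel("wikipedia_article_transform-0.4.0-cp39-abi3-win_amd64.whl", "<SHA256 from the listing>")` should return `True` for an intact download. pip can also enforce this itself through hash-checking mode (`--require-hashes` with a requirements file).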

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.4.0-cp39-abi3-win_amd64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file wikipedia_article_transform-0.4.0-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for wikipedia_article_transform-0.4.0-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 fbaddc59fa92b2c2fd1831ea54256b7d3d406def038641668df95f01f2e6dfb5
MD5 74ac86f0a01b11a1187161faa94cac84
BLAKE2b-256 26611f076dbffa8b43faa96a25387729eee0f1463dfaf4bf671f69a9a27c1633

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.4.0-cp39-abi3-manylinux_2_34_x86_64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

File details

Details for the file wikipedia_article_transform-0.4.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for wikipedia_article_transform-0.4.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a79ab56259856788135a8073d703b484e2aa9ea8b543324bf46b6159d12ecb9a
MD5 857da9cd9d8abbdf548a343e117d6e7b
BLAKE2b-256 604e4a45a5186345e8d4432e3acc73ca291827917433261bd74fc6c045e9cb01

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.4.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform
