Extract clean text from Wikipedia article HTML using a Rust core parser

Project description

wikipedia-article-transform (Python)

Python bindings for the Rust wikipedia-article-transform library.

Install (from source)

pip install maturin
maturin develop --release

Library usage

from wikipedia_article_transform import fetch_article_html, extract

html = fetch_article_html("en", "Rust_(programming_language)")
text = extract(html, format="plain", language="en")
print(text)
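
The same `extract` call can be repeated for several output formats. The sketch below is a minimal illustration and not part of the library: `extract_formats` and `fake_extract` are hypothetical names, and it assumes that `extract` accepts the `"markdown"` and `"json"` format names used by the CLI in addition to `"plain"`. The extractor is passed in as an argument so the helper can be tried with a stand-in function.

```python
from typing import Callable, Dict, Iterable


def extract_formats(
    html: str,
    formats: Iterable[str],
    extract: Callable[..., str],
    language: str = "en",
) -> Dict[str, str]:
    """Run `extract` once per format and collect the results by format name.

    `extract` is expected to have the call shape shown above:
    extract(html, format=..., language=...).
    """
    return {fmt: extract(html, format=fmt, language=language) for fmt in formats}


def fake_extract(html: str, format: str = "plain", language: str = "en") -> str:
    """Stand-in for the real extractor, used only to exercise the helper."""
    return f"[{format}/{language}] {len(html)} chars"


results = extract_formats("<p>Hello</p>", ["plain", "markdown"], fake_extract)
print(results["plain"])
```

With the package installed, pass the real `extract` imported from `wikipedia_article_transform` in place of `fake_extract`.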

CLI usage

wikipedia-article-transform fetch --language en --title "Rust_(programming_language)"
wikipedia-article-transform fetch --language ml --title "കേരളം" --format json
wikipedia-article-transform fetch --language en --title "Liquid_oxygen" --format markdown
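
The CLI can also be driven from a Python script. The following is a rough sketch under stated assumptions: `build_fetch_command` is a hypothetical helper, it uses only the `--language`, `--title`, and `--format` flags shown above, and the commented-out `subprocess` call assumes the command writes its result to standard output. Passing the arguments as a list avoids shell quoting problems with titles that contain parentheses or non-ASCII characters.

```python
import shlex
from typing import List, Optional


def build_fetch_command(language: str, title: str, fmt: Optional[str] = None) -> List[str]:
    """Build the argv list for `wikipedia-article-transform fetch`.

    Only the flags that appear in the examples above are used.
    """
    argv = ["wikipedia-article-transform", "fetch", "--language", language, "--title", title]
    if fmt is not None:
        argv += ["--format", fmt]
    return argv


cmd = build_fetch_command("en", "Liquid_oxygen", "markdown")
print(shlex.join(cmd))  # shell-quoted form, useful for logging

# To actually run it (needs the CLI on PATH and network access):
# import subprocess
# output = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```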

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

wikipedia_article_transform-0.2.0-cp39-abi3-win_amd64.whl (222.0 kB)

Uploaded: CPython 3.9+, Windows x86-64

wikipedia_article_transform-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl (408.3 kB)

Uploaded: CPython 3.9+, manylinux (glibc 2.34+) x86-64

wikipedia_article_transform-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (335.2 kB)

Uploaded: CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file wikipedia_article_transform-0.2.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for wikipedia_article_transform-0.2.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 4ea0d950533f3f47f583f1597594bf7a0ed013d1de80b16c34dbf39f55dd6b7d
MD5 6b06ebfaade6d35f8d9cb9d4625d2f26
BLAKE2b-256 b8aed18f467230148dc3e2976e7fcfec45146ef30be903f5c75169531ef76f66
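
A downloaded wheel can be checked against the SHA256 digest listed for it. The following is a minimal sketch using only the standard library; `sha256_of_file` and `matches_published_sha256` are hypothetical helper names, and the demonstration runs on a temporary file rather than a real wheel.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file standing in for a downloaded wheel.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example wheel bytes")
    demo_path = tmp.name
try:
    expected = hashlib.sha256(b"example wheel bytes").hexdigest()
    demo_ok = matches_published_sha256(demo_path, expected)
    print(demo_ok)
finally:
    os.remove(demo_path)
```

For a real download, pass the path of the wheel and the SHA256 value listed for that file.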

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.2.0-cp39-abi3-win_amd64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

File details

Details for the file wikipedia_article_transform-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for wikipedia_article_transform-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 65465750b719ac6bcf54c6949894a4212593a7ed65b8ab8fa4d9d741bb17a560
MD5 896d0b4111dd1d0a199f34a9ac8dbb81
BLAKE2b-256 1c46b382a606a962b5232f2c4b7a59c2e31b4d81cb2a0ef3d8d6a923d395e632

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform

File details

Details for the file wikipedia_article_transform-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for wikipedia_article_transform-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b976c8d12f581e4c117ef2513ec54c20908e436db7268d21250a9215f5b350f0
MD5 9181dbfa007039199b0badd8dd2c742b
BLAKE2b-256 367da22526b3e5aed099dd8e3fcdc6a96abc0c0b744740e5452831d1c5599236

Provenance

The following attestation bundles were made for wikipedia_article_transform-0.2.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish-python.yml on santhoshtr/wikipedia-article-transform
