PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently, it supports only the wiki markup textify function, which strips markup from wiki text. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
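A minimal sketch of how a per-document timing like the one above could be measured. This is not the benchmark script used for the numbers above: the helper names `seconds_per_doc` and `available_strippers`, the repeat count, and the placeholder corpus are all assumptions made for illustration. Only the three stripping calls come from the list above.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best average wall-clock seconds per document over `repeats` passes."""
    if not docs:
        raise ValueError("docs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        best = min(best, (time.perf_counter() - start) / len(docs))
    return best


def available_strippers():
    """Collect the three calls listed above, skipping libraries that are not installed."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        strippers["pyautocorpus"] = pyautocorpus.Textifier().textify
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Placeholder corpus (an assumption): a real benchmark would use full wiki pages.
    docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100
    for name, strip_markup in available_strippers().items():
        print(f"{name} {seconds_per_doc(strip_markup, docs):.6f} sec/doc")
```

Taking the best of several passes reduces noise from other processes. Absolute timings will depend on the documents and the machine.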

Installing

From PyPI:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

Building from source requires the pcre library, so install it first. Then run:

python setup.py install
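Before running the source build above, one way to check whether a pcre shared library is visible on the system is the standard library's `ctypes.util.find_library`. This is only a rough sketch: the helper name `find_pcre` is hypothetical, and finding the shared library does not guarantee that the development headers needed for compilation are installed.

```python
import ctypes.util


def find_pcre():
    """Return the name or path of a pcre shared library, or None if none is found."""
    return ctypes.util.find_library("pcre")


if __name__ == "__main__":
    location = find_pcre()
    if location is None:
        print("pcre shared library not found; install it before building")
    else:
        print(f"found pcre: {location}")
```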

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
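For more than one document, the same call can be applied in a loop. The sketch below assumes that a single `Textifier` instance can be reused across calls, which the example above does not confirm. The helper name `textify_pages` and the `(title, wikitext)` pair layout are hypothetical and not part of pyautocorpus.

```python
def textify_pages(textifier, pages):
    """Yield (title, plain_text) for each (title, wikitext) pair in `pages`."""
    for title, wikitext in pages:
        yield title, textifier.textify(wikitext)


def main():
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
        return

    pages = [
        ("Example", "==Wiki Marked up text==\n [[Some Page|link text]] example."),
    ]
    # Assumption: one Textifier instance is reused for every page.
    for title, text in textify_pages(pyautocorpus.Textifier(), pages):
        print(title, repr(text))


if __name__ == "__main__":
    main()
```

Passing the textifier in as an argument keeps the helper independent of the library, so any object with a `textify` method can stand in for it during testing.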

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
