PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
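Per-document timings like those above could be measured with a small harness along these lines. This is a sketch, not the benchmark script behind the reported numbers: the `seconds_per_doc` helper, the sample document, and the repeat count are illustrative assumptions. Only `pyautocorpus.Textifier().textify` comes from this README.

```python
import time


def seconds_per_doc(strip_fn, docs, repeats=3):
    """Best-of-`repeats` average wall-clock seconds strip_fn takes per document."""
    if not docs:
        raise ValueError("docs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_fn(doc)
        best = min(best, (time.perf_counter() - start) / len(docs))
    return best


# Illustrative input only; a real benchmark should use full wiki articles.
docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 1000

try:
    import pyautocorpus
except ImportError:
    print("pyautocorpus is not installed; skipping")
else:
    textifier = pyautocorpus.Textifier()
    print(f"pyautocorpus {seconds_per_doc(textifier.textify, docs):.6f} sec/doc")
```

The same helper accepts any callable, so the other two libraries can be timed by passing, for example, `lambda x: mwparserfromhell.parse(x).strip_code()`.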

Installing

From pypi:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

The build requires the pcre library, so install it first. Then, from inside the cloned directory, run:

python setup.py install

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
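When stripping many pages, a single `Textifier` can presumably be created once and reused for every call. A minimal sketch, assuming `textify` takes a markup string and returns plain text as in the example; the `textify_pages` helper and the sample pages are hypothetical and not part of the library:

```python
def textify_pages(textify, pages):
    """Yield (title, plain_text) for each (title, markup) pair in `pages`."""
    for title, markup in pages:
        yield title, textify(markup)


# Hypothetical sample data: (title, wiki markup) pairs.
pages = [
    ("Example", "==Wiki Marked up text==\n [[Some Page|link text]] example."),
]

try:
    import pyautocorpus
except ImportError:
    print("pyautocorpus is not installed; skipping")
else:
    textifier = pyautocorpus.Textifier()  # created once, reused for every page
    for title, text in textify_pages(textifier.textify, pages):
        print(title, repr(text))
```

Passing the bound method rather than the `Textifier` itself keeps the helper independent of the library, so it can be exercised with any string function.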

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
