PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently, it only supports AutoCorpus's wiki markup textify function, which strips markup from wikitext. In my benchmarks, this is about 40x faster than stripping markup with other libraries:

Library            Time per document
mwparserfromhell   0.208 sec
wikitextparser     0.215 sec
pyautocorpus       0.005 sec

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
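The comparison above can be reproduced with a small timing harness. This is a sketch, not the script behind the published numbers. The sample document, the repeat count, and the helper names `seconds_per_doc` and `available_methods` are assumptions. Libraries that are not installed are skipped.

```python
import time
from typing import Callable, Dict, List


def seconds_per_doc(fn: Callable[[str], str], docs: List[str], repeats: int = 3) -> float:
    """Return the best (lowest) of `repeats` runs, as average seconds per document."""
    if not docs:
        raise ValueError("docs must be non-empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            fn(doc)
        best = min(best, time.perf_counter() - start)
    return best / len(docs)


def available_methods() -> Dict[str, Callable[[str], str]]:
    """Map each method named in the README to a callable, skipping missing libraries."""
    methods: Dict[str, Callable[[str], str]] = {}
    try:
        import mwparserfromhell
        methods["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        methods["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        # Mirrors the README's definition: a new Textifier per call.
        methods["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return methods


if __name__ == "__main__":
    # Hypothetical input: the README does not say which documents were benchmarked.
    sample = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 200
    for name, fn in available_methods().items():
        print(f"{name:<18} {seconds_per_doc(fn, sample):.6f} sec/doc")
```

Taking the best of several runs reduces noise from other processes. Averaging every run would be an equally reasonable choice. Real timings depend heavily on document length and markup density, so a representative sample of real pages is preferable to the toy string used here.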

Installing

From PyPI:

pip install pyautocorpus
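To confirm the install worked, you can ask the standard library's `importlib.metadata` for the installed distribution version. The helper name `installed_version` is hypothetical. It returns `None` instead of raising when the package is missing.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist: str = "pyautocorpus") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())  # a version string, or None if not installed
```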

From source:

Clone the repository recursively so that its git submodules are fetched too:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
cd pyautocorpus

Make sure the PCRE library is installed, then build and install:

python setup.py install
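If the build fails because PCRE cannot be found, first check whether the dynamic loader can locate the PCRE shared library. This sketch uses the standard library's `ctypes.util.find_library`. The helper name `find_pcre` is hypothetical. A positive result does not prove that the C headers needed for compiling are present.

```python
import ctypes.util
from typing import Optional


def find_pcre() -> Optional[str]:
    """Return the PCRE shared library name/path visible to the loader, or None."""
    # find_library("pcre") looks for libpcre in the platform's standard library
    # locations; it does not check for the pcre.h header that compilation needs.
    return ctypes.util.find_library("pcre")


if __name__ == "__main__":
    found = find_pcre()
    if found:
        print(f"Found PCRE: {found}")
    else:
        print("PCRE shared library not found; install it before building.")
```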

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
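For many documents, such as pages from a wiki dump, a natural pattern is to create one `Textifier` and reuse it. The README does not say whether reusing an instance across calls is intended, so treat that as an assumption. The names `textify_all` and `SupportsTextify` are hypothetical. Any object with a `textify(str) -> str` method can be passed in.

```python
from typing import Iterable, Iterator, Optional, Protocol


class SupportsTextify(Protocol):
    """Anything with a textify(str) -> str method, such as pyautocorpus.Textifier."""

    def textify(self, text: str) -> str: ...


def textify_all(
    docs: Iterable[str], textifier: Optional[SupportsTextify] = None
) -> Iterator[str]:
    """Yield the markup-stripped text of each document, reusing one textifier."""
    if textifier is None:
        # Imported only when no textifier is supplied, so the helper can be
        # exercised without pyautocorpus installed.
        import pyautocorpus

        textifier = pyautocorpus.Textifier()
    for doc in docs:
        yield textifier.textify(doc)
```

Because `textify_all` is a generator, documents are processed lazily: something like `for plain in textify_all(pages): ...` (with `pages` being your own iterable of wikitext strings) never holds the whole corpus of stripped text in memory.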

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)


Download files

Download the file for your platform.

Source Distribution

  • pyautocorpus-0.1.14.tar.gz (10.9 kB)

Built Distributions

  • pyautocorpus-0.1.14-cp314-cp314-win_amd64.whl (6.9 kB): CPython 3.14, Windows x86-64
  • pyautocorpus-0.1.14-cp314-cp314-macosx_10_15_universal2.whl (45.2 kB): CPython 3.14, macOS 10.15+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.14-cp313-cp313-win_amd64.whl (6.7 kB): CPython 3.13, Windows x86-64
  • pyautocorpus-0.1.14-cp313-cp313-macosx_10_13_universal2.whl (45.0 kB): CPython 3.13, macOS 10.13+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.14-cp312-cp312-win_amd64.whl (6.7 kB): CPython 3.12, Windows x86-64
  • pyautocorpus-0.1.14-cp312-cp312-macosx_10_13_universal2.whl (45.0 kB): CPython 3.12, macOS 10.13+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.14-cp311-cp311-win_amd64.whl (6.6 kB): CPython 3.11, Windows x86-64
  • pyautocorpus-0.1.14-cp311-cp311-macosx_10_9_universal2.whl (44.9 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.14-cp310-cp310-win_amd64.whl (6.6 kB): CPython 3.10, Windows x86-64
  • pyautocorpus-0.1.14-cp310-cp310-macosx_10_9_universal2.whl (44.9 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)
  • pyautocorpus-0.1.14-cp39-cp39-win_amd64.whl (6.7 kB): CPython 3.9, Windows x86-64
  • pyautocorpus-0.1.14-cp39-cp39-macosx_10_9_universal2.whl (44.9 kB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64)

File details

pyautocorpus-0.1.14.tar.gz

  • Download URL: pyautocorpus-0.1.14.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

pyautocorpus-0.1.14.tar.gz

  • SHA256: 3c4ca714026cf7acca30eabeb2419769ecedafade3aaed3818e8ba8f71958cf5
  • MD5: 4a6024ddf96046808f072e5ee922c363
  • BLAKE2b-256: 975acb52eeedf627d04639040a790d033b45b330c473217f363c126761a8291c

pyautocorpus-0.1.14-cp314-cp314-win_amd64.whl

  • SHA256: 10dfe5b1e5829493a147f5654b22beb3d973dac7dcae7d3e8c515ceccb77e70a
  • MD5: 8c55eeec04b016e638ef6d2b8d7aedf4
  • BLAKE2b-256: ed4fe9820231423c3373885559710a0d4a50a55ec7f39bf0dc7007f712631dad

pyautocorpus-0.1.14-cp314-cp314-macosx_10_15_universal2.whl

  • SHA256: 59df356d50f84b9098f3eb84fce156c44f09b30aefe8670aea8506e53c9fa5fa
  • MD5: 6e9a273d19645a306c71aff7b283d41a
  • BLAKE2b-256: 13653d4b44718fe3f4d5c24846808779ce9828a9f68c40f44c8228759e7a9731

pyautocorpus-0.1.14-cp313-cp313-win_amd64.whl

  • SHA256: de386d6c99c73885b3421d3e2ac2996d7124b6db5398dd141991a0e4fee26779
  • MD5: 296bad2b75b281bc77e796de5efe0204
  • BLAKE2b-256: 308ad831c7e417e114ddd3219716a05dd7df052adb9bae947e3429a03d79474a

pyautocorpus-0.1.14-cp313-cp313-macosx_10_13_universal2.whl

  • SHA256: 265af2b22be6c9864ea786b1f36f5398929470ad901ba5ad75138336976f5c33
  • MD5: 397218106ea733c06719bbd837519cec
  • BLAKE2b-256: 467572f701a9d54c077ee4c3847746eb55e8d70804947d477082435e89e2179e

pyautocorpus-0.1.14-cp312-cp312-win_amd64.whl

  • SHA256: ca4c5b64d7647aca6f59a1471be0c0a49a3e6ba848bd452cc9304e00cfe1ba7e
  • MD5: a16fe4c8770754e1491f428c41b8aa41
  • BLAKE2b-256: e065988086bffd263ae34a405191a4a175ff9e81c07dbeee2cb7b1ab2d511b35

pyautocorpus-0.1.14-cp312-cp312-macosx_10_13_universal2.whl

  • SHA256: 3e25ca51571e6638a27b4e131d943c80b4dff16f71e23f011f617f9a3a4a120e
  • MD5: 2caff71f9c5bac6b45d17d7c831089c5
  • BLAKE2b-256: 8501ea29267275986517a3fa4f5e66f0315bc39f02f6872018f0514b4c7a6192

pyautocorpus-0.1.14-cp311-cp311-win_amd64.whl

  • SHA256: 9e7870489561418f2cac513d31d3c1945dc554cd9bfe29ac9c38c10fe819df3e
  • MD5: e86d182bc1b0e9c3a34af7eb0babef6c
  • BLAKE2b-256: 5b5ddfcd2f555d629dddc482372380cc40ed4d490b98d2fcd844b65d61de04db

pyautocorpus-0.1.14-cp311-cp311-macosx_10_9_universal2.whl

  • SHA256: ba70ef75fa06a92cbd02b62386c7344b8c9ff856835a9066eb6aada1b4dee94f
  • MD5: 880a510c235d139d897f160cd8838480
  • BLAKE2b-256: 41a2009bdcf1e2396e68b1e42c7ca02e3c18f871ac772ea5dde0d287f2089ce7

pyautocorpus-0.1.14-cp310-cp310-win_amd64.whl

  • SHA256: e94bc70963d5ead8d1cc25474e61958104d87dd794fed330cf0f5709e8105ed8
  • MD5: 661f9c8986530f51be521a008a99c181
  • BLAKE2b-256: 6e3569abe4efb6d8501204a9eefef9063754ea1e5944c78dc50c6aa6a0d2448e

pyautocorpus-0.1.14-cp310-cp310-macosx_10_9_universal2.whl

  • SHA256: ed7c7c8483e2e2d4c0bc28648035ed56243256cd45640b6601b50103a7677310
  • MD5: b42908f37fc1e77a6158e4ec5ececc53
  • BLAKE2b-256: 8ee1be77a3ecf853a27e5271ccc1b0b74e758ed2ebd58023197b9116efa8d295

pyautocorpus-0.1.14-cp39-cp39-win_amd64.whl

  • SHA256: eab6f3b742a7bcb0be4e4cf6b16aedfae57f9a316e0a05061a6ce3173b14c857
  • MD5: f67a7f2537039cc6a03fcab999520332
  • BLAKE2b-256: 7f6a1ec3b19dcc5d16a4fb7bbf6821eba82c93f18b28fc32e9554b0956efcf01

pyautocorpus-0.1.14-cp39-cp39-macosx_10_9_universal2.whl

  • SHA256: 15e8b9a473924041d7348c9e33ee634307f9e78ef10307af97a21a9f7c8200b7
  • MD5: 7e585aa981f3406726eba4e27a47d117
  • BLAKE2b-256: 248a701bf9145930f5e92c7e430a91e4a3c7f577b6b759cd497578a5cb7d37f1
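The digests listed above let you check a downloaded file before installing it. Below is a minimal sketch using the standard library's `hashlib` and the SHA256 digest published for the source distribution. The helper names `sha256_of` and `matches` are hypothetical.

```python
import hashlib

# SHA256 digest published above for pyautocorpus-0.1.14.tar.gz.
SDIST_SHA256 = "3c4ca714026cf7acca30eabeb2419769ecedafade3aaed3818e8ba8f71958cf5"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download, `matches("pyautocorpus-0.1.14.tar.gz", SDIST_SHA256)` should return `True`. pip can also enforce this itself in hash-checking mode. Use a requirements line such as `pyautocorpus==0.1.14 --hash=sha256:<digest>`, listing the digest of each file pip might select, and install with `pip install --require-hashes -r requirements.txt`.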
