PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently, it supports only the wiki markup textify function, which strips markup and returns plain text. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

library            sec/doc
mwparserfromhell   0.208
wikitextparser     0.215
pyautocorpus       0.005

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
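A per-document timing like the ones above could be collected along these lines. This is a minimal sketch, not the benchmark script actually used: the helper name seconds_per_doc and the sample documents are hypothetical, and only the pyautocorpus.Textifier().textify call comes from this README.

```python
import time


def seconds_per_doc(strip_markup, docs):
    """Return the average wall-clock seconds strip_markup spends per document."""
    start = time.perf_counter()
    for doc in docs:
        strip_markup(doc)
    return (time.perf_counter() - start) / len(docs)


def main():
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
        return
    # Hypothetical sample input; a real benchmark would use actual wiki pages.
    docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 1000
    textifier = pyautocorpus.Textifier()
    print(f"pyautocorpus {seconds_per_doc(textifier.textify, docs):.6f} sec/doc")


if __name__ == "__main__":
    main()
```

The same helper accepts any callable that takes a string, so the other two libraries can be timed by passing a small function that wraps the calls listed above.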

Installing

From PyPI:

pip install pyautocorpus

From source:

Clone the repository recursively so that its submodules are fetched as well:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

The build requires the pcre library, so install it first. Then, from the cloned directory, run:

python setup.py install

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
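When stripping markup from many documents, one option is to create a single Textifier and call textify once per document. The sketch below assumes a Textifier can be reused across calls, which this README does not state explicitly; textify_all is a hypothetical helper, not part of pyautocorpus.

```python
def textify_all(textifier, docs):
    """Lazily yield the plain text for each wiki-markup document in docs."""
    for doc in docs:
        yield textifier.textify(doc)


def main():
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
        return
    docs = [
        "==First page==\n [[Some Page|link text]] example.",
        "==Second page==\n More [[Other Page|linked]] text.",
    ]
    # Assumption: one Textifier instance is safe to reuse for every document.
    textifier = pyautocorpus.Textifier()
    for text in textify_all(textifier, docs):
        print(repr(text))


if __name__ == "__main__":
    main()
```

Because textify_all is a generator, documents are processed one at a time, so a large dump can be streamed without holding every result in memory.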

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
