PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently, it supports only the wiki markup textify function, which strips markup and returns plain text. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:

Method            Time (sec/doc)
mwparserfromhell  0.208
wikitextparser    0.215
pyautocorpus      0.005

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
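A comparison like the one above could be timed with a small harness such as the following. This is only a sketch, not the benchmark script behind the numbers reported here: the helper names `sec_per_doc` and `available_strippers`, the repeat count, and the sample document are all assumptions made for illustration. Only the three library calls are taken from the list above.

```python
import time


def sec_per_doc(strip, docs, repeats=3):
    """Return the best-of-`repeats` average seconds per document for `strip`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip(doc)
        best = min(best, (time.perf_counter() - start) / len(docs))
    return best


def available_strippers():
    """Map method name to callable, skipping libraries that are not installed."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        textifier = pyautocorpus.Textifier()
        strippers["pyautocorpus"] = textifier.textify
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Hypothetical sample input; real timings depend heavily on the documents used.
    docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100
    for name, strip in available_strippers().items():
        print(f"{name}: {sec_per_doc(strip, docs):.6f} sec/doc")
```

Timings from a short synthetic document like this one will not match the figures above, which were presumably measured on full wiki pages.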

Installing

From PyPI:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

The PCRE library must be installed before building. Then, from inside the cloned directory, run:

python setup.py install

Usage

Example:

import pyautocorpus

# Create a textifier and strip the wiki markup from a string.
textifier = pyautocorpus.Textifier()
text = textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
print(repr(text))
# 'Wiki Marked up text\n\n\n link text example.'
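When many pages need to be processed, the same call can be wrapped in a small generator. The sketch below assumes that a single Textifier instance can be reused across calls, which this README does not state explicitly. The helper name `textify_all` is hypothetical and not part of pyautocorpus.

```python
from typing import Iterable, Iterator


def textify_all(textifier, pages: Iterable[str]) -> Iterator[str]:
    """Yield the plain text of each page, reusing the given textifier object."""
    for page in pages:
        yield textifier.textify(page)


if __name__ == "__main__":
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed")
    else:
        pages = [
            "==Wiki Marked up text==\n [[Some Page|link text]] example.",
            "[[Another Page|more link text]] here.",
        ]
        for text in textify_all(pyautocorpus.Textifier(), pages):
            print(repr(text))
```

Because the helper is a generator, pages are converted one at a time rather than being held in memory together.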

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
