PyAutoCorpus

A Python interface to the excellent AutoCorpus library.

Currently it supports only the wiki markup textify function, which strips markup from wikitext. In my benchmarks, it is roughly 40x faster than stripping markup with other libraries:

Library            Time (sec/doc)
mwparserfromhell   0.208
wikitextparser     0.215
pyautocorpus       0.005

where:

  • mwparserfromhell is mwparserfromhell.parse(x).strip_code()
  • wikitextparser is wikitextparser.parse(x).plain_text()
  • pyautocorpus is pyautocorpus.Textifier().textify(x)
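
A comparison like the one above could be timed with a small harness along these lines. This is only a sketch: the helper names `sec_per_doc` and `strippers` are made up for illustration, and the README does not say how the original numbers were measured (corpus, hardware, number of runs), so results will differ.

```python
import time


def sec_per_doc(strip_markup, docs):
    """Return mean wall-clock seconds per document for one pass over docs."""
    docs = list(docs)
    if not docs:
        raise ValueError("need at least one document")
    start = time.perf_counter()
    for doc in docs:
        strip_markup(doc)
    return (time.perf_counter() - start) / len(docs)


def strippers():
    """Build the three callables listed above (hypothetical helper).

    Imports are done lazily so the timing function itself needs only
    the standard library.
    """
    import mwparserfromhell
    import wikitextparser
    import pyautocorpus

    textifier = pyautocorpus.Textifier()
    return {
        "mwparserfromhell": lambda x: mwparserfromhell.parse(x).strip_code(),
        "wikitextparser": lambda x: wikitextparser.parse(x).plain_text(),
        "pyautocorpus": textifier.textify,
    }
```

With all three libraries installed, something like `{name: sec_per_doc(fn, docs) for name, fn in strippers().items()}` would give one figure per library for a list of wikitext strings `docs`.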

Installing

From PyPI:

pip install pyautocorpus

From source:

Be sure to clone recursively:

git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git

Building from source requires the PCRE library, so install it first. Then run:

python setup.py install

Usage

Example:

>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
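
For more than one document, a sketch like the following may be convenient. It assumes a single `Textifier` can be reused across calls, which the README does not state explicitly. `textify_all` and `default_textifier` are illustrative names, not part of pyautocorpus.

```python
def textify_all(docs, textifier):
    """Yield plain text for each wiki-markup string in docs.

    `textifier` is any object with a textify(str) method, such as
    pyautocorpus.Textifier().
    """
    for doc in docs:
        yield textifier.textify(doc)


def default_textifier():
    """Create a pyautocorpus Textifier (imported lazily)."""
    import pyautocorpus
    return pyautocorpus.Textifier()
```

Usage would look like `list(textify_all(pages, default_textifier()))`, where `pages` is an iterable of wikitext strings.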

Known issues

  • Windows is not yet supported

Credits

AutoCorpus

Contributors to this repository:

  • Sean MacAvaney (University of Glasgow)
  • Thomas Jänich (University of Glasgow)
