PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
Library | Time per document
---|---
mwparserfromhell | 0.208 sec/doc
wikitextparser | 0.215 sec/doc
pyautocorpus | 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
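A comparison like the one above could be reproduced with a small timing harness. The following is only a sketch: the helper name seconds_per_doc, the repeats parameter, and the sample document are illustrative assumptions, not the script that produced the numbers reported here.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best average wall-clock seconds per document.

    `strip_markup` is any callable that takes wiki markup and returns text.
    The name and signature of this helper are hypothetical.
    """
    docs = list(docs)
    if not docs:
        raise ValueError("docs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


if __name__ == "__main__":
    # Tiny illustrative input; a real benchmark would use full wiki articles.
    sample = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100

    candidates = {}
    try:
        import pyautocorpus
        candidates["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    try:
        import mwparserfromhell
        candidates["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        candidates["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass

    for name, fn in candidates.items():
        print(f"{name}: {seconds_per_doc(fn, sample):.6f} sec/doc")
```

Only the libraries that are installed get timed, so the script runs even when some of them are missing. Absolute timings depend heavily on document length and hardware.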
Installing
From PyPI:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the pcre library installed.
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
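When processing many pages, the same call can be wrapped in a small generator. This is a sketch under assumptions: the helper name textify_pages and its textify parameter are hypothetical, and it assumes a single Textifier instance can be reused across calls, which this README does not state explicitly.

```python
def textify_pages(pages, textify=None):
    """Yield (title, plain_text) for each (title, wiki_markup) pair in `pages`.

    `textify` defaults to pyautocorpus.Textifier().textify; passing another
    callable makes the helper usable without the library (e.g. in tests).
    """
    if textify is None:
        # Imported lazily so that a stand-in callable can be injected instead.
        import pyautocorpus
        textify = pyautocorpus.Textifier().textify
    for title, markup in pages:
        yield title, textify(markup)
```

For example, list(textify_pages([("Example", "==Heading==\n text")])) would return one (title, text) pair per input page, in input order.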
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)