PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc
where:
mwparserfromhell is mwparserfromhell.parse(x).strip_code()
wikitextparser is wikitextparser.parse(x).plain_text()
pyautocorpus is pyautocorpus.Textifier().textify(x)
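A per-document timing like the one above could be measured with a small harness along these lines. This is only a sketch, not the script used for the reported numbers: the helper name `seconds_per_doc`, the `repeats` parameter, and the sample document list are all made up for illustration, and the pyautocorpus call is skipped if the package is not installed.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best average wall-clock seconds per document.

    strip_markup is any callable taking one markup string
    (hypothetical helper, not part of pyautocorpus).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


# Illustrative input only; a real benchmark would use full wiki pages.
docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100

try:
    import pyautocorpus
except ImportError:
    pyautocorpus = None  # package not installed; skip the measurement

if pyautocorpus is not None:
    textifier = pyautocorpus.Textifier()
    print("pyautocorpus sec/doc:", seconds_per_doc(textifier.textify, docs))
```

The same function can be pointed at the other two expressions, for example `lambda x: mwparserfromhell.parse(x).strip_code()`, to compare the libraries on identical input.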
Installing
From PyPI:
pip install pyautocorpus
From source:
You will first need the pcre library installed.
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
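For more than one document, the call shown above can be wrapped in a small generator. The sketch below assumes that a single Textifier instance can be reused across documents, which the README does not state. The helper `textify_pages` and the `pages` list are hypothetical names introduced only for this example.

```python
def textify_pages(textifier, pages):
    """Yield (title, plain_text) for each (title, markup) pair.

    textifier is any object with a textify(str) method, such as
    pyautocorpus.Textifier(); reusing one instance is an assumption.
    """
    for title, markup in pages:
        yield title, textifier.textify(markup)


# Illustrative data; real input would come from a wiki dump.
pages = [
    ("Example", "==Wiki Marked up text==\n [[Some Page|link text]] example."),
]

try:
    import pyautocorpus
except ImportError:
    pyautocorpus = None  # package not installed; skip the demo

if pyautocorpus is not None:
    for title, text in textify_pages(pyautocorpus.Textifier(), pages):
        print(title, repr(text))
```

Because the helper is a generator, documents are converted one at a time as they are consumed, so a large collection does not need to be held in memory as plain text.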
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)