PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
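The benchmark script itself is not included here. As a rough sketch of how sec/doc figures like these could be measured, the helper below times any callable over a list of documents. The name `seconds_per_doc`, the `repeats` parameter, the placeholder documents, and the `identity` stand-in are all illustrative assumptions, not part of pyautocorpus.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best average wall-clock seconds per document over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, elapsed / len(docs))
    return best


# Placeholder input (assumption): a meaningful comparison needs real wiki pages.
docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100


def identity(text):
    # Stand-in so this sketch runs without any third-party package installed.
    return text


result = seconds_per_doc(identity, docs)
print(f"identity: {result:.6f} sec/doc")
```

To compare the libraries, pass each of the callables listed above in place of `identity`, for example `seconds_per_doc(pyautocorpus.Textifier().textify, docs)`.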
Installing
From PyPI:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
You will first need the pcre library installed.
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)