PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
mwparserfromhell 0.208 sec/doc
wikitextparser 0.215 sec/doc
pyautocorpus 0.005 sec/doc
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
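The timings above can be reproduced roughly with a small harness like the one below. This is only a sketch, not the script used for the numbers in this README: the helper names (seconds_per_doc, available_strippers, benchmark) are made up for illustration, and the documents to time are whatever wiki markup strings you supply. Libraries that are not installed are skipped.

```python
import time


def seconds_per_doc(strip_fn, docs, repeats=3):
    """Return the best observed average time, in seconds, per document."""
    if not docs:
        raise ValueError("docs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_fn(doc)
        best = min(best, (time.perf_counter() - start) / len(docs))
    return best


def available_strippers():
    """Map each installed library to the call listed in the README."""
    strippers = {}
    try:
        import pyautocorpus
        strippers["pyautocorpus"] = pyautocorpus.Textifier().textify
    except ImportError:
        pass
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = (
            lambda x: mwparserfromhell.parse(x).strip_code()
        )
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = (
            lambda x: wikitextparser.parse(x).plain_text()
        )
    except ImportError:
        pass
    return strippers


def benchmark(docs):
    """Time every available stripper on the same list of markup strings."""
    return {
        name: seconds_per_doc(fn, docs)
        for name, fn in available_strippers().items()
    }
```

Calling benchmark(docs) with a list of markup strings returns a dict of seconds per document keyed by library name. Absolute numbers will depend on your machine and on the documents you choose.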
Installing
From pypi:
pip install pyautocorpus
From source:
You will first need the pcre library installed.
python setup.py install
Usage
Example:
import pyautocorpus
textifier = pyautocorpus.Textifier()
# Strip the wiki markup and return the plain text
textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
# returns 'Wiki Marked up text\n\n\n link text example.'
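To process several pages at once, the same Textifier can be wrapped in a small helper. The sketch below is illustrative only: textify_all and its textify parameter are hypothetical names, not part of pyautocorpus. When no callable is passed, it falls back to pyautocorpus.Textifier().textify as shown in this README.

```python
def textify_all(pages, textify=None):
    """Strip markup from a {title: markup} dict and return {title: plain text}.

    `textify` is any callable taking a markup string; this parameter is an
    assumption of the sketch, added so the helper can be tried without
    pyautocorpus installed.
    """
    if textify is None:
        # Imported lazily so the module is only required when actually used.
        import pyautocorpus
        textify = pyautocorpus.Textifier().textify
    return {title: textify(markup) for title, markup in pages.items()}
```

For example, textify_all({"Some Page": "==Heading==\n [[Some Page|link text]] example."}) returns a dict with the stripped text stored under the key "Some Page".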
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)