PyAutoCorpus
A Python interface to the excellent AutoCorpus library.
Right now, it only supports the wiki markup textify function, which strips out markup. In my benchmarks, this is roughly 40x faster than stripping markup with other libraries:
Library | Time (sec/doc)
---|---
mwparserfromhell | 0.208
wikitextparser | 0.215
pyautocorpus | 0.005
where:
- mwparserfromhell is mwparserfromhell.parse(x).strip_code()
- wikitextparser is wikitextparser.parse(x).plain_text()
- pyautocorpus is pyautocorpus.Textifier().textify(x)
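A comparison like this can be reproduced with a small timing harness. The sketch below is not the harness used for the numbers above: `seconds_per_doc` and `load_strippers` are hypothetical helper names, the corpus is a placeholder, and absolute timings will differ by machine and by document length. Only the three library calls are taken from the list above.

```python
import time


def seconds_per_doc(strip_markup, docs, repeats=3):
    """Return the best average wall-clock seconds per document over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            strip_markup(doc)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(docs))
    return best


def load_strippers():
    """Map library name to the call listed above, skipping libraries that are not installed."""
    strippers = {}
    try:
        import mwparserfromhell
        strippers["mwparserfromhell"] = lambda x: mwparserfromhell.parse(x).strip_code()
    except ImportError:
        pass
    try:
        import wikitextparser
        strippers["wikitextparser"] = lambda x: wikitextparser.parse(x).plain_text()
    except ImportError:
        pass
    try:
        import pyautocorpus
        strippers["pyautocorpus"] = lambda x: pyautocorpus.Textifier().textify(x)
    except ImportError:
        pass
    return strippers


if __name__ == "__main__":
    # Placeholder corpus (assumption): substitute real wiki pages for a meaningful result.
    docs = ["==Wiki Marked up text==\n [[Some Page|link text]] example."] * 100
    for name, strip_markup in load_strippers().items():
        print(f"{name}: {seconds_per_doc(strip_markup, docs):.6f} sec/doc")
```

Taking the best of several runs reduces noise from other processes; averaging over many documents keeps the per-call timer overhead small.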
Installing
From pypi:
pip install pyautocorpus
From source:
Be sure to clone recursively:
git clone --recursive https://github.com/seanmacavaney/pyautocorpus.git
Before building, make sure the pcre library is installed, then run:
python setup.py install
Usage
Example:
>>> import pyautocorpus
>>> textifier = pyautocorpus.Textifier()
>>> textifier.textify("==Wiki Marked up text==\n [[Some Page|link text]] example.")
'Wiki Marked up text\n\n\n link text example.'
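For more than one page, the same call can be wrapped in a small generator. This is a minimal sketch that assumes a single `Textifier` can be reused across calls, which the example above suggests but does not state; `textify_all` and the `(title, markup)` pair shape are hypothetical names introduced here for illustration.

```python
def textify_all(textifier, pages):
    """Yield (title, plain_text) for each (title, wiki_markup) pair in `pages`.

    `textifier` is any object with a `textify(str) -> str` method,
    for example `pyautocorpus.Textifier()`.
    """
    for title, markup in pages:
        yield title, textifier.textify(markup)


if __name__ == "__main__":
    try:
        import pyautocorpus
    except ImportError:
        print("pyautocorpus is not installed; skipping the demo")
    else:
        pages = [
            ("Example", "==Wiki Marked up text==\n [[Some Page|link text]] example."),
        ]
        # One Textifier is created up front and reused for every page (assumption).
        for title, text in textify_all(pyautocorpus.Textifier(), pages):
            print(title, repr(text))
```

Because the helper is a generator, pages are processed lazily, which keeps memory use flat when streaming a large dump.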
Known issues
- Windows is not yet supported
Credits
Contributors to this repository:
- Sean MacAvaney (University of Glasgow)
- Thomas Jänich (University of Glasgow)
Download files
Source Distribution
pyautocorpus-0.1.8.tar.gz (10.8 kB)
Built Distributions
Hashes for pyautocorpus-0.1.8-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | efc5d0792727325d67f90f5e08168ad801cc2798b5bdae8249408d3673997bc7
MD5 | bcce7f27cf44c340c53813eefcff6635
BLAKE2b-256 | bd6c2f23ffbd1cd2327e14093fb116616dc0479afec436d8bfe5592070059e2f
Hashes for pyautocorpus-0.1.8-pp37-pypy37_pp73-manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | cb649f419d512473977ea185dae2288dd5ad4b2b37b83be32a0670d9671c7a03
MD5 | 7eb4de35a81eaab6f0083ab2f1122aeb
BLAKE2b-256 | d25082fe1bf3f43bcdba1d89585e46fe5b16e2a192dab8c73b6c9d0e66827265
Hashes for pyautocorpus-0.1.8-cp310-cp310-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 4e6faf42b928f46905f2d20cc64e569cc5570072288a8aa4fbcdf1f503af30e8
MD5 | 25106d257e069059cdd25ef9c82d6e54
BLAKE2b-256 | 74df90480a98202ee31373a4859cd57470331ee849c122f128fc16f287e0b465
Hashes for pyautocorpus-0.1.8-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | cf7d10134d8162ac2484548cd12a240182943063615c8f3943983c2dfbe4b1c1
MD5 | 4929c71e9b2dc1e9a7eca3e3d4fcf87b
BLAKE2b-256 | 9b4d13cc71f70fafe9010ee2c94422440b05089604468435f9e549edbe7ee7d4
Hashes for pyautocorpus-0.1.8-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | abb979f04015f9ce1eb2e04fe30ba2a48bcf0eadf6d6d7f660aaec1f223a555d
MD5 | 573e1db73e07bf656b111e0c993059ac
BLAKE2b-256 | db27b2d5ce1377906d128079aaae6845f05e596b46dedf7e0013ed210555ced3
Hashes for pyautocorpus-0.1.8-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 66a85ce89f7670c428159385bc8a51d57bb23f61d2fe67fec4d989c09222c26d
MD5 | 7ced653106ebe8dbc79c86977cef2433
BLAKE2b-256 | b894a13f339a3215fbde82c13d32cc798d49637001b8827a24e6c8bd0b7c0c01
Hashes for pyautocorpus-0.1.8-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 5d6dbdcc9c2e7b2a94919a33969b17cd8ec9a1b58283a05fafa964f73c1ffd6e
MD5 | 2eca00a29248ae01c557d71794cd8e87
BLAKE2b-256 | 7ce158e15000400352cfb8b614847457dd4cce4440dc8f1cb0d25443d13b0f8d
Hashes for pyautocorpus-0.1.8-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6432a32035abeeb89e52ddcec93bd177b1a5fc2ce1fedd2a6301631a9d1ae9ab
MD5 | d6de25b25bc2963e5bb4472bb35833f2
BLAKE2b-256 | 1064c943c230c3453026bfb576ecaf905a6b2786933af8d0d02f4dd2278c6162
Hashes for pyautocorpus-0.1.8-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 782df943cd081dcb11545b9cde52ee4999644015eada078a0e4a795677fe1b0f
MD5 | 93e80ef7c694b7ce2b017782af0ece6a
BLAKE2b-256 | b4e99243f38c549b695a46992e9c9478aa2a295c49ae11797805dab687d1d23d
Hashes for pyautocorpus-0.1.8-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 54d2fb5063742d0acf1efad37138a3594b234e5f494630e4c41a59eb9ad2b493
MD5 | 0f9f162b634d828fba58add2d0d6c7ac
BLAKE2b-256 | 4bdfd748d0655a1424f5815aeea62904f810219c04caf9982f948ea86eadbd63
Hashes for pyautocorpus-0.1.8-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 9ca3bd524600ac0f9d23a6228cb60920db01680ce37743fdd7d53f057766bf11
MD5 | 38ff8f78f1b1e0b9c252bb389fa036c4
BLAKE2b-256 | 55a682df45d52da8f5533e80a15495230094be4c9b379b9cdc77a1b02dd56f0a
Hashes for pyautocorpus-0.1.8-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0e6258e4168540b200f8fbdde6a0a33d49fa9df58f1d49205cfbb5954ccc71d9
MD5 | caa8134a6e99d17d536cd4ad9bb60692
BLAKE2b-256 | 2d5179ec57f44f1b7f36e2a474db5542875357dec9682f7c9e4f4fc58b968d19
Hashes for pyautocorpus-0.1.8-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | fe20717b84c376a47a51491464df5d456b8406da283100264f502e1a56fb056c
MD5 | 03af0efc6e9d2a8b178f7224e3f37525
BLAKE2b-256 | 6a2aeed59b3221588da105ad0ba886ffb36ae1b639982c2981c4b48f65dfd449
Hashes for pyautocorpus-0.1.8-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f9af9fe359da6d8b40362264e1a641540753f162bfe7dce386358b165012cdba
MD5 | bcbd7b9554c00e0ab3fbda8fe7d309d0
BLAKE2b-256 | ecc37d2deb773392b022111a9df525c1a15475f08a5b1e94c0b1fe907342a4b3
Hashes for pyautocorpus-0.1.8-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 1c1be01f5051e9c51c7e92bb50ff7cd1a63f0139cf3505aded048a2eecd29b3e
MD5 | 54fe0c2aadd8607da84b53d751eb9da2
BLAKE2b-256 | e3e9d7db908716bfa39b1492a7b7fbc2457614d1d37fc0b815a42babce2ea6ed
Hashes for pyautocorpus-0.1.8-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c5549ab1b9df5f93cc2038a569acc78bc8c7155559239b8ed2eab990dbaf68fc
MD5 | 4b0f093db2a8b53e730e1b51288adcc5
BLAKE2b-256 | 62b2b42aa0bfd0c863bb892e6edc49cc7d68ab4d2e3b7f113f573dfe067a1020
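
The SHA256 digests listed for each file can be checked against a downloaded copy with the standard library. This is a minimal sketch: `sha256_of_file` and `matches_published_digest` are hypothetical helper names, and the file path passed in is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use constant regardless of file size. pip can perform the same check during installation when a requirements file pins hashes with `--hash=sha256:...`.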